Summary of Chain and Causal Attention for Efficient Entity Tracking, by Erwan Fagnou et al.
Chain and Causal Attention for Efficient Entity Tracking by Erwan Fagnou, Paul Caillon, Blaise Delattre, Alexandre…
LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions by Ravindran Kannan, Chiranjib Bhattacharyya, Praneeth…
Transformers learn variable-order Markov chains in-context by Ruida Zhou, Chao Tian, Suhas Diggavi. First submitted to arXiv…
Differential Transformer by Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, Furu…
DEPT: Decoupled Embeddings for Pre-training Language Models by Alex Iacob, Lorenzo Sani, Meghdad Kurmanji, William F.…
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention by Lijie Yang, Zhihao Zhang,…
Timer-XL: Long-Context Transformers for Unified Time Series Forecasting by Yong Liu, Guo Qin, Xiangdong Huang, Jianmin…
TimeCNN: Refining Cross-Variable Interaction on Time Point for Time Series Forecasting by Ao Hu, Dongkai Wang,…
Mastering Chinese Chess AI (Xiangqi) Without Search by Yu Chen, Juntong Lin, Zhichao Shu. First submitted to…
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent by Bingrui Li, Wei…