Summary of Parallelizing Linear Transformers with the Delta Rule Over Sequence Length, by Songlin Yang et al.
Parallelizing Linear Transformers with the Delta Rule over Sequence Length, by Songlin Yang, Bailin Wang, Yu…
Continuum Attention for Neural Operators, by Edoardo Calvello, Nikola B. Kovachki, Matthew E. Levine, Andrew M.…
Learning Physical Simulation with Message Passing Transformer, by Zeyi Xu, Yifei Li. First submitted to arXiv on:…
Attention as a Hypernetwork, by Simon Schug, Seijin Kobayashi, Yassir Akram, João Sacramento, Razvan Pascanu. First submitted…
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL, by Qi Lv,…
G-Transformer: Counterfactual Outcome Prediction under Dynamic and Time-varying Treatment Regimes, by Hong Xiong, Feng Wu, Leon…
Automata Extraction from Transformers, by Yihao Zhang, Zeming Wei, Meng Sun. First submitted to arXiv on: 8…
Aligned at the Start: Conceptual Groupings in LLM Embeddings, by Mehrdad Khatir, Sanchit Kabra, Chandan K.…
Transformer Conformal Prediction for Time Series, by Junghwan Lee, Chen Xu, Yao Xie. First submitted to arXiv…
Retrieval & Fine-Tuning for In-Context Tabular Models, by Valentin Thomas, Junwei Ma, Rasa Hosseinzadeh, Keyvan Golestan,…