Summary of Round and Round We Go! What Makes Rotary Positional Encodings Useful?, by Federico Barbero et al.
Round and Round We Go! What makes Rotary Positional Encodings useful? by Federico Barbero, Alex Vitvitskyi,…