Summary of Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training, by Cheng Luo et al.
Mini-Sequence Transformer: Optimizing Intermediate Memory for Long Sequences Training, by Cheng Luo, Jiawei Zhao, Zhuoming Chen, …