Summary of InAttention: Linear Context Scaling for Transformers, by Joseph Eisner
InAttention: Linear Context Scaling for Transformers
by Joseph Eisner
First submitted to arXiv on: 9 Oct 2024
Categories
Main: …
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG, by Bowen Jin, Jinsung Yoon, …
FutureFill: Fast Generation from Convolutional Sequence Models, by Naman Agarwal, Xinyi Chen, Evan Dogariu, Vlad Feinberg, …
On The Adaptation of Unlimiformer for Decoder-Only Transformers, by Kian Ahrabian, Alon Benhaim, Barun Patra, Jay …
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction, by Zhenmei …
CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs, by Junlin Lv, Yuan Feng, Xike …
An Empirical Study on Context Length for Open-Domain Dialog Generation, by Xinyi Shen, Zuoquan Lin. First submitted …
Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models, by Amey Hengle, …
LAMPO: Large Language Models as Preference Machines for Few-shot Ordinal Classification, by Zhen Qin, Junru Wu, …
On the Benefits of Rank in Attention Layers, by Noah Amsel, Gilad Yehudai, Joan Bruna. First submitted …