Summary of InAttention: Linear Context Scaling for Transformers, by Joseph Eisner
InAttention: Linear Context Scaling for Transformers
by Joseph Eisner
First submitted to arXiv on: 9 Oct 2024
Categories: Main: …