Summary of LLoCO: Learning Long Contexts Offline, by Sijun Tan et al.
LLoCO: Learning Long Contexts Offline
by Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa
First submitted to arXiv on: 11 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | In this paper, the researchers propose LLoCO, a novel approach for efficiently processing long contexts with large language models (LLMs). The traditional self-attention mechanism and growing KV cache incur significant computational and memory overhead during generation. LLoCO addresses this by learning contexts offline through context compression and in-domain parameter-efficient fine-tuning with LoRA. This enables an LLM to create a concise representation of the original context and retrieve relevant information from it to answer questions accurately. The method extends the effective context window of a 4k-token LLaMA2-7B model to handle up to 128k tokens. Evaluation on several long-context question-answering datasets shows that LLoCO significantly outperforms in-context learning while using 30 times fewer tokens during inference. |
| Low | GrooveSquid.com (original content) | Long contexts remain a challenge for large language models (LLMs). The researchers propose a solution called LLoCO, which helps LLMs process long contexts efficiently, so they can answer questions more accurately and quickly. The approach also beats in-context learning while using 30 times fewer tokens during inference. |
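The parameter-efficient fine-tuning idea the summaries mention can be illustrated with a minimal sketch of a LoRA update: instead of training a full weight matrix, only a low-rank pair of matrices is trained. The dimensions and rank below are hypothetical choices for illustration, not the paper's actual configuration:

```python
import numpy as np

# Hypothetical dimensions: a square projection matrix like those in a 7B model,
# with a small LoRA rank r. These are illustrative values, not LLoCO's settings.
d, k = 4096, 4096
r = 16

W = np.zeros((d, k))              # frozen pretrained weight (stand-in values)
A = np.random.randn(r, k) * 0.01  # trainable LoRA down-projection
B = np.zeros((d, r))              # trainable LoRA up-projection, zero-initialized

x = np.random.randn(k)
# Adapted forward pass: W x + B(A x). With B initialized to zero, the adapter
# contributes nothing at the start, so training begins from the frozen model.
y = W @ x + B @ (A @ x)

full_params = d * k          # parameters updated by full fine-tuning
lora_params = r * (d + k)    # parameters updated by LoRA
print(full_params // lora_params)  # → 128, i.e. ~128x fewer trainable parameters
```

The ratio `d*k / (r*(d+k))` is what makes LoRA "parameter-efficient": the low-rank adapter can be trained (and stored per domain, as LLoCO does) at a small fraction of the cost of full fine-tuning.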
Keywords
* Artificial intelligence * Context window * Fine-tuning * Inference * LoRA * Parameter-efficient * Question answering * Self-attention * Token