Summary of LLoCO: Learning Long Contexts Offline, by Sijun Tan et al.
LLoCO: Learning Long Contexts Offline
by Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa
First submitted to arXiv on: 11 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | In this paper, the researchers propose LLoCO, a novel approach for efficiently processing long contexts with large language models (LLMs). The traditional self-attention mechanism and growing KV cache incur significant computational and memory overhead during generation. LLoCO addresses this by learning contexts offline through context compression and in-domain parameter-efficient fine-tuning with LoRA. This enables an LLM to create a concise representation of the original context and retrieve relevant information from it to answer questions accurately. The method extends the effective context window of a 4k-token LLaMA2-7B model to handle up to 128k tokens. Evaluation on several long-context question-answering datasets shows that LLoCO significantly outperforms in-context learning while using 30 times fewer tokens during inference. |
| Low | GrooveSquid.com (original content) | Long contexts remain a challenge for large language models (LLMs). The researchers propose a solution called LLoCO, which helps LLMs process long contexts efficiently, so they can answer questions more accurately and quickly. The approach also beats in-context learning while using 30 times fewer tokens during inference. |
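The parameter-efficient fine-tuning idea the summaries mention can be illustrated with a minimal sketch of a LoRA update: instead of training a full weight matrix, only a low-rank pair of matrices is trained. The dimensions and rank below are hypothetical choices for illustration, not the paper's actual configuration:

```python
import numpy as np

# Hypothetical dimensions: a square projection matrix like those in a 7B model,
# with a small LoRA rank r. These are illustrative values, not LLoCO's settings.
d, k = 4096, 4096
r = 16

W = np.zeros((d, k))              # frozen pretrained weight (stand-in values)
A = np.random.randn(r, k) * 0.01  # trainable LoRA down-projection
B = np.zeros((d, r))              # trainable LoRA up-projection, zero-initialized

x = np.random.randn(k)
# Adapted forward pass: W x + B(A x). With B initialized to zero, the adapter
# contributes nothing at the start, so training begins from the frozen model.
y = W @ x + B @ (A @ x)

full_params = d * k          # parameters updated by full fine-tuning
lora_params = r * (d + k)    # parameters updated by LoRA
print(full_params // lora_params)  # → 128, i.e. ~128x fewer trainable parameters
```

The ratio `d*k / (r*(d+k))` is what makes LoRA "parameter-efficient": the low-rank adapter can be trained (and stored per domain, as LLoCO does) at a small fraction of the cost of full fine-tuning.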
Keywords
* Artificial intelligence * Context window * Fine-tuning * Inference * LoRA * Parameter-efficient * Question answering * Self-attention * Token