


Long Context Compression with Activation Beacon

by Peitian Zhang, Zheng Liu, Shitao Xiao, Ninglu Shao, Qiwei Ye, Zhicheng Dou

First submitted to arxiv on: 7 Jan 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper presents a novel approach to compressing long contexts in transformer-based large language models (LLMs), a crucial step toward reducing their computational and memory costs. The proposed Activation Beacon module targets efficient, flexible, and effective compression by directly condensing the activations at every layer, rather than relying on soft prompts. A tailored compression workflow enables high-quality compression during both training and inference, and the model is trained through compression-based auto-regression on plain texts and instructional data. The approach is evaluated on a variety of long-context tasks, including document understanding, few-shot learning, and Needle-in-a-Haystack. While existing methods struggle on these challenging tasks, Activation Beacon matches the performance of the uncompressed baseline while achieving a 2x speedup in inference and an 8x reduction in KV-cache memory (an illustrative sketch of the chunk-wise compression idea appears after the summaries below).

Low Difficulty Summary (original content by GrooveSquid.com)
This paper makes it possible for large language models to work more efficiently by compressing long pieces of text. This matters because current language models can be very slow and use a lot of computer resources when the input is long. The new method, called Activation Beacon, directly compresses the information the model keeps at every layer, instead of taking the slower, less direct route used by earlier approaches. The paper also shows how to train the model so it works well with different levels of compression. To test the approach, the researchers tried it on several tasks that require processing long pieces of text, such as understanding documents and learning new concepts from just a few examples. The results show that Activation Beacon achieves performance similar to the original, uncompressed model while using far fewer computing resources.
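
To make the compression idea more concrete, here is a minimal, hypothetical sketch of chunk-wise activation compression in the spirit of Activation Beacon. It is not the authors' implementation: the chunk_size and n_beacons parameters, the compress_chunk and compress_context helpers, and the mean-pooling stand-in for the paper's learned beacon mechanism are all assumptions made purely for illustration.

```python
# Illustrative sketch only (not the paper's code): compress a long context's
# per-token activations chunk by chunk, keeping a small number of "beacon"
# vectors per chunk instead of the full KV cache.
import numpy as np

def compress_chunk(kv_chunk: np.ndarray, n_beacons: int) -> np.ndarray:
    """Condense one chunk of per-token activations (chunk_len x d_model)
    into n_beacons vectors. Mean pooling over equal groups stands in for
    the learned compression module described in the paper."""
    groups = np.array_split(kv_chunk, n_beacons, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])

def compress_context(activations: np.ndarray, chunk_size: int = 1024,
                     n_beacons: int = 128) -> np.ndarray:
    """Walk the long context chunk by chunk and keep only the compressed
    beacon activations, shrinking the cache by about chunk_size / n_beacons."""
    kept = []
    for start in range(0, len(activations), chunk_size):
        chunk = activations[start:start + chunk_size]
        kept.append(compress_chunk(chunk, min(n_beacons, len(chunk))))
    return np.concatenate(kept, axis=0)

# Toy example: an 8192-token context with 64-dimensional activations.
full = np.random.randn(8192, 64)
compressed = compress_context(full)
print(full.shape, "->", compressed.shape)  # (8192, 64) -> (1024, 64)
```

With the assumed chunk_size = 1024 and n_beacons = 128, the retained cache is roughly 8x smaller, mirroring the KV-cache reduction ratio reported in the paper.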

Keywords

» Artificial intelligence  » Few shot  » Inference  » Regression  » Transformer