
Summary of LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, by Hongye Jin et al.


LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

by Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu

First submitted to arXiv on: 2 Jan 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This research proposes a novel approach to extend the context window of Large Language Models (LLMs) without fine-tuning. The method, called SelfExtend, constructs bi-level attention information to capture dependencies among tokens at different distances: grouped attention handles distant tokens, while neighbor attention covers adjacent tokens within a specified range. By reusing the original model's self-attention mechanism at inference time, SelfExtend extends an existing LLM's context window without any additional training. Experimental results on multiple benchmarks demonstrate the effectiveness of SelfExtend at extending the context window.

Low Difficulty Summary (written by GrooveSquid.com, original content)
LLMs struggle to generalize when input sequences grow longer than the sequences they were trained on. This research presents a simple and effective way to overcome this limitation using the LLM's own capabilities. By adding bi-level attention information, SelfExtend captures both distant and adjacent token dependencies, allowing the model to process longer contexts without retraining. The approach is easy to adopt, requiring only minor code modifications.

Keywords

* Artificial intelligence  * Attention  * Context window  * Fine tuning  * Inference  * Self attention  * Token