
Summary of LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning, by Hongye Jin et al.


LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning

by Hongye Jin, Xiaotian Han, Jingfeng Yang, Zhimeng Jiang, Zirui Liu, Chia-Yuan Chang, Huiyuan Chen, Xia Hu

First submitted to arXiv on: 2 Jan 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This research proposes a novel approach to extend the context window of Large Language Models (LLMs) without fine-tuning. The method, called SelfExtend, constructs bi-level attention information to capture dependencies among tokens at different distances: grouped attention handles distant tokens, while neighbor attention covers adjacent tokens within a specified range. By reusing the original model's self-attention mechanism at inference time, SelfExtend extends an existing LLM's context window without any additional training. Experimental results on multiple benchmarks demonstrate the effectiveness of SelfExtend at extending the context window.

Low Difficulty Summary (written by GrooveSquid.com, original content)
LLMs struggle to generalize when input sequences grow longer than the sequences they were trained on. This research presents a simple and effective way to overcome this limitation using the LLM's own capabilities. By adding bi-level attention information, SelfExtend captures both distant and adjacent token dependencies, allowing the model to process longer contexts without retraining. The approach is easy to adopt, requiring only minor code modifications.

Keywords

* Artificial intelligence  * Attention  * Context window  * Fine tuning  * Inference  * Self attention  * Token