
Summary of On the Adaptation of Unlimiformer for Decoder-Only Transformers, by Kian Ahrabian et al.


On the Adaptation of Unlimiformer for Decoder-Only Transformers

by Kian Ahrabian, Alon Benhaim, Barun Patra, Jay Pujara, Saksham Singhal, Xia Song

First submitted to arXiv on: 2 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper, each written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This work addresses the limited context length of current large language models by adapting Unlimiformer, a vector-retrieval augmentation method, to decoder-only transformers (a toy sketch of the retrieval idea follows these summaries). With a series of modifications, the authors demonstrate improved performance on summarization tasks, achieving results comparable to models with twice the context length. The study also expands the original experimental setup to cover free-form Q&A and instruction-tuned models.

Low Difficulty Summary (original content by GrooveSquid.com)
Large language models can only take in a limited amount of text at once. Researchers have tried to fix this by increasing the context length, but most models still top out at 4k tokens or less. A method called Unlimiformer helps with this problem, but it only works with encoder-decoder transformers. The authors of this paper adapt Unlimiformer to decoder-only transformers and find ways to make it work better. They also add a new task, free-form Q&A, and test their ideas on an instruction-tuned model.
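To make the retrieval idea concrete, here is a minimal, hypothetical sketch of the kind of kNN-augmented attention that Unlimiformer builds on: at each decoding step, the query attends only over the top-k stored hidden states most similar to it, rather than over the full (potentially very long) context. All names (retrieve_top_k, retrieval_attention, k) are illustrative assumptions, and the exact top-k search stands in for the approximate kNN index the actual method would use; this is a sketch, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

# Sketch of retrieval-augmented attention for one decoder query.
# The "datastore" holds key/value vectors for every token of a long
# input; attention is computed only over the k nearest keys.

def retrieve_top_k(query, datastore_keys, k):
    """Return indices of the k stored keys most similar to the query."""
    scores = datastore_keys @ query                                # (N,)
    return torch.topk(scores, k=min(k, datastore_keys.size(0))).indices

def retrieval_attention(query, datastore_keys, datastore_values, k=8):
    """Attend over only the top-k retrieved entries, not the full context."""
    idx = retrieve_top_k(query, datastore_keys, k)
    keys, values = datastore_keys[idx], datastore_values[idx]      # (k, d) each
    attn = F.softmax(keys @ query / keys.size(-1) ** 0.5, dim=-1)  # (k,)
    return attn @ values                                           # (d,)

# Toy usage: 10,000 stored token states, one decoding step.
d = 64
datastore_keys = torch.randn(10_000, d)   # hidden states of a long input
datastore_values = torch.randn(10_000, d)
query = torch.randn(d)                    # current decoder query vector
out = retrieval_attention(query, datastore_keys, datastore_values, k=16)
print(out.shape)                          # torch.Size([64])
```

Because the attention cost scales with k rather than with the input length, the context the model can draw on is bounded only by the size of the datastore, which is what lets retrieval-augmented models keep pace with longer-context baselines.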

Keywords

» Artificial intelligence  » Context length  » Decoder  » Summarization