On the Adaptation of Unlimiformer for Decoder-Only Transformers
by Kian Ahrabian, Alon Benhaim, Barun Patra, Jay Pujara, Saksham Singhal, Xia Song
First submitted to arXiv on: 2 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract; read it on the arXiv listing. |
Medium | GrooveSquid.com (original content) | The work addresses the limited context lengths of current large language models by adapting Unlimiformer, a vector-retrieval augmentation method, to decoder-only transformers. Through a series of targeted modifications, the authors improve summarization performance, achieving results on par with models that have twice the context length. They also expand the original experimental setup to cover free-form Q&A and instruction-tuned models (a minimal sketch of the underlying retrieval idea follows this table). |
Low | GrooveSquid.com (original content) | Large language models can only take in a limited amount of text at once, which restricts what they can do. Researchers have tried to fix this by increasing the context length, but most models still top out at 4k tokens or less. A method called Unlimiformer helps with this problem, but it was built for encoder-decoder transformers. The authors of this paper adapt Unlimiformer to decoder-only transformers and find ways to improve it. They also add a new task, free-form Q&A, and test their ideas on instruction-tuned models. |
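The vector-retrieval idea behind Unlimiformer is that, instead of truncating the input, each attention query retrieves its top-k most relevant keys from an index over all earlier tokens. Below is a minimal illustrative sketch of that idea in a decoder-only setting; the function name, tensor shapes, and the exact-search retrieval (standing in for an approximate kNN index in practice) are assumptions for illustration, not the authors' actual modifications.

```python
import torch
import torch.nn.functional as F

def knn_augmented_attention(query, local_k, local_v,
                            datastore_k, datastore_v, top_k=16):
    """One attention step where the query attends over its local window
    plus the top-k keys retrieved from a long-context datastore.

    query:         (d,)    current token's query vector
    local_k/v:     (w, d)  keys/values for the recent local window
    datastore_k/v: (n, d)  keys/values for all earlier tokens (n >> w)
    """
    # Retrieve the top-k datastore keys by inner-product similarity;
    # a real system would use an approximate kNN index instead.
    scores = datastore_k @ query                                   # (n,)
    top_idx = torch.topk(scores, k=min(top_k, datastore_k.shape[0])).indices
    retrieved_k = datastore_k[top_idx]                             # (top_k, d)
    retrieved_v = datastore_v[top_idx]

    # Attend jointly over retrieved and local keys, so the effective
    # context is unbounded while per-step cost stays O(w + top_k).
    keys = torch.cat([retrieved_k, local_k], dim=0)
    values = torch.cat([retrieved_v, local_v], dim=0)
    attn = F.softmax(keys @ query / keys.shape[-1] ** 0.5, dim=0)
    return attn @ values                                           # (d,)
```

A quick usage check with random tensors, where the datastore is far larger than the local window:

```python
d, w, n = 64, 128, 10_000
out = knn_augmented_attention(
    torch.randn(d), torch.randn(w, d), torch.randn(w, d),
    torch.randn(n, d), torch.randn(n, d), top_k=16)
print(out.shape)  # torch.Size([64])
```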
Keywords
- Artificial intelligence
- Context length
- Decoder
- Summarization