LongEmbed: Extending Embedding Models for Long Context Retrieval
by Dawei Zhu, Liang Wang, Nan Yang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li
First submitted to arXiv on: 18 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper but are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper explores ways to extend the context window of existing embedding models for NLP applications such as information retrieval (IR) and retrieval-augmented generation (RAG). Current models are limited to narrow context windows of at most 8k tokens, which hinders their use in scenarios requiring long inputs, such as legal contracts. The authors evaluate current models on the newly constructed LongEmbed benchmark, which features documents of varying length and dispersed target information, and find substantial room for improvement. They propose training-free context window extension strategies, such as position interpolation, that extend the context window several-fold (see the sketch after this table). The paper also shows that fine-tuning models with absolute position encoding (APE) can yield notable gains while preserving behavior on short inputs, whereas for models using rotary position embedding (RoPE), training-free methods like NTK and SelfExtend perform well, demonstrating RoPE’s superiority over APE for context window extension. |
| Low | GrooveSquid.com (original content) | The paper helps us understand how to make computer programs better at understanding long documents. Right now, these programs can only handle small chunks of text at a time. The authors created a special test to see how well the programs do on longer texts and found that they need to get better. They also came up with ways to improve the programs without making them learn everything again. This will help us use these programs for important tasks like reading long contracts. |
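To make the extension strategies mentioned above concrete, here is a minimal sketch of how position interpolation and NTK-aware scaling can be applied to rotary position embeddings. This is illustrative only, not the paper's implementation: the function name `rope_angles` and the parameters `pi_scale` and `ntk_alpha` are our own naming, and real models apply these adjustments inside their attention layers.

```python
import torch

def rope_angles(seq_len: int, dim: int, base: float = 10000.0,
                pi_scale: float = 1.0, ntk_alpha: float = 1.0) -> torch.Tensor:
    """Rotation angles for rotary position embedding (RoPE).

    pi_scale  > 1 -> position interpolation: position indices are
                     compressed by 1/pi_scale, so a longer sequence maps
                     back into the position range seen during training.
    ntk_alpha > 1 -> NTK-aware scaling: the frequency base is enlarged,
                     stretching low frequencies while leaving high
                     frequencies close to their trained values.
    """
    if ntk_alpha != 1.0:
        # Widely used NTK-aware base adjustment.
        base = base * ntk_alpha ** (dim / (dim - 2))
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(seq_len).float() / pi_scale
    return torch.outer(positions, inv_freq)  # shape: (seq_len, dim // 2)

# Example: stretch a model trained with 512 positions to 2048 tokens
# by interpolating positions 4x, with no further training.
angles = rope_angles(seq_len=2048, dim=64, pi_scale=4.0)
```

The key trade-off: interpolation keeps every position inside the trained range at the cost of positional resolution, which is why such training-free schemes can extend the window several-fold without retraining.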
Keywords
» Artificial intelligence » Context window » Embedding » Fine-tuning » NLP » RAG