Just Read Twice: Closing the Recall Gap for Recurrent Language Models

by Simran Arora, Aman Timalsina, Aaryan Singhal, Benjamin Spector, Sabri Eyuboglu, Xinyi Zhao, Ashish Rao, Atri Rudra, Christopher Ré

First submitted to arXiv on: 7 Jul 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com; original content)
The rapid emergence of recurrent large language models that rival Transformers in perplexity has brought both excitement and challenges to the field. These architectures use a constant amount of memory during inference, but their limited capacity hinders their ability to recall and use information from long contexts, leading to brittle in-context learning quality. To address this challenge, researchers have been exploring ways to select which information to store and which to discard. This paper observes that the order in which information is presented to the model affects how difficult that selection is, and formalizes the phenomenon by relating it to a problem called set disjointness (SD). The study shows, both empirically and theoretically, that the recurrent memory required to solve SD depends on the order of the input sets, suggesting that processing prompts non-causally, or repeating the context in the prompt, can mitigate this reliance on data order. Building on this observation, the authors propose two methods: JRT-Prompt, which repeats the context multiple times in the prompt, and JRT-RNN, which uses non-causal prefix-linear-attention to process prompts.
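The JRT-Prompt idea described above is simply to repeat the context before the question, so a fixed-memory recurrent model sees every token again after it already knows what the query needs. A minimal sketch in Python (the function name, prompt template, and default repeat count are illustrative assumptions, not the paper's exact format):

```python
def jrt_prompt(context: str, question: str, n_repeats: int = 2) -> str:
    """Build a prompt that repeats the context before asking the question.

    Repeating the context gives a causal recurrent model a second pass over
    the information, reducing its dependence on the order in which relevant
    tokens first appeared.
    """
    repeated = "\n\n".join([context] * n_repeats)
    return f"{repeated}\n\nQuestion: {question}\nAnswer:"


prompt = jrt_prompt(
    "Paris is the capital of France.",
    "What is the capital of France?",
)
```

The appeal of this variant is that it requires no architectural change at all: it trades extra prompt length (and therefore compute) for better recall from a constant-memory model.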
Low Difficulty Summary (written by GrooveSquid.com; original content)
Recurrent large language models are getting better at understanding language! They’re really good at learning from short texts, but they struggle when dealing with longer texts. This is because they don’t have enough memory to remember everything. Researchers want to figure out how to help these models learn more efficiently. They found that the order in which information is presented affects how well the model learns. To solve this problem, they came up with two new ways to process text: repeating context multiple times and using a special attention mechanism.
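The "special attention mechanism" mentioned above is JRT-RNN's prefix-linear-attention: the prompt region is read non-causally (every prompt token can use state built from the whole prompt), while generation after the prompt remains causal. A rough NumPy sketch of that pattern (the ReLU+1 feature map, function names, and shapes are illustrative assumptions; the paper's actual architecture and kernel differ):

```python
import numpy as np


def feature_map(x):
    # A simple positive feature map (assumption; the paper's choice may differ).
    return np.maximum(x, 0.0) + 1.0


def prefix_linear_attention(q, k, v, prefix_len):
    """Linear attention with a non-causal prefix and causal decoding.

    q, k, v: arrays of shape (T, d). Positions < prefix_len all share one
    recurrent state built from the entire prefix; later positions update
    the state causally, one token at a time.
    """
    phi_q, phi_k = feature_map(q), feature_map(k)
    T, d = q.shape
    out = np.zeros_like(v)

    # Non-causal prefix: accumulate keys/values over the whole prompt at once.
    S = phi_k[:prefix_len].T @ v[:prefix_len]   # (d, d) state matrix
    z = phi_k[:prefix_len].sum(axis=0)          # (d,) normalizer

    for t in range(T):
        if t >= prefix_len:
            # Causal decoding: fold in this position's key/value before reading.
            S = S + np.outer(phi_k[t], v[t])
            z = z + phi_k[t]
        out[t] = (phi_q[t] @ S) / (phi_q[t] @ z + 1e-6)
    return out
```

Because all prefix positions read the same full-prefix state, an early prompt token can "see" information that only appears later in the prompt, which is exactly the order-robustness the paper's set-disjointness analysis motivates.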

Keywords

  » Artificial intelligence  » Attention  » Inference  » Perplexity  » Prompt  » Recall  » RNN