


Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

by Suraj Anand, Michael A. Lepori, Jack Merullo, Ellie Pavlick

First submitted to arXiv on: 28 May 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This study investigates the ability of language models to perform “structural in-context learning”: adapting behavior based on sentence or task structure rather than on memorized token embeddings. The researchers found that this capability emerges early in language-model pretraining but then quickly disappears. To address this, they introduce methods for modulating the preference between structural in-context learning and in-weights learning, enabling a “dual process strategy” in which both approaches coexist within a single model. The methods are evaluated on synthetic and naturalistic tasks using toy models, masked language models, and autoregressive language models.
Low Difficulty Summary (original content by GrooveSquid.com)
Language models can learn new information based on context, but they struggle when faced with unfamiliar words. This study looks at how language models adapt their behavior based on sentence structure or task structure, rather than just memorizing word meanings. The researchers found that this ability appears early in training, but then disappears quickly. They also developed methods to help language models balance between adapting to new information and remembering what they’ve learned before.
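To give a concrete feel for the “weight forgetting” idea named in the paper’s title, here is a minimal, purely illustrative sketch: a toy training loop that periodically re-initializes the token-embedding weights, so the model cannot rely on memorized (in-weights) token information and is nudged toward context-based strategies. The function names, the reset schedule, and the loop structure below are hypothetical illustrations, not the authors’ actual procedure.

```python
# Illustrative sketch only: periodically "forget" token embeddings during
# training to discourage in-weights memorization. All names and the schedule
# here are assumptions for demonstration, not the paper's exact method.
import random


def init_embeddings(vocab_size, dim, seed=None):
    """Return fresh small random token embeddings (vocab_size x dim)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)]
            for _ in range(vocab_size)]


def train_with_forgetting(steps, forget_every, vocab_size=100, dim=16):
    """Toy training loop: every `forget_every` steps, reset the embeddings.

    Returns the number of resets performed, as a stand-in for real
    training bookkeeping.
    """
    emb = init_embeddings(vocab_size, dim, seed=0)
    resets = 0
    for step in range(1, steps + 1):
        # ...gradient updates on `emb` and the rest of the model go here...
        if step % forget_every == 0:
            # Forget in-weights token information; context must carry the load.
            emb = init_embeddings(vocab_size, dim)
            resets += 1
    return resets


print(train_with_forgetting(steps=1000, forget_every=100))  # → 10
```

The point of the sketch is the schedule, not the arithmetic: by repeatedly wiping the embedding table, any strategy that depends on stable per-token weights keeps getting destroyed, while strategies that read structure from the context remain usable.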

Keywords

» Artificial intelligence  » Autoregressive  » Language model  » Pretraining  » Token