


Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

by Suraj Anand, Michael A. Lepori, Jack Merullo, Ellie Pavlick

First submitted to arXiv on: 28 May 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This study investigates the ability of language models to perform “structural in-context learning”: adapting behavior based on sentence or task structure rather than on memorized token embeddings. The researchers found that this capability emerges early in language-model pretraining but then quickly disappears. To address this, they introduce methods for modulating the preference between structural in-context learning and in-weights learning, enabling a “dual process strategy” in which both approaches coexist within a single model. The methods are evaluated on synthetic and naturalistic tasks using toy models, masked language models, and autoregressive language models.
Low Difficulty Summary (original content by GrooveSquid.com)
Language models can learn new information based on context, but they struggle when faced with unfamiliar words. This study looks at how language models adapt their behavior based on sentence structure or task structure, rather than just memorizing word meanings. The researchers found that this ability appears early in training, but then disappears quickly. They also developed methods to help language models balance between adapting to new information and remembering what they’ve learned before.
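To give a concrete feel for the “weight forgetting” idea named in the paper’s title, here is a minimal, purely illustrative sketch: a toy training loop that periodically re-initializes the token-embedding weights, so the model cannot rely on memorized (in-weights) token information and is nudged toward context-based strategies. The function names, the reset schedule, and the loop structure below are hypothetical illustrations, not the authors’ actual procedure.

```python
# Illustrative sketch only: periodically "forget" token embeddings during
# training to discourage in-weights memorization. All names and the schedule
# here are assumptions for demonstration, not the paper's exact method.
import random


def init_embeddings(vocab_size, dim, seed=None):
    """Return fresh small random token embeddings (vocab_size x dim)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)]
            for _ in range(vocab_size)]


def train_with_forgetting(steps, forget_every, vocab_size=100, dim=16):
    """Toy training loop: every `forget_every` steps, reset the embeddings.

    Returns the number of resets performed, as a stand-in for real
    training bookkeeping.
    """
    emb = init_embeddings(vocab_size, dim, seed=0)
    resets = 0
    for step in range(1, steps + 1):
        # ...gradient updates on `emb` and the rest of the model go here...
        if step % forget_every == 0:
            # Forget in-weights token information; context must carry the load.
            emb = init_embeddings(vocab_size, dim)
            resets += 1
    return resets


print(train_with_forgetting(steps=1000, forget_every=100))  # → 10
```

The point of the sketch is the schedule, not the arithmetic: by repeatedly wiping the embedding table, any strategy that depends on stable per-token weights keeps getting destroyed, while strategies that read structure from the context remain usable.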

Keywords

» Artificial intelligence  » Autoregressive  » Language model  » Pretraining  » Token