Learning to grok: Emergence of in-context learning and skill composition in modular arithmetic tasks

by Tianyu He, Darshil Doshi, Aritra Das, Andrey Gromov

First submitted to arXiv on: 4 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Disordered Systems and Neural Networks (cond-mat.dis-nn); High Energy Physics – Theory (hep-th); Machine Learning (stat.ML)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper explores the emergence of in-context learning and skill composition in modular arithmetic tasks. A GPT-style transformer is pre-trained on a subset of these tasks and tested on out-of-distribution ones. As the number of pre-training tasks grows, the model transitions from in-distribution to out-of-distribution generalization. The smallest model capable of out-of-distribution generalization has two transformer blocks, while in deeper models out-of-distribution generalization is transient, so training must be stopped early. An interpretability study reveals highly structured representations in both attention heads and MLPs, and an algorithmic shift as the number of in-context examples increases.
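To make the experimental setup concrete, here is a minimal sketch of how an in-context modular arithmetic task might be constructed. The exact task family is not given in this summary, so the linear form f(x, y) = (a·x + b·y) mod p, and all function and variable names, are assumptions for illustration only.

```python
import random

def make_task(p, seed):
    """One hypothetical modular arithmetic 'skill': a linear map
    f(x, y) = (a*x + b*y) mod p, with coefficients (a, b) drawn per task.
    (This task family is an assumption, not taken from the paper summary.)"""
    rng = random.Random(seed)
    a, b = rng.randrange(1, p), rng.randrange(1, p)
    return lambda x, y: (a * x + b * y) % p

def make_prompt(task, p, n_examples, seed):
    """Build an in-context sequence: n_examples triples (x, y, f(x, y)),
    followed by a query pair (qx, qy) whose answer the model must predict."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(n_examples):
        x, y = rng.randrange(p), rng.randrange(p)
        tokens += [x, y, task(x, y)]
    qx, qy = rng.randrange(p), rng.randrange(p)
    tokens += [qx, qy]
    return tokens, task(qx, qy)

# A pre-training set would sample many such tasks; held-out coefficient
# pairs then serve as out-of-distribution test tasks.
p = 29
task = make_task(p, seed=0)
prompt, answer = make_prompt(task, p, n_examples=4, seed=1)
```

Under this setup, varying how many distinct tasks appear in pre-training, and how many in-context examples appear in each prompt, matches the two axes the summary describes.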
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper studies how large language models learn new skills and apply them to problems they haven’t seen before. The researchers use a special type of math problem, modular arithmetic, to test the model’s ability to generalize. They find that as the model is trained on more of these problems, it becomes better at solving new, unseen ones. However, they also discover that this ability can be temporary: deeper models must be stopped early, before their performance on new problems degrades. By looking inside the model’s representations, they see that it builds a highly structured understanding of the math problems.

Keywords

» Artificial intelligence  » Attention  » Early stopping  » Generalization  » Gpt  » Transformer