Summary of “Parallel Structures in Pre-training Data Yield In-Context Learning” by Yanda Chen et al.
Parallel Structures in Pre-training Data Yield In-Context Learning
by Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He
First submitted to arXiv on: 19 Feb 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper but is written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv |
Medium | GrooveSquid.com (original content) | Pre-trained language models (LMs) exhibit in-context learning (ICL): they adapt to new tasks from only a few examples in the prompt. The origin of this ability is unclear, however, because ICL prompts look very different from ordinary pre-training text. This study asks which patterns in the pre-training data give rise to ICL. The authors find that LMs’ ICL ability depends on parallel structures in the pre-training data: pairs of phrases that follow similar templates within the same context window. They detect these structures by checking whether training on one phrase improves the model’s prediction of the other (a rough sketch of this idea appears below the table), and they run ablation experiments to measure their effect on ICL. Removing the detected parallel structures reduces LMs’ ICL accuracy by 51%, highlighting the importance of these patterns in the pre-training data and shedding light on how they facilitate in-context learning. |
Low | GrooveSquid.com (original content) | Imagine you have a super smart computer program called a language model (LM). It can learn new things just by looking at examples, without needing to be told what to do. But how does it do that? Researchers wanted to know the secret behind this ability and found out that it’s because of patterns in the data the LM was trained on. These patterns are like pairs of sentences that follow similar rules. If you take away these patterns, the computer program becomes much worse at learning new things from examples. This discovery helps us understand how language models work and can even help make them better. |
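
The detection idea in the medium summary can be made concrete with a short sketch. The code below is not the authors’ implementation; it is a minimal illustration, under stated assumptions (the model name, learning rate, threshold-free scoring, and example phrases are all choices made here), of scoring a candidate pair of phrases by how much a single gradient step on one phrase lowers a language model’s loss on the other.

```python
# Minimal sketch (not the paper's code): score a pair of phrases by whether
# a single gradient step on one phrase reduces the LM's loss on the other.
import copy

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # assumption: any small causal LM works for this sketch
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
base_model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)  # loads in eval mode


def phrase_loss(model, phrase: str) -> torch.Tensor:
    """Language-modeling loss of the model on a single phrase."""
    ids = tokenizer(phrase, return_tensors="pt").input_ids
    return model(ids, labels=ids).loss


def parallel_gain(phrase_a: str, phrase_b: str, lr: float = 1e-3) -> float:
    """How much one SGD step on phrase_a lowers the loss on phrase_b."""
    model = copy.deepcopy(base_model)  # keep the shared base model untouched
    loss_b_before = phrase_loss(model, phrase_b).item()

    # One gradient step on phrase_a. The model stays in eval mode so dropout
    # is off and the two loss measurements are comparable.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    optimizer.zero_grad()
    phrase_loss(model, phrase_a).backward()
    optimizer.step()

    with torch.no_grad():
        loss_b_after = phrase_loss(model, phrase_b).item()
    return loss_b_before - loss_b_after  # positive => training on A helps B


# Illustrative usage: phrases sharing a template should show a larger gain
# than unrelated phrases. Any cutoff for calling a pair "parallel" would be
# a further assumption.
print(parallel_gain("The capital of France is Paris.",
                    "The capital of Italy is Rome."))
print(parallel_gain("The capital of France is Paris.",
                    "My cat slept on the sofa all afternoon."))
```

In this sketch, the first pair follows the same template and should yield a larger gain than the second; the paper’s ablation then asks what happens to ICL when such high-gain pairs are removed from the pre-training data.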
Keywords
* Artificial intelligence
* Language model