Summary of "Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts" by Jiahai Feng et al.
Extractive Structures Learned in Pretraining Enable Generalization on Finetuned Facts
by Jiahai Feng, Stuart Russell, Jacob Steinhardt
First submitted to arXiv on: 5 Dec 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | Pretrained language models (LMs) can generalize beyond the facts they were finetuned on by leveraging implications learned during pretraining. For instance, an LM finetuned on “John Doe lives in Tokyo” can correctly answer “What language do people in John Doe’s city speak?” with “Japanese” (a minimal finetune-and-probe sketch follows the table). Researchers have long sought to understand how LMs learn to make such connections during pretraining. This study introduces extractive structures as a framework for describing how components in LMs (e.g., MLPs or attention heads) coordinate to enable this generalization. The authors hypothesize that extractive structures are learned when the model encounters implications of previously known facts, and that this predicts data ordering and weight grafting effects. Empirical results demonstrate both effects in OLMo-7b, Llama 3-8b, Gemma 2-9b, and Qwen 2-7b. The work sheds light on how LMs learn during pretraining and highlights the importance of fact learning at both early and late layers.
Low | GrooveSquid.com (original content) | Did you know that special computer models called language models can learn to make connections between facts? For example, if they’re trained on a statement like “John Doe lives in Tokyo,” they can answer questions like “What language do people speak in John Doe’s city?” with the correct answer: Japanese! But how do these models figure out such connections? This study helps explain it. The researchers found that certain structures inside the models make these connections possible, and that the models learn facts at different layers, which matters for making new predictions.
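To make the finetune-then-probe setup in the medium summary concrete, here is a minimal, hypothetical sketch: finetune a pretrained causal LM on a single new fact, then ask a question whose answer requires composing that fact with knowledge already stored in the model's weights. The checkpoint name, prompts, and hyperparameters below are illustrative assumptions, not the authors' experimental code.

```python
# Hypothetical sketch (not the authors' code): finetune a causal LM on one new
# fact, then probe whether it composes that fact with knowledge from pretraining
# (Tokyo -> Japanese). The checkpoint id, prompts, step count, and learning rate
# are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "allenai/OLMo-7B-hf"  # assumed id; the paper also studies Llama 3-8b, Gemma 2-9b, Qwen 2-7b
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

fact = "John Doe lives in the city of Tokyo."
probe = "The people in John Doe's city speak the language of"

# Finetune on the single new fact for a few gradient steps.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
inputs = tokenizer(fact, return_tensors="pt").to(device)
model.train()
for _ in range(10):
    loss = model(**inputs, labels=inputs["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Probe the implication. If the relevant extractive structures are in place,
# the completion should be "Japanese", even though the finetuning data never
# mentioned languages at all.
model.eval()
probe_inputs = tokenizer(probe, return_tensors="pt").to(device)
with torch.no_grad():
    generated = model.generate(**probe_inputs, max_new_tokens=3)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```

Note that fully finetuning a 7B-parameter model this way requires a large-memory GPU; the sketch is only meant to show the shape of the two-hop generalization test the summaries describe.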
Keywords
» Artificial intelligence » Attention » Generalization » Llama » Pretraining