Summary of Talking Heads: Understanding Inter-layer Communication in Transformer Language Models, by Jack Merullo et al.
Talking Heads: Understanding Inter-layer Communication in Transformer Language Models
by Jack Merullo, Carsten Eickhoff, Ellie Pavlick
First submitted to arXiv on: 13 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper and is written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper investigates how transformer language models (LMs) transmit information between layers. It identifies a mechanism, used in two LMs to selectively inhibit items in context, that underlies a common abstraction across many context-retrieval behaviors. The researchers find that models write features into low-rank subspaces of the residual stream, which are then read out by later layers, forming low-rank communication channels between layers. They also show that this mechanism explains an otherwise arbitrary-seeming sensitivity to the order of items in prompts. By decomposing attention heads with Singular Value Decomposition (SVD), they predict interactions between heads separated by one or more layers from their weight matrices alone (a rough sketch of this kind of weight-based analysis appears below the table). Manipulating internal representations and editing model weights improves accuracy on a synthetic task by over 20%. The analysis reveals an intricate, interpretable structure learned during language model pretraining, helping to explain why sophisticated LMs sometimes fail in simple domains. |
| Low | GrooveSquid.com (original content) | The paper looks at how language models work on the inside. It finds a way that some models suppress certain information in the input so that later parts of the model ignore it. This helps explain why models sometimes struggle with simple tasks like recalling items from a list. The researchers also show that they can make the models do better by changing what’s going on inside them. They think this will help us understand why language models sometimes don’t do well, even though they’re very good at lots of other things. |
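
To make the weight-matrix idea in the medium summary a little more concrete, below is a minimal toy sketch of how one might score a candidate low-rank communication channel between two attention heads using SVD of their weights alone. The function name `channel_score`, the column-vector convention, the rank cutoff `k`, and the random stand-in matrices are all illustrative assumptions; this is not the authors' code or their exact metric.

```python
# Toy sketch (assumptions noted in comments): score how strongly a later head's
# QK circuit "reads" from the low-rank subspace that an earlier head's OV
# circuit "writes" into the residual stream.
import numpy as np

def channel_score(W_OV_early: np.ndarray, W_QK_late: np.ndarray, k: int = 4) -> float:
    """Fraction of the later head's QK energy that lies in the earlier head's
    top-k write directions (illustrative metric, not the paper's)."""
    # Assuming the column-vector convention (output = W_OV @ x), the left
    # singular vectors of the OV matrix span the directions the earlier head
    # writes into the residual stream.
    U, S, Vt = np.linalg.svd(W_OV_early)
    U_k = U[:, :k]  # top-k "write" directions
    # Restrict the key-side input of the later head's QK bilinear form
    # (score = q^T W_QK k) to those write directions, then compare Frobenius
    # norms; a value near 1 means the later head mostly reads that subspace.
    projected = W_QK_late @ U_k
    return float(np.linalg.norm(projected) / np.linalg.norm(W_QK_late))

# Usage with random matrices standing in for real model weights.
d_model = 64
rng = np.random.default_rng(0)
W_OV_early = rng.normal(size=(d_model, d_model))
W_QK_late = rng.normal(size=(d_model, d_model))
print(f"channel score: {channel_score(W_OV_early, W_QK_late):.3f}")
```

With random weights the score hovers around the chance level of roughly sqrt(k / d_model); the intuition from the paper is that genuinely communicating head pairs in a trained model should score well above such a baseline.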
Keywords
» Artificial intelligence » Attention » Language model » Pretraining » Transformer