Summary of Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning, by Mansi Sakarvadia
Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning
by Mansi Sakarvadia
First submitted to arXiv on: 6 Nov 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The proposed approach aims to improve the performance of language models (LMs) on multi-hop reasoning tasks by injecting targeted memories into attention heads. The study analyzes the activations of GPT-2 models in response to single- and multi-hop prompts, revealing that small subsets of attention heads significantly impact model predictions. To facilitate interpretation of these heads, the authors develop an open-source tool, Attention Lens, which translates attention-head outputs into vocabulary tokens. Experimental results show that a simple memory injection can increase the probability of the desired next token in multi-hop tasks by up to 424%. The approach has implications for enhancing the quality of multi-hop prompt completions and for localizing sources of model failures; a minimal sketch of these two ideas appears below the table. |
Low | GrooveSquid.com (original content) | The paper tries to help language models do better at answering questions that require combining multiple pieces of information. Right now, these models struggle with this kind of task. The researchers propose a way to improve performance by giving the model extra information it can use to answer the question. They test this idea and find that it works really well – sometimes as much as 424% better! They also create a tool that helps people understand how the model is thinking when it answers a question, which can be useful for making sure the model isn’t being biased or saying something mean on purpose. |
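To make the summaries above more concrete, here is a minimal sketch in PyTorch (using the Hugging Face transformers GPT-2 model) of the two ideas they describe: injecting an embedded "memory" into the model during inference, and reading hidden states out into vocabulary tokens. This is not the authors' Attention Lens or memory-injection code: the prompt, memory text, layer index, and injection scale are illustrative assumptions, and the readout reuses GPT-2's unembedding matrix rather than the trained lenses described in the paper.

```python
# Illustrative sketch only -- not the paper's implementation.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "The capital of the country where the Eiffel Tower stands is"
memory_text = " France"     # hypothetical "memory" to inject
layer_idx, scale = 9, 4.0   # illustrative layer choice and injection strength

# Build a memory vector from the model's own token embeddings.
mem_ids = tok(memory_text, return_tensors="pt").input_ids
mem_vec = model.transformer.wte(mem_ids).mean(dim=1).squeeze(0)

# Forward hook: add the memory vector to the last token's hidden state at the
# output of one transformer block (a coarse stand-in for injecting memories
# at specific attention heads).
def inject(module, inputs, output):
    hidden = output[0] if isinstance(output, tuple) else output
    hidden[:, -1, :] += scale * mem_vec
    return output

handle = model.transformer.h[layer_idx].register_forward_hook(inject)
with torch.no_grad():
    logits = model(**tok(prompt, return_tensors="pt")).logits
handle.remove()

# Lens-style readout: the unembedding matrix (lm_head) maps hidden states to
# vocabulary logits; Attention Lens instead learns such a mapping per head.
top_ids = logits[0, -1].topk(5).indices
print([tok.decode(int(i)) for i in top_ids])
```

As the summaries note, the paper's Attention Lens learns a dedicated translation from each attention head's output into vocabulary tokens, and memories are injected at the specific heads found to matter, rather than at a whole layer as in this sketch.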
Keywords
» Artificial intelligence » Attention » Gpt » Probability » Prompt » Token