Summary of Not All Language Model Features Are One-Dimensionally Linear, by Joshua Engels et al.
Not All Language Model Features Are One-Dimensionally Linear
by Joshua Engels, Eric J. Michaud, Isaac Liao, Wes Gurnee, Max Tegmark
First submitted to arXiv on: 23 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv. |
Medium | GrooveSquid.com (original content) | Recent research suggests that language models perform computation by manipulating one-dimensional representations of concepts (“features”) in activation space. This paper asks whether some representations are instead inherently multi-dimensional. The authors develop a rigorous definition of irreducible multi-dimensional features, based on whether a feature can be decomposed into independent or non-co-occurring lower-dimensional features, and design a scalable method that uses sparse autoencoders to find such features in GPT-2 and Mistral 7B. The method uncovers strikingly interpretable examples, including circular representations of the days of the week and the months of the year, and the authors identify tasks where these exact circles are used to solve modular-arithmetic problems over days and months. Intervention experiments on Mistral 7B and Llama 3 8B provide evidence that the circular features are the fundamental unit of computation in these tasks, and the authors also examine the continuity of Mistral 7B’s days-of-the-week feature. A toy sketch of this kind of circular computation appears below the table. |
Low | GrooveSquid.com (original content) | This paper looks at how language models work. Some researchers think models compute only with one-dimensional concepts, but this study suggests that some concepts are represented in more than one dimension at once. The team develops a way to define and find these multi-dimensional features and locates them in models such as GPT-2 and Mistral 7B. The features include circles representing the days of the week and the months of the year, which the models use to answer questions about day and month arithmetic. Experiments that directly change these circles show that the models really do use them for this kind of computation, and the study also looks at how smooth (continuous) the week circle is in Mistral 7B. |
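
To make the circular representation concrete, here is a minimal toy sketch. It is not the authors' code, and all names in it (DAYS, embed, rotate, decode) are invented for this illustration: it places the seven weekdays on a circle and shows that "move forward k days", i.e. addition modulo 7, becomes a simple rotation of that circle, which is the style of computation the paper's intervention experiments probe.

```python
import numpy as np

# Toy illustration: put the 7 weekdays on a circle so that "k days later"
# (addition mod 7) is a rotation. This is hand-built, not extracted from a model.
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def embed(day: int, n: int = 7) -> np.ndarray:
    """Map day index d to the 2-D point (cos(2*pi*d/n), sin(2*pi*d/n))."""
    angle = 2 * np.pi * day / n
    return np.array([np.cos(angle), np.sin(angle)])

def rotate(point: np.ndarray, steps: int, n: int = 7) -> np.ndarray:
    """Advance a point on the circle by `steps` positions via a rotation matrix."""
    angle = 2 * np.pi * steps / n
    rot = np.array([[np.cos(angle), -np.sin(angle)],
                    [np.sin(angle),  np.cos(angle)]])
    return rot @ point

def decode(point: np.ndarray, n: int = 7) -> int:
    """Read a day back out by finding the nearest of the n circle points."""
    return int(np.argmin([np.linalg.norm(point - embed(d, n)) for d in range(n)]))

# "Two days after Saturday" -> rotate Saturday's point by 2 steps -> Monday.
answer = decode(rotate(embed(DAYS.index("Saturday")), steps=2))
print(DAYS[answer])  # Monday
```

In the paper itself, the analogous circles are discovered inside GPT-2 and Mistral 7B activations with sparse autoencoders rather than constructed by hand; the sketch only shows why a circle is a natural, irreducibly two-dimensional representation for modular arithmetic over days or months.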
Keywords
» Artificial intelligence » GPT » Llama