
Summary of Decomposing the Dark Matter of Sparse Autoencoders, by Joshua Engels et al.


Decomposing The Dark Matter of Sparse Autoencoders

by Joshua Engels, Logan Riggs, Max Tegmark

First submitted to arXiv on: 18 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.
Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper delves into the mysterious phenomenon of “dark matter” in sparse autoencoders (SAEs), a technique used to decompose language model activations into interpretable features. Current SAEs fail to fully reconstruct these activations, and the residual error vector is dubbed “dark matter”. The study reveals that much of this dark matter (about half of the error vector itself and roughly 90% of its norm) can be linearly predicted from the initial activation vector; a rough code sketch of this linear prediction appears after the summaries below. Moreover, the scaling behavior of SAE error norms at a per-token level is predictable: larger SAEs mostly struggle to reconstruct the same contexts as smaller ones. Building on the linear representation hypothesis, the authors propose models of activations that could explain these findings. The paper also investigates the remaining “nonlinear” error component, which contains fewer not-yet-learned features and is responsible for a proportionate share of the increase in cross-entropy loss when SAE activations are inserted into the model. Finally, two methods are examined to reduce nonlinear SAE error: inference-time gradient pursuit and linear transformations from earlier-layer SAE outputs.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This research looks at tools called sparse autoencoders, which try to break down what is happening inside language models. These tools always leave part of the picture unexplained, like a mysterious “dark matter”. The study finds that most of this dark matter can be predicted just by looking at the original signal the tool was trying to explain, and that bigger tools tend to struggle with the same examples as smaller ones. The researchers also find that part of the leftover error comes from features the tools have not learned yet, and they suggest ways to shrink it. These discoveries could help make such explanation tools, and our understanding of language models, better.
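Illustrative Code Sketch

To make the “linearly predictable dark matter” idea concrete, here is a minimal, hypothetical sketch (not the authors’ code): it builds a toy ReLU sparse autoencoder on synthetic data, computes the reconstruction error, and fits an ordinary least squares map from each activation vector to its error. The toy SAE, the synthetic activations, and all dimensions are assumptions for illustration only; on real model activations with trained SAEs, the paper reports that roughly 90% of the error norm is linearly predictable, which this random toy will not reproduce.

```python
# Illustrative sketch (not the authors' code): split a toy SAE's
# reconstruction error into a linearly predictable part and a "nonlinear"
# residual, mirroring the paper's dark-matter decomposition.
# The toy SAE, synthetic data, and dimensions are assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, d_sae = 4096, 64, 256

# Synthetic "activations" standing in for a language model's residual stream.
acts = rng.normal(size=(n_tokens, d_model))

# A toy (untrained) ReLU sparse autoencoder: x_hat = relu(x W_enc^T + b) W_dec^T.
W_enc = rng.normal(size=(d_sae, d_model)) / np.sqrt(d_model)
W_dec = rng.normal(size=(d_model, d_sae)) / np.sqrt(d_sae)
b_enc = -0.5 * np.ones(d_sae)  # negative bias so only some features fire

hidden = np.maximum(acts @ W_enc.T + b_enc, 0.0)
recon = hidden @ W_dec.T

# SAE error ("dark matter"): the part of the activation the SAE fails to explain.
error = acts - recon

# Fit a linear map (with bias) from the original activation to the SAE error.
X = np.concatenate([acts, np.ones((n_tokens, 1))], axis=1)
coef, *_ = np.linalg.lstsq(X, error, rcond=None)
error_linear = X @ coef                 # linearly predictable component
error_nonlinear = error - error_linear  # "nonlinear" residual

frac_norm_explained = 1.0 - (
    np.linalg.norm(error_nonlinear) ** 2 / np.linalg.norm(error) ** 2
)
print(f"fraction of squared error norm linearly predictable: {frac_norm_explained:.2f}")
```

Under the paper’s framing, the same least squares machinery could instead be fit on earlier-layer SAE outputs, which is roughly the spirit of the second error-reduction method mentioned in the medium summary above.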

Keywords

» Artificial intelligence  » Cross entropy  » Inference  » Language model  » Token