Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
by Aliyah R. Hsu, Georgia Zhou, Yeshwanth Cherapanamjeri, Yaxuan Huang, Anobel Y. Odisho, Peter R. Carroll, Bin Yu
First submitted to arXiv on: 1 Jul 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Computation and Language (cs.CL); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here.
Medium | GrooveSquid.com (original content) | Automated mechanistic interpretability research has garnered significant interest because it promises to scale explanations of neural-network internals to large models. This work introduces contextual decomposition for transformers (CD-T), a method for building interpretable circuits in large language models. CD-T can produce circuits at arbitrary levels of abstraction, down to individual attention heads at specific sequence positions, and does so efficiently. The method first applies a set of mathematical equations to isolate the contribution of chosen features to the model’s output, then recursively prunes low-contribution components, cutting circuit-discovery runtime from hours to seconds relative to state-of-the-art baselines. On three standard circuit-evaluation tasks (indirect object identification, greater-than comparisons, and docstring completion), CD-T outperforms the ACDC and EAP baselines with an average of 97% ROC AUC at much lower runtime. The authors further show that CD-T circuits are faithful rather than artifacts of random chance, and that they perfectly replicate the original model’s behavior (faithfulness = 1) using fewer nodes than the baselines for all tasks. A hedged sketch of the decomposition and pruning ideas appears after this table.
Low | GrooveSquid.com (original content) | This paper is about making it easier to understand how big language models work. Right now it is hard to explain why these models make the decisions they do. The authors create a new way to break a model down into smaller parts that show what each part does, which helps us understand the model both better and faster than before. They test their method on three different tasks and find that it is very accurate (97% ROC AUC) and can even replicate the original model’s behavior using fewer parts than earlier methods.
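To make the "mathematical equations to isolate feature contributions" step concrete, below is a minimal NumPy sketch of classic contextual decomposition propagated through one linear layer and one ReLU. This illustrates the general idea that CD-T builds on, not the paper's exact transformer formulation; the function names, the even bias split, and the toy network are illustrative assumptions.

```python
import numpy as np

def cd_linear(beta, gamma, W, b):
    """Propagate the relevant (beta) / irrelevant (gamma) split through y = W x + b.
    Linearity lets each part pass through independently; how to attribute the
    bias is a design choice (split evenly here, an assumption of this sketch)."""
    return W @ beta + b / 2, W @ gamma + b / 2

def cd_relu(beta, gamma):
    """Split a ReLU's output between the two parts using the symmetric
    linearization from the original contextual-decomposition work: the relevant
    part is credited with the average of relu(beta) and the marginal effect
    relu(beta + gamma) - relu(gamma)."""
    relu = lambda z: np.maximum(z, 0.0)
    total = relu(beta + gamma)
    beta_out = 0.5 * (relu(beta) + (total - relu(gamma)))
    return beta_out, total - beta_out

# Toy usage: attribute a 2-layer MLP's output to the first two input features.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
mask = np.array([1.0, 1.0, 0.0, 0.0])      # features whose contribution we want
beta, gamma = x * mask, x * (1 - mask)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)
beta, gamma = cd_relu(*cd_linear(beta, gamma, W1, b1))
beta, gamma = cd_linear(beta, gamma, W2, b2)
# By construction beta + gamma equals the ordinary forward pass, so beta is an
# additive contribution of the masked features to the final output.
```

Because the two parts always sum to the ordinary activation, contributions come out of a single modified forward pass rather than many ablation runs, which is consistent with the hours-to-seconds runtime reduction the summary describes.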
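The recursive-pruning step can then operate on such contribution scores. Below is a hypothetical greedy sketch: score every attention head, repeatedly drop the lowest-scoring half, and stop before faithfulness falls below a target. `cdt_score` and `circuit_faithfulness` are placeholder callables standing in for the paper's actual procedures, and the halving schedule is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

Head = Tuple[int, int]  # (layer, head) identifier

def prune_circuit(
    heads: List[Head],
    cdt_score: Callable[[Head], float],
    circuit_faithfulness: Callable[[List[Head]], float],
    target: float = 1.0,
) -> List[Head]:
    """Greedily shrink a candidate circuit while it still replicates the model."""
    circuit = sorted(heads, key=cdt_score, reverse=True)
    while len(circuit) > 1:
        candidate = circuit[: max(1, len(circuit) // 2)]  # keep top-scoring half
        if circuit_faithfulness(candidate) < target:
            break  # pruning further would break the circuit's behavior
        circuit = candidate
    return circuit
```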
Keywords
» Artificial intelligence » Attention » AUC » Neural network » Pruning