Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
by Aliyah R. Hsu, Georgia Zhou, Yeshwanth Cherapanamjeri, Yaxuan Huang, Anobel Y. Odisho, Peter R. Carroll, Bin Yu
First submitted to arXiv on: 1 Jul 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Computation and Language (cs.CL); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here.
Medium | GrooveSquid.com (original content) | Automated mechanistic interpretability research has garnered significant interest because it promises to scale explanations of neural-network internals to large models. This work introduces contextual decomposition for transformers (CD-T), a method for building interpretable circuits in large language models. CD-T can produce circuits at arbitrary levels of abstraction, down to individual attention heads at specific sequence positions, and does so efficiently. The method first applies a set of mathematical equations to isolate the contribution of chosen features to the model’s output, then recursively prunes low-contribution components, cutting circuit-discovery runtime from hours to seconds relative to state-of-the-art baselines. On three standard circuit-evaluation tasks (indirect object identification, greater-than comparisons, and docstring completion), CD-T outperforms the ACDC and EAP baselines with an average of 97% ROC AUC at much lower runtime. The authors further show that CD-T circuits are faithful rather than artifacts of random chance, and that they perfectly replicate the original model’s behavior (faithfulness = 1) using fewer nodes than the baselines for all tasks. A hedged sketch of the decomposition and pruning ideas appears after this table.
Low | GrooveSquid.com (original content) | This paper is about making it easier to understand how big language models work. Right now it is hard to explain why these models make the decisions they do. The authors create a new way to break a model down into smaller parts that show what each part does, which helps us understand the model both better and faster than before. They test their method on three different tasks and find that it is very accurate (97% ROC AUC) and can even replicate the original model’s behavior using fewer parts than earlier methods.
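To make the "mathematical equations to isolate feature contributions" step concrete, below is a minimal NumPy sketch of classic contextual decomposition propagated through one linear layer and one ReLU. This illustrates the general idea that CD-T builds on, not the paper's exact transformer formulation; the function names, the even bias split, and the toy network are illustrative assumptions.

```python
import numpy as np

def cd_linear(beta, gamma, W, b):
    """Propagate the relevant (beta) / irrelevant (gamma) split through y = W x + b.
    Linearity lets each part pass through independently; how to attribute the
    bias is a design choice (split evenly here, an assumption of this sketch)."""
    return W @ beta + b / 2, W @ gamma + b / 2

def cd_relu(beta, gamma):
    """Split a ReLU's output between the two parts using the symmetric
    linearization from the original contextual-decomposition work: the relevant
    part is credited with the average of relu(beta) and the marginal effect
    relu(beta + gamma) - relu(gamma)."""
    relu = lambda z: np.maximum(z, 0.0)
    total = relu(beta + gamma)
    beta_out = 0.5 * (relu(beta) + (total - relu(gamma)))
    return beta_out, total - beta_out

# Toy usage: attribute a 2-layer MLP's output to the first two input features.
rng = np.random.default_rng(0)
x = rng.normal(size=4)
mask = np.array([1.0, 1.0, 0.0, 0.0])      # features whose contribution we want
beta, gamma = x * mask, x * (1 - mask)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)
beta, gamma = cd_relu(*cd_linear(beta, gamma, W1, b1))
beta, gamma = cd_linear(beta, gamma, W2, b2)
# By construction beta + gamma equals the ordinary forward pass, so beta is an
# additive contribution of the masked features to the final output.
```

Because the two parts always sum to the ordinary activation, contributions come out of a single modified forward pass rather than many ablation runs, which is consistent with the hours-to-seconds runtime reduction the summary describes.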
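The recursive-pruning step can then operate on such contribution scores. Below is a hypothetical greedy sketch: score every attention head, repeatedly drop the lowest-scoring half, and stop before faithfulness falls below a target. `cdt_score` and `circuit_faithfulness` are placeholder callables standing in for the paper's actual procedures, and the halving schedule is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

Head = Tuple[int, int]  # (layer, head) identifier

def prune_circuit(
    heads: List[Head],
    cdt_score: Callable[[Head], float],
    circuit_faithfulness: Callable[[List[Head]], float],
    target: float = 1.0,
) -> List[Head]:
    """Greedily shrink a candidate circuit while it still replicates the model."""
    circuit = sorted(heads, key=cdt_score, reverse=True)
    while len(circuit) > 1:
        candidate = circuit[: max(1, len(circuit) // 2)]  # keep top-scoring half
        if circuit_faithfulness(candidate) < target:
            break  # pruning further would break the circuit's behavior
        circuit = candidate
    return circuit
```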
Keywords
» Artificial intelligence » Attention » AUC » Neural network » Pruning