Summary of Jet Expansions of Residual Computation, by Yihong Chen et al.
Jet Expansions of Residual Computation
by Yihong Chen, Xiangxiang Xu, Yao Lu, Pontus Stenetorp, Luca Franceschi
First submitted to arXiv on: 8 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Symbolic Computation (cs.SC)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper and are written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The proposed framework expands residual computational graphs using jets, operators that generalize truncated Taylor series. This gives a systematic way to disentangle the contributions of different computational paths to model predictions, unlike existing techniques such as distillation, probing, or early decoding. The framework relies solely on the model itself: it requires no data, no training, and no sampling from the model. The approach grounds and subsumes the logit lens, reveals a (super-)exponential path structure in the recursive residual depth (see the sketch after this table), and opens up several applications, including sketching a transformer large language model with n-gram statistics extracted from its computations and indexing the model's levels of toxicity knowledge. The result is data-free analysis of residual computation for model interpretability, development, and evaluation. |
Low | GrooveSquid.com (original content) | This paper presents a new way to understand how machine learning models make predictions. The authors build a system that analyzes how different parts of a model contribute to its predictions. It works without any extra data or training, using only the information already inside the model. The researchers show that their approach can help develop and evaluate models, as well as explain how the models make decisions. |
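
To make the path structure concrete, here is a minimal toy sketch (not from the paper, and covering only the purely linear case): a stack of L residual blocks with linear maps expands exactly into a sum over all 2^L computational paths, each path either skipping or passing through each block. The paper's jet operators generalize this decomposition to nonlinear blocks via truncated Taylor series; all names and sizes in the snippet are illustrative assumptions.

```python
import itertools
import numpy as np

# Toy illustration (not the paper's implementation): for a purely linear
# residual stack y = (I + W_L) ... (I + W_1) x, the output decomposes
# exactly into a sum over all 2^L computational paths -- each path either
# skips a block (identity branch) or passes through it (W_i branch).

rng = np.random.default_rng(0)
d, L = 4, 3                          # hidden size, number of residual blocks
Ws = [rng.normal(scale=0.1, size=(d, d)) for _ in range(L)]
x = rng.normal(size=d)

# Standard residual forward pass.
h = x
for W in Ws:
    h = h + W @ h
full_output = h

# Enumerate all 2^L paths: a path is the subset of blocks it passes through.
path_sum = np.zeros(d)
for subset in itertools.product([0, 1], repeat=L):
    v = x
    for keep, W in zip(subset, Ws):
        if keep:
            v = W @ v                # pass through this block
        # else: identity branch, v unchanged
    path_sum += v

print(np.allclose(full_output, path_sum))  # True: paths sum to the output
```

Running the snippet prints True, confirming that the 2^L path contributions sum exactly to the network output. With nonlinear blocks the expansion is no longer exact, which is what the paper's jet operators (generalizing truncated Taylor series) address.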
Keywords
» Artificial intelligence » Distillation » Large language model » Machine learning » N-gram » Transformer