Massive Activations in Large Language Models

by Mingjie Sun, Xinlei Chen, J. Zico Kolter, Zhuang Liu

First submitted to arXiv on: 27 Feb 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper investigates an intriguing phenomenon in Large Language Models (LLMs): a tiny number of activations take on extremely large values, sometimes more than 100,000 times larger than the typical activation. Dubbed “massive activations,” these values are shown to be widespread across LLMs of different families and sizes, and the study characterizes where they occur. Surprisingly, massive activations stay largely constant regardless of the input, and they function as essential bias terms inside the model. Their presence causes attention probabilities to concentrate on the corresponding tokens and gives rise to implicit bias terms in the self-attention output (a simple way to probe for such activations is sketched after these summaries). The research also examines similar phenomena in Vision Transformers. This work has implications for understanding the behavior and performance of these models.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This study looks at something surprising inside Large Language Models (like LLaMA or GPT). These models contain a few special “activations” that are far bigger than the rest, sometimes 100,000 times bigger! The researchers worked out where these huge activations show up and what they do. They discovered that these massive activations act like fixed, built-in values that help the model decide which words to pay attention to. This is important because understanding them can help us create even more powerful language models in the future.
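
To make the phenomenon concrete, here is a minimal sketch of how one might scan a model’s hidden states for such outliers. It assumes a Hugging Face Transformers causal LM; the model name ("gpt2") and the 1,000x outlier threshold are illustrative choices, not details taken from the paper.

    # Minimal sketch (not the paper's code): scan each layer's hidden states
    # for activation values that dwarf the typical magnitude in that layer.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # illustrative; the paper studies a range of LLM families
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer("Massive activations are rare but enormous.", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    # out.hidden_states: one (batch, seq_len, hidden_dim) tensor per layer
    for layer_idx, h in enumerate(out.hidden_states):
        abs_h = h.abs()
        max_val = abs_h.max().item()
        median_val = abs_h.median().item()
        if max_val > 1000 * median_val:  # illustrative outlier threshold
            hidden_dim = abs_h.shape[2]
            flat_idx = abs_h.argmax().item()  # batch size is 1 here
            token_idx = (flat_idx // hidden_dim) % abs_h.shape[1]
            dim_idx = flat_idx % hidden_dim
            print(f"layer {layer_idx}: max |activation| = {max_val:.1f} "
                  f"(median {median_val:.4f}) at token {token_idx}, dim {dim_idx}")

On a model that exhibits the phenomenon, a probe like this flags only a handful of (token, dimension) positions per layer; the threshold may need tuning per model, since the 100,000x figure above refers to the most extreme cases.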

Keywords

* Artificial intelligence
* Attention
* BERT
* Self-attention