Summary of Length-Induced Embedding Collapse in Transformer-based Models, by Yuqi Zhou et al.
Length-Induced Embedding Collapse in Transformer-based Models
by Yuqi Zhou, Sunhao Dai, Zhanshuo Cao, Xiao Zhang, Jun Xu
First submitted to arXiv on: 31 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper investigates the phenomenon of Length Collapse in text embeddings, where longer texts suffer degraded performance because their token signals collapse into a narrow space. Theoretically, it proves that longer sequences increase the attenuation rate of the low-pass filtering effect inherent in self-attention, causing excessive filtering and feature map collapse. To mitigate this limitation, the TempScale method is proposed, which introduces a temperature into softmax() to achieve higher low-filter attenuation rates (a minimal code sketch of this temperature scaling appears after the table). Empirically, TempScale improves existing embedding models by up to 0.53% on 40 datasets from MTEB and 0.82% on 4 datasets from LongEmbed. |
Low | GrooveSquid.com (original content) | The paper finds that text embeddings get worse for longer texts because they collapse into a small space. This happens because the self-attention mechanism acts like a low-pass filter, and on long sequences it over-smooths the token features until they all look alike and lose their information. To fix this, the TempScale method is introduced to control how much low-pass filtering happens, allowing embedding models to work better on longer texts. |
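To make the TempScale idea more concrete, here is a minimal sketch of temperature-scaled self-attention in PyTorch. The function name `temp_scaled_attention`, the example temperature values, and the direction of the adjustment are illustrative assumptions, not the authors' released implementation; the paper's actual method chooses the temperature to change how strongly the softmax attention matrix low-pass filters token features.

```python
import torch
import torch.nn.functional as F

def temp_scaled_attention(q, k, v, temperature=1.0):
    """Single-head scaled dot-product attention with an extra softmax temperature.

    temperature == 1.0 recovers standard attention. The TempScale idea is to pick
    a temperature that adjusts how strongly the softmax attention matrix acts as a
    low-pass filter over token features; the specific value and direction used
    here are assumptions for illustration only.
    """
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / (d ** 0.5)      # standard scaled dot-product logits
    weights = F.softmax(logits / temperature, dim=-1)  # temperature-scaled softmax
    return weights @ v

# Toy usage: compare output spread on a long sequence with and without scaling.
torch.manual_seed(0)
x = torch.randn(1, 1024, 64)  # (batch, seq_len, dim) -- a "long" input
baseline = temp_scaled_attention(x, x, x, temperature=1.0)
rescaled = temp_scaled_attention(x, x, x, temperature=0.5)  # sharper attention (assumed setting)
print(baseline.std().item(), rescaled.std().item())
```

With a temperature below 1 the softmax becomes sharper, so each output token averages over fewer inputs and the smoothing that drives collapse is weakened; the exact setting used by the authors should be taken from the paper itself.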
Keywords
» Artificial intelligence » Embedding » Feature map » Self attention » Softmax » Temperature » Token