Summary of Length-Induced Embedding Collapse in Transformer-based Models, by Yuqi Zhou et al.
Length-Induced Embedding Collapse in Transformer-based Models
by Yuqi Zhou, Sunhao Dai, Zhanshuo Cao, Xiao Zhang, Jun Xu
First submitted to arXiv on: 31 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper investigates the phenomenon of Length Collapse in text embeddings, where longer texts suffer degraded performance because their token signals collapse into a narrow space. Theoretically, it proves that longer sequences increase the attenuation rate of the low-pass filtering effect inherent in self-attention, causing excessive filtering and feature map collapse. To mitigate this limitation, the TempScale method is proposed, which introduces a temperature into softmax() to achieve higher low-filter attenuation rates (a minimal code sketch of this temperature scaling appears after the table). Empirically, TempScale improves existing embedding models by up to 0.53% on 40 datasets from MTEB and 0.82% on 4 datasets from LongEmbed. |
Low | GrooveSquid.com (original content) | The paper finds that text embeddings get worse for longer texts because they collapse into a small space. This happens because the self-attention mechanism acts like a low-pass filter, and on long sequences it over-smooths the token features until they all look alike and lose their information. To fix this, the TempScale method is introduced to control how much low-pass filtering happens, allowing embedding models to work better on longer texts. |
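To make the TempScale idea more concrete, here is a minimal sketch of temperature-scaled self-attention in PyTorch. The function name `temp_scaled_attention`, the example temperature values, and the direction of the adjustment are illustrative assumptions, not the authors' released implementation; the paper's actual method chooses the temperature to change how strongly the softmax attention matrix low-pass filters token features.

```python
import torch
import torch.nn.functional as F

def temp_scaled_attention(q, k, v, temperature=1.0):
    """Single-head scaled dot-product attention with an extra softmax temperature.

    temperature == 1.0 recovers standard attention. The TempScale idea is to pick
    a temperature that adjusts how strongly the softmax attention matrix acts as a
    low-pass filter over token features; the specific value and direction used
    here are assumptions for illustration only.
    """
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / (d ** 0.5)      # standard scaled dot-product logits
    weights = F.softmax(logits / temperature, dim=-1)  # temperature-scaled softmax
    return weights @ v

# Toy usage: compare output spread on a long sequence with and without scaling.
torch.manual_seed(0)
x = torch.randn(1, 1024, 64)  # (batch, seq_len, dim) -- a "long" input
baseline = temp_scaled_attention(x, x, x, temperature=1.0)
rescaled = temp_scaled_attention(x, x, x, temperature=0.5)  # sharper attention (assumed setting)
print(baseline.std().item(), rescaled.std().item())
```

With a temperature below 1 the softmax becomes sharper, so each output token averages over fewer inputs and the smoothing that drives collapse is weakened; the exact setting used by the authors should be taken from the paper itself.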
Keywords
» Artificial intelligence » Embedding » Feature map » Self attention » Softmax » Temperature » Token