Breaking Symmetry When Training Transformers
by Chunsheng Zuo, Michael Guerzhoy
First submitted to arXiv on: 6 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | The authors investigate how Transformer architectures model input sequences where order is important. They show that when neither positional encodings nor causal attention is used, the prediction of the next output token is invariant to permutations of the previous tokens (a toy demonstration of this appears after the table). Because Transformers can be trained without positional encodings, causal attention must be what breaks this symmetry: it encourages vertical “slices” of the Transformer to represent the same location in the input sequence. The authors hypothesize that residual connections contribute to this phenomenon and provide evidence for it.
Low | GrooveSquid.com (original content) | Transformers are powerful language models that process sequences of words. Researchers found that when Transformers skip two particular mechanisms, positional encodings and causal attention, their predictions become immune to the order in which earlier input tokens appear. This matters because many real-world tasks depend on word order. The authors think that something called residual connections helps make order-awareness possible, and they provide evidence for this in their experiments.
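
To make the invariance claim concrete, here is a minimal sketch, not code from the paper, assuming PyTorch; the model sizes, vocabulary size, and numerical tolerance are arbitrary illustrative choices. A tiny Transformer encoder with no positional encodings produces the same output at the final position when the earlier tokens are shuffled, while adding a causal mask (with two or more layers) breaks that symmetry:

```python
# A minimal sketch, not code from the paper; model sizes, vocabulary
# size, and tolerance are arbitrary illustrative choices.
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, n_heads, n_layers, seq_len, vocab = 32, 4, 2, 6, 100

# A tiny Transformer encoder with NO positional encodings.
layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=64,
                                   batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
encoder.eval()  # disable dropout so outputs are deterministic
emb = nn.Embedding(vocab, d_model)

tokens = torch.randint(0, vocab, (1, seq_len))

# Shuffle every token except the last one (whose output we inspect).
perm = torch.randperm(seq_len - 1)
shuffled = tokens.clone()
shuffled[0, :-1] = tokens[0, perm]

# Standard causal mask: position i may only attend to positions <= i.
causal = torch.triu(torch.full((seq_len, seq_len), float("-inf")),
                    diagonal=1)

with torch.no_grad():
    # No positional encodings and no causal mask: the final position's
    # output is invariant to permutations of the earlier tokens.
    a = encoder(emb(tokens))[0, -1]
    b = encoder(emb(shuffled))[0, -1]
    print(torch.allclose(a, b, atol=1e-5))   # expected: True

    # With a causal mask (and >= 2 layers), intermediate positions see
    # different prefixes, so the symmetry is broken.
    c = encoder(emb(tokens), mask=causal)[0, -1]
    d = encoder(emb(shuffled), mask=causal)[0, -1]
    print(torch.allclose(c, d, atol=1e-5))   # expected: False
```

Note that with a single layer the causal case would still look invariant at the last position, since that position only ever attends to the unordered multiset of all tokens; the symmetry breaking emerges once stacked layers propagate prefix-dependent intermediate representations upward.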
Keywords
- Artificial intelligence
- Attention
- Token
- Transformer