Summary of “Self-attention as an attractor network: transient memories without backpropagation”, by Francesco D’Amico et al.
Self-attention as an attractor network: transient memories without backpropagation
by Francesco D’Amico, Matteo Negri
First submitted to arXiv on: 24 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Disordered Systems and Neural Networks (cond-mat.dis-nn)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper presents a framework for interpreting self-attention in transformers as an attractor network, leveraging an analogy with pseudo-likelihood. The authors show that the self-attention update can be viewed as the derivative of local energy terms of a pseudo-likelihood-style objective. This framework enables the design of recurrent models that can be trained without backpropagation and that exhibit transient states correlated with both training and test examples. The work offers a physics-inspired theoretical approach to understanding transformers (see the sketch after this table). |
Low | GrooveSquid.com (original content) | Transformers are powerful tools in AI, but have you ever wondered how they work? A team of researchers has found a way to explain one part of transformers, called self-attention, using ideas from physics. They showed that self-attention can be seen as the derivative of certain energy terms, which connects it to a statistical concept called pseudo-likelihood. This understanding lets them design models that do not need backpropagation, and whose internal states pass through transient memories of the training and test examples. |
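To make the attractor-network reading of self-attention more concrete, here is a minimal numerical sketch. It assumes only the general correspondence between attention and Hopfield-style retrieval: stored patterns act as memories, and one update step recombines them with softmax (attention-like) weights, which is the gradient-based update on a log-sum-exp energy. The function names (`attractor_update`, `retrieve`), the energy choice, and the toy data are illustrative assumptions, not the authors’ exact construction; “training without backpropagation” appears here only in the sense that example patterns are stored directly as memories rather than fitted by gradient descent.

```python
import numpy as np

def softmax(scores, beta=1.0):
    z = beta * scores
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attractor_update(state, memories, beta=4.0):
    """One attention-style update: overlap the current state with every
    stored pattern, turn the overlaps into softmax weights, and return
    the weighted recombination of the patterns."""
    overlaps = memories @ state          # shape (num_patterns,)
    weights = softmax(overlaps, beta)    # attention weights over memories
    return memories.T @ weights          # new state, shape (dim,)

def retrieve(probe, memories, beta=4.0, n_steps=20):
    """Iterate the update from a probe; intermediate states are the
    'transient' part, and the iteration typically settles near one
    stored pattern (an attractor)."""
    state = probe.copy()
    for _ in range(n_steps):
        state = attractor_update(state, memories, beta)
    return state

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # "Training" here is just storing examples as rows of the memory matrix;
    # no backpropagation is involved.
    memories = rng.standard_normal((5, 16))
    memories /= np.linalg.norm(memories, axis=1, keepdims=True)
    probe = memories[2] + 0.5 * rng.standard_normal(16)   # noisy cue
    out = retrieve(probe, memories)
    print("overlap with each stored pattern:", np.round(memories @ out, 3))
```

With a large beta the softmax concentrates on a single memory and the dynamics behave like classical pattern retrieval; with a smaller beta the state lingers in mixtures of patterns, which is loosely where the transient behaviour described in the paper shows up.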
Keywords
» Artificial intelligence » Backpropagation » Likelihood » Self attention