
Summary of "Self-attention as an attractor network: transient memories without backpropagation", by Francesco D'Amico et al.


Self-attention as an attractor network: transient memories without backpropagation

by Francesco D’Amico, Matteo Negri

First submitted to arXiv on: 24 Sep 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Disordered Systems and Neural Networks (cond-mat.dis-nn)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high-difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
The paper presents a novel framework for interpreting self-attention in transformers as an attractor network, drawing an analogy with pseudo-likelihood: the self-attention update can be obtained as the derivative of local energy terms. This framework makes it possible to design recurrent models that can be trained without backpropagation and that exhibit transient states correlated with both training and test examples. The work offers a new, physics-inspired theoretical approach to understanding transformers. (A sketch of the energy-derivative identity follows the low-difficulty summary below.)

Low Difficulty Summary (written by GrooveSquid.com; original content)
Transformers are powerful tools in AI, but have you ever wondered how they work? A team of researchers has found a way to explain one part of transformers, called self-attention, using ideas from physics. They showed that self-attention can be seen as the derivative of certain energy terms, which is similar to another concept called pseudo-likelihood. This new understanding lets them design special models that don't need backpropagation, and these models show memory-like states that are transient: they hold on to patterns for a while rather than storing them permanently.

Keywords

  • Artificial intelligence
  • Backpropagation
  • Likelihood
  • Self-attention