Selective Attention Improves Transformer

by Yaniv Leviathan, Matan Kalman, Yossi Matias

First submitted to arXiv on: 3 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each of the summaries below covers the same AI paper and is written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper introduces Selective Attention, a modification to the standard attention mechanism that reduces how much attention is paid to elements that are no longer needed. This simple, parameter-free change improves language modeling performance across a range of model sizes and context lengths. For instance, transformers trained on C4 with selective attention perform comparably to standard transformers with more heads and parameters in their attention modules. In addition, selective attention makes it possible to shrink the attention module’s context buffer, yielding meaningful reductions in memory and compute requirements during inference. A rough illustrative sketch of the idea appears after these summaries.

Low Difficulty Summary (original content by GrooveSquid.com)
The paper makes a simple change to the standard attention mechanism that helps language models work better. The change is called Selective Attention, and it reduces how much the model focuses on things it doesn’t need. As a result, language modeling performance improves across a range of model sizes and context lengths. For example, transformers trained on C4 with this new attention method performed just as well as models that used more heads and parameters in their attention layers.

Keywords

» Artificial intelligence  » Attention  » Inference