
Summary of "Attention as an RNN", by Leo Feng et al.


Attention as an RNN

by Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Mohamed Osama Ahmed, Yoshua Bengio, Greg Mori

First submitted to arXiv on: 22 May 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
The advent of Transformers revolutionized sequence modeling, offering a high-performing architecture that leverages GPU parallelism. However, this comes at the cost of computational expense at inference time, limiting their use in low-resource settings such as mobile and embedded devices. To address this challenge, the researchers observe that attention can be viewed as a special Recurrent Neural Network (RNN) whose many-to-one output can be computed efficiently, so attention-based models such as Transformers can be seen as RNN variants. Unlike traditional RNNs, however, these models cannot be updated efficiently with new tokens, an essential property in sequence modeling. To overcome this limitation, the researchers introduce an efficient method for computing attention's many-to-many RNN output based on the parallel prefix scan algorithm. Building on this formulation, they propose Aaren, an attention-based module that can be trained in parallel like a Transformer while also being updated efficiently with new tokens, requiring only constant memory at inference time like a traditional RNN. Empirically, Aarens achieve performance comparable to Transformers on 38 datasets spanning four popular sequential problem settings (reinforcement learning, event forecasting, time series classification, and time series forecasting) while being more time- and memory-efficient.
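To make these two ideas concrete, here is a minimal NumPy sketch (our own illustration, not the authors' code; the function names, shapes, and single-query setup are assumptions). `attention_as_rnn` computes standard softmax attention for one query token by token in constant memory, which is the many-to-one RNN view of attention; `combine` is an associative merge of partial softmax summaries, the kind of operator a parallel prefix scan needs in order to produce the many-to-many output.

```python
import numpy as np

def attention_as_rnn(q, K, V):
    """Softmax attention for one query q over tokens (K, V), processed
    one token at a time with only a constant-size state: a running
    numerator a, denominator c, and max score m (for numerical stability)."""
    d = K.shape[-1]
    a = np.zeros(V.shape[-1])  # running sum of exp(s_i - m) * v_i
    c = 0.0                    # running sum of exp(s_i - m)
    m = -np.inf                # running max of the scores s_i
    for k_i, v_i in zip(K, V):
        s_i = (q @ k_i) / np.sqrt(d)  # attention score for token i
        m_new = max(m, s_i)
        a = a * np.exp(m - m_new) + v_i * np.exp(s_i - m_new)
        c = c * np.exp(m - m_new) + np.exp(s_i - m_new)
        m = m_new
    return a / c  # equals softmax(q K^T / sqrt(d)) @ V

def combine(x, y):
    """Associative merge of two partial (max, numerator, denominator)
    summaries; associativity is what lets a parallel prefix scan
    compute attention's output at every prefix length at once."""
    m_x, a_x, c_x = x
    m_y, a_y, c_y = y
    m = max(m_x, m_y)
    a = a_x * np.exp(m_x - m) + a_y * np.exp(m_y - m)
    c = c_x * np.exp(m_x - m) + c_y * np.exp(m_y - m)
    return m, a, c

# Sanity check against the usual parallel computation of attention:
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=4), rng.normal(size=(10, 4)), rng.normal(size=(10, 4))
s = q @ K.T / np.sqrt(4)
w = np.exp(s - s.max())
assert np.allclose(attention_as_rnn(q, K, V), (w @ V) / w.sum())
```

The token-by-token loop is what gives constant-memory updates with new tokens, while scanning `combine` over the per-token summaries `(s_i, v_i, 1.0)` yields all prefix outputs in O(log N) parallel steps, which is what allows an Aaren-style module to train in parallel like a Transformer yet update token by token like an RNN.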
Low Difficulty Summary (original content by GrooveSquid.com)
Transformers revolutionized sequence modeling, but they're too computationally expensive at inference time for low-resource devices. Researchers found a way to make attention-based models like Transformers more efficient by viewing attention as a special kind of recurrent neural network. This led to new models that can be trained in parallel and updated quickly, using less memory and power. The result is Aaren, a module that's both fast and accurate. It works just as well as Transformers on many tasks, but it's much more efficient.

Keywords

» Artificial intelligence  » Attention  » Classification  » Inference  » Neural network  » Reinforcement learning  » Rnn  » Time series