Abrupt Learning in Transformers: A Case Study on Matrix Completion

by Pulkit Gopalani, Ekdeep Singh Lubana, Wei Hu

First submitted to arXiv on: 29 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Machine Learning (stat.ML)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper's original abstract, written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper investigates the training dynamics of Transformers, which often exhibit a characteristic plateau in training loss followed by a sharp drop to near-optimal values. To study this phenomenon, the authors formulate low-rank matrix completion as a masked language modeling (MLM) task and train a BERT model to solve it. They find that the model’s predictions, attention heads, and hidden states change markedly around the loss drop: predictions shift from simply copying the input to accurately filling in the missing entries, attention patterns become interpretable, and hidden states begin to encode information relevant to the problem.
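
To make this setup concrete, here is a minimal sketch (not the authors' code) of how low-rank matrix completion can be posed as a masked-prediction task for a small Transformer encoder: random low-rank matrices are flattened into sequences of entries, a fraction of entries are hidden, and the model is trained to regress the hidden values, analogous to masked language modeling. The matrix size, rank, masking rate, architecture, and helper names such as make_low_rank_matrix are illustrative assumptions; the paper's actual tokenization and BERT configuration may differ.

```python
import torch
import torch.nn as nn


def make_low_rank_matrix(n=8, rank=2):
    # Random n x n matrix of the given rank via an outer-product factorization.
    u = torch.randn(n, rank)
    v = torch.randn(rank, n)
    return u @ v


class MaskedMatrixCompleter(nn.Module):
    # Tiny BERT-like encoder: reads a flattened matrix as a sequence of scalar
    # "tokens" and predicts a value at every position (loss is taken on masked ones).
    def __init__(self, seq_len, d_model=64, nhead=4, nlayers=2):
        super().__init__()
        self.embed = nn.Linear(2, d_model)              # per entry: (value, mask flag)
        self.pos = nn.Parameter(0.02 * torch.randn(seq_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, 1)

    def forward(self, values, mask):
        # values: (B, L) matrix entries; mask: (B, L), 1.0 where the entry is hidden.
        x = torch.stack([values * (1 - mask), mask], dim=-1)   # hide masked values
        h = self.encoder(self.embed(x) + self.pos)
        return self.head(h).squeeze(-1)


def training_step(model, opt, batch_size=32, n=8, rank=2, mask_frac=0.3):
    M = torch.stack([make_low_rank_matrix(n, rank) for _ in range(batch_size)])
    values = M.view(batch_size, -1)
    mask = (torch.rand_like(values) < mask_frac).float()
    pred = model(values, mask)
    # MLM-style objective: mean squared error on the masked entries only.
    loss = ((pred - values) ** 2 * mask).sum() / mask.sum().clamp(min=1)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


model = MaskedMatrixCompleter(seq_len=8 * 8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(2000):
    loss = training_step(model, opt)
    if step % 200 == 0:
        # The logged loss curve is where a plateau followed by a sudden drop,
        # as described above, would show up over long enough training.
        print(step, loss)
```

In the paper's setting, tracking this training loss over many steps is what reveals the plateau followed by the sudden drop; the model's attention maps and hidden states can then be inspected before and after the drop.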
Low Difficulty Summary (original content by GrooveSquid.com)
The paper looks at how Transformers learn during training and finds an interesting pattern: the model stops improving for a while, then suddenly gets much better without any change in how it is trained. To understand this, the authors gave the model a special task called low-rank matrix completion. They found that the model starts off just copying what it sees, but then learns to predict the missing values correctly. This change also shows up in where the model focuses its attention and in what it stores internally.

Keywords

» Artificial intelligence  » Attention  » BERT