
Transformers for Supervised Online Continual Learning

by Jorg Bornschein, Yazhe Li, and Amal Rannen-Triki

First submitted to arXiv on: 3 Mar 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Transformers have become the dominant architecture for sequence modeling tasks such as natural language and audio processing, and they are now even considered for tasks that are not naturally sequential, such as image classification. Their ability to attend to and process a set of tokens as context enables them to develop in-context few-shot learning abilities. To explore their potential for online continual learning, we propose a method that leverages the strengths of transformers for online adaptation. Our approach explicitly conditions a transformer on recent observations while simultaneously training it online with stochastic gradient descent, following Transformer-XL's procedure. We incorporate replay to retain the benefits of multi-epoch training while adhering to the sequential protocol. Our method demonstrates significant improvements over previous state-of-the-art results on CLOC, a challenging large-scale real-world benchmark for image geo-localization.
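
To make the described training loop concrete, here is a minimal sketch of how predict-then-update training with in-context conditioning and replay could look. This is an illustration under assumptions, not the paper's implementation: the ContextTransformer model, the sgd_step and online_loop helpers, and parameters such as context_len, replay_prob, and buffer_size are all hypothetical, and the paper uses a Transformer-XL-style architecture rather than this toy encoder.

```python
# Hypothetical sketch of online continual learning with a transformer.
# All names and hyperparameters below are illustrative, not from the paper.
import random
from collections import deque

import torch
import torch.nn as nn


class ContextTransformer(nn.Module):
    """Toy model that predicts a label for the last example in a window,
    conditioned on the earlier (input, label) pairs in that window."""

    def __init__(self, input_dim: int, num_classes: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Linear(input_dim + num_classes, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, xs, ys_onehot):
        # xs: (1, T, input_dim); ys_onehot: (1, T, num_classes), with the
        # label of the final (query) position zeroed out.
        h = self.encoder(self.embed(torch.cat([xs, ys_onehot], dim=-1)))
        return self.head(h[:, -1])  # logits for the query example


def sgd_step(model, opt, loss_fn, pairs, num_classes):
    """One gradient step: predict the last label given the earlier pairs."""
    xs = torch.stack([p[0] for p in pairs]).unsqueeze(0)
    ys = torch.zeros(1, len(pairs), num_classes)
    for t, (_, label) in enumerate(pairs[:-1]):
        ys[0, t, label] = 1.0  # context labels are visible; the query's is not
    loss = loss_fn(model(xs, ys), torch.tensor([pairs[-1][1]]))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


def online_loop(model, stream, num_classes, context_len=32,
                replay_prob=0.5, buffer_size=1024):
    """Predict-then-update over a sequential stream, with replay."""
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    context = deque(maxlen=context_len)  # recent observations used as context
    replay = deque(maxlen=buffer_size)   # older examples for replay updates

    for x, y in stream:  # x: Tensor of shape (input_dim,), y: int label
        # Condition on recent observations and update on the new example.
        sgd_step(model, opt, loss_fn, list(context) + [(x, y)], num_classes)
        context.append((x, y))
        replay.append((x, y))
        # Occasional extra step on a stored window: replay recovers some of
        # the benefit of multi-epoch training under a sequential protocol.
        if len(replay) > context_len and random.random() < replay_prob:
            start = random.randrange(len(replay) - context_len)
            window = list(replay)[start:start + context_len + 1]
            sgd_step(model, opt, loss_fn, window, num_classes)
```

The design choice mirrored here is that each incoming example is first treated as a query conditioned on recent labeled observations, then consumed by a gradient update; the in-context conditioning lets the model exploit local structure in the stream, while the SGD steps and replay adjust the weights themselves.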
Low Difficulty Summary (written by GrooveSquid.com, original content)
Transformers are super smart models that can learn from small amounts of data. They're great at picking up new things quickly, but they haven't been very good at adapting to information that changes over time. In this paper, we try to fix that by creating a new way for transformers to learn from new information as it arrives. We do this by having the model focus on recent observations and adjust its understanding based on them. This helps the model adapt quickly to changes in the data. We tested our approach on a real-world problem called image geo-localization, where the goal is to identify where an image was taken. Our method did much better than previous approaches, which is really exciting!

Keywords

* Artificial intelligence  * Continual learning  * Few-shot  * Image classification  * Natural language processing  * Stochastic gradient descent  * Transformer