Summary of What Will My Model Forget? Forecasting Forgotten Examples in Language Model Refinement, by Xisen Jin et al.
What Will My Model Forget? Forecasting Forgotten Examples in Language Model Refinement
by Xisen Jin, Xiang Ren
First submitted to arXiv on: 2 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper but are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This research paper addresses catastrophic forgetting in language models deployed in the wild: updating a model with corrected error instances causes it to make new errors on previously learned instances. To mitigate this, the authors develop forecasting models that predict which upstream pre-training examples will be forgotten by a given model update, training them on online learned examples paired with the upstream examples those updates caused to be forgotten. They propose two forecasters: a partially interpretable model based on pre-softmax logit scores and a black-box classifier based on inner products of example representations (see the sketch after this table). Experiments show that the black-box classifier outperforms the partially interpretable model across setups, including BART and T5 models. By replaying the examples forecast to be forgotten, the authors demonstrate the practical utility of their approach in reducing forgetting of upstream pre-training examples. |
| Low | GrooveSquid.com (original content) | This paper tackles a big problem with language models: when you update a model to fix its mistakes, it often forgets what it learned before! To solve this, scientists developed special forecasting models that predict which old examples will be forgotten because of the update. They tested these models on two types of language models, BART and T5, and found that one approach worked better than the other. By using these forecasts to replay old examples, they showed that it’s possible to reduce forgetting and make language models more reliable. |
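To make the black-box forecaster concrete, here is a minimal sketch of a forgetting classifier built on inner products of example representations. This is an illustration, not the authors' implementation: the `encode` helper, the logistic head, and the threshold are all assumptions, and a real version would use representations taken from the language model being updated (e.g., BART or T5 hidden states).

```python
# Minimal sketch of a forgetting forecaster based on inner products of
# example representations. All names here (encode, forecast_forgotten,
# the logistic weights) are illustrative assumptions, not the paper's code.
import numpy as np

def encode(example_id: int, dim: int = 16) -> np.ndarray:
    """Stand-in for a model-derived representation of an example.

    In practice this would come from the language model being updated;
    here we deterministically fake a vector per example id."""
    example_rng = np.random.default_rng(example_id)
    return example_rng.normal(size=dim)

def forecast_forgotten(online_example: int,
                       upstream_examples: list[int],
                       weight: float = 1.0,
                       bias: float = 0.0,
                       threshold: float = 0.5) -> list[int]:
    """Score each upstream example by the inner product of its
    representation with the online (error-correcting) example, then
    pass the score through a logistic head. Examples whose predicted
    forgetting probability exceeds the threshold are forecast to be
    forgotten and become replay candidates."""
    z = encode(online_example)
    forgotten = []
    for ex in upstream_examples:
        score = weight * float(z @ encode(ex)) + bias
        prob = 1.0 / (1.0 + np.exp(-score))  # logistic link
        if prob > threshold:
            forgotten.append(ex)
    return forgotten

# Replay sketch: before applying a model update for an online error,
# collect the upstream examples forecast to be forgotten and mix them
# into the update batch to reduce forgetting.
replay_set = forecast_forgotten(online_example=42,
                                upstream_examples=list(range(100)))
print(f"{len(replay_set)} upstream examples forecast to be forgotten")
```

In the paper's setup, the weights of such a classifier would be trained on pairs of online learned examples and the upstream pre-training examples their updates actually caused to be forgotten; the toy constants above simply stand in for that learned head.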
Keywords
* Artificial intelligence
* Softmax
* T5