Summary of Unforgettable Generalization in Language Models, by Eric Zhang et al.
Unforgettable Generalization in Language Models
by Eric Zhang, Leshem Choshen, Jacob Andreas
First submitted to arXiv on: 3 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This study investigates how language models (LMs) change their behavior when they are trained to “unlearn” a skill, specifically through fine-tuning on randomized labels. It examines transformer LMs in which tasks have been forgotten via this process and finds that predictions on examples outside the forgetting set vary dramatically across tasks. In some tasks, such as entailment classification, forgetting generalizes robustly and causes uninformative predictions on new task instances. In other tasks, like physical commonsense reasoning and scientific question answering, forgetting affects only the training examples, and models continue to perform accurately even on very similar instances. Dataset difficulty does not predict whether a behavior can be forgotten; instead, generalization of forgetting is weakly predicted by the model’s initial task confidence and the variability of its representations of the training data. Surprisingly, random-label forgetting is somewhat insensitive to the training set’s contents: models fine-tuned on science questions with random labels still answer other science questions accurately, yet begin to produce random labels on entailment classification. Finally, even generalizable forgetting is shallow: linear probes trained on the model’s representations can still perform the task reliably after forgetting. (A minimal code sketch of the random-label fine-tuning and linear-probe procedures appears after this table.) |
Low | GrooveSquid.com (original content) | Language models are like super smart computers that can learn and do lots of things! But sometimes we want them to “forget” something they know so they don’t use it anymore. This study looked at how language models change when they’re trained to forget a skill, specifically through fine-tuning on random labels. It turns out the models behave differently depending on which task they were told to forget. Sometimes they just make random guesses, but other times they still answer correctly even though they were supposed to forget! It’s like their brains are saying “oh yeah, I remember this!” or “hmm, I’m not sure about that one!” |
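The medium summary mentions two concrete procedures: fine-tuning on randomized labels to induce forgetting, and training a linear probe on the forgotten model’s representations to test how shallow the forgetting is. The sketch below illustrates both, assuming a Hugging Face Transformers setup; the model name, placeholder texts and labels, and hyperparameters are illustrative assumptions, not details from the paper, and this is not the authors’ code.

```python
# Minimal sketch (not the authors' code): random-label fine-tuning to
# "forget" a classification task, then a linear-probe check on the frozen
# model's hidden states. Model, data, and hyperparameters are placeholders.
import torch
from torch import nn
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased"  # assumption: any classifier backbone
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["A premise and hypothesis pair ...", "Another example ..."]  # placeholder data
true_labels = torch.tensor([0, 1])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# Step 1: "forget" by fine-tuning on labels drawn uniformly at random,
# independently of the inputs.
random_labels = torch.randint(0, 2, (len(texts),))
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the forgetting set
    out = model(**batch, labels=random_labels)
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Step 2: test whether the task is still linearly decodable from the
# frozen, "forgotten" model's representations.
model.eval()
with torch.no_grad():
    hidden = model(**batch, output_hidden_states=True).hidden_states[-1][:, 0]  # [CLS] vectors

probe = nn.Linear(hidden.size(-1), 2)
probe_opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for _ in range(100):  # fit the linear probe against the true labels
    logits = probe(hidden)
    loss = loss_fn(logits, true_labels)
    loss.backward()
    probe_opt.step()
    probe_opt.zero_grad()
```

In the paper’s framing, high probe accuracy after forgetting is the sign that forgetting is shallow; here the probe is fit on the same toy examples purely to show the mechanics of the setup.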
Keywords
» Artificial intelligence » Classification » Fine tuning » Generalization » Question answering » Transformer