
Provable unlearning in topic modeling and downstream tasks

by Stanley Wei, Sadhika Malladi, Sanjeev Arora, Amartya Sanyal

First submitted to arXiv on: 19 Nov 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
This paper contributes theoretical tools for verifying the success of machine unlearning algorithms. The authors give provable unlearning guarantees in both the pre-training and fine-tuning settings, using topic models: simple bag-of-words language models that can be adapted to downstream tasks such as retrieval and classification. They design a provably effective unlearning algorithm whose computational overhead is independent of the size of the original dataset, and they quantify the model's deletion capacity, i.e., the number of examples that can be unlearned without a significant cost to performance. They then formally extend the analysis to account for adaptation to a given downstream task, and design a new efficient algorithm for unlearning after fine-tuning a topic model via a linear head. Notably, they show that pre-training data can be easier to unlearn from a model fine-tuned to a specific task, without modifying the base model.
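To make the idea of "unlearning a linear head with dataset-size-independent overhead" concrete, here is a minimal illustrative sketch. It is NOT the paper's algorithm: it shows the classical trick of exact unlearning for a ridge-regression linear head via a Sherman-Morrison rank-one downdate, where deleting one example costs O(d^2) regardless of the number of training examples n. All names (`LinearHead`, `unlearn`) are hypothetical.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): exact unlearning for a
# ridge-regression "linear head". Only the sufficient statistics
# A^{-1} = (X^T X + lam*I)^{-1} and b = X^T y are stored, so deleting one
# example is an O(d^2) rank-one update, independent of the dataset size n.
class LinearHead:
    def __init__(self, X, y, lam=1e-2):
        d = X.shape[1]
        self.A_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
        self.b = X.T @ y

    @property
    def w(self):
        # Ridge solution w = (X^T X + lam*I)^{-1} X^T y.
        return self.A_inv @ self.b

    def unlearn(self, x, y):
        """Exactly remove one training example (x, y) without revisiting the data."""
        # Sherman-Morrison downdate: (A - x x^T)^{-1}
        #   = A^{-1} + (A^{-1} x)(A^{-1} x)^T / (1 - x^T A^{-1} x).
        Ax = self.A_inv @ x
        self.A_inv += np.outer(Ax, Ax) / (1.0 - x @ Ax)
        self.b -= y * x
```

After `unlearn`, the head's weights match those of a model retrained from scratch on the remaining data, which is the "exact unlearning" criterion (up to floating-point error):

```python
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 4)), rng.normal(size=50)
head = LinearHead(X, y)
head.unlearn(X[0], y[0])
retrained = LinearHead(X[1:], y[1:])
assert np.allclose(head.w, retrained.w)
```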
Low Difficulty Summary (written by GrooveSquid.com; original content)
This paper helps make sure that machine learning models don't keep using sensitive training data they are no longer allowed to use. The authors created a new way to remove old information from language models so they only keep what's necessary. They tested this on topic models, which can be used for tasks like searching and classifying text. The removal step's cost does not depend on how big the original dataset was. The authors also figured out how much data can be removed without harming the model's performance. Importantly, they found that data can sometimes be removed from a model that has been adapted to a new task without changing the underlying base model at all.

Keywords

» Artificial intelligence  » Bag of words  » Classification  » Fine tuning  » Machine learning