Summary of Mitigating Social Biases in Language Models Through Unlearning, by Omkar Dige et al.
Mitigating Social Biases in Language Models through Unlearning
by Omkar Dige, Diljot Singh, Tsz Fung Yau, Qixuan Zhang, Borna Bolandraftar, Xiaodan Zhu, Faiza Khan Khattak
First submitted to arXiv on: 19 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract, available on arXiv. |
Medium | GrooveSquid.com (original content) | Mitigating bias in language models (LMs) is crucial given their widespread deployment. Existing approaches rely on data pre-processing and fine-tuning, which can be time-consuming and computationally demanding. Machine unlearning techniques have gained attention as a way to induce the forgetting of undesired behaviors in existing models at lower computational cost. This work explores two unlearning methods: Partitioned Contrastive Gradient Unlearning (PCGU), applied to decoder models, and Negation via Task Vector. Both are used to reduce social biases in state-of-the-art LMs such as LLaMA-2 and OPT, and PCGU is also implemented in a distributed fashion for large models. The results show that Negation via Task Vector outperforms PCGU at debiasing, with minimal deterioration in model performance and perplexity; on LLaMA-2 7B it reduces bias by 11.8%. These findings demonstrate the potential of machine unlearning techniques for mitigating biases in language models. (A sketch of the task-vector negation idea appears after this table.) |
Low | GrooveSquid.com (original content) | This research paper is about making language models fairer. Many language models have biases built into them, which can be harmful. To fix this, researchers are trying new ways to make models “unlearn” biased behaviors. They experiment with two methods: one that selectively updates the parts of the model most responsible for bias, and another that “subtracts” the biased behavior from the model’s weights. They tested both on different language models and found that one method worked better than the other at reducing bias without hurting the model’s ability to understand text. This discovery could help make language models fairer and more accurate. |
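As a rough illustration of the Negation via Task Vector idea described in the medium summary, the sketch below subtracts a “bias” task vector (the parameter difference between a bias-fine-tuned model and the base model) from the base model’s weights. The model name, checkpoint path, and scaling factor `lam` are illustrative assumptions, not values taken from the paper.

```python
import torch
from transformers import AutoModelForCausalLM

# Base model and a copy fine-tuned on bias-eliciting data (path is hypothetical).
base = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
biased = AutoModelForCausalLM.from_pretrained("path/to/bias-finetuned-opt-1.3b")

lam = 1.0  # scaling factor for the negated task vector (assumed hyperparameter)

with torch.no_grad():
    for p_base, p_biased in zip(base.parameters(), biased.parameters()):
        task_vector = p_biased - p_base   # direction encoding the unwanted (biased) behavior
        p_base -= lam * task_vector       # negate it: theta_debiased = theta_base - lam * tau

base.save_pretrained("opt-1.3b-debiased")
```

The same loop works for any pair of checkpoints that share an architecture (e.g., the LLaMA-2 and OPT models mentioned above); PCGU, by contrast, is a gradient-based method and is not captured by this snippet.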
Keywords
» Artificial intelligence » Attention » Decoder » Fine tuning » Llama » Perplexity