Summary of Mitigating Social Biases in Language Models Through Unlearning, by Omkar Dige et al.
Mitigating Social Biases in Language Models through Unlearning
by Omkar Dige, Diljot Singh, Tsz Fung Yau, Qixuan Zhang, Borna Bolandraftar, Xiaodan Zhu, Faiza Khan Khattak
First submitted to arXiv on: 19 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract, available on arXiv. |
Medium | GrooveSquid.com (original content) | Mitigating bias in language models (LMs) is crucial given their widespread deployment. Existing approaches rely on data pre-processing and fine-tuning, which can be time-consuming and computationally demanding. Machine unlearning techniques have gained attention as a way to induce the forgetting of undesired behaviors in existing models at lower computational cost. This work explores two unlearning methods: Partitioned Contrastive Gradient Unlearning (PCGU), applied to decoder models, and Negation via Task Vector. Both are used to reduce social biases in state-of-the-art LMs such as LLaMA-2 and OPT, and PCGU is also implemented in a distributed fashion for large models. The results show that Negation via Task Vector outperforms PCGU at debiasing, with minimal deterioration in model performance and perplexity; on LLaMA-2 7B it reduces bias by 11.8%. These findings demonstrate the potential of machine unlearning techniques for mitigating biases in language models. (A sketch of the task-vector negation idea appears after this table.) |
Low | GrooveSquid.com (original content) | This research paper is about making language models fairer. Many language models have biases built into them, which can be harmful. To fix this, researchers are trying new ways to make models “unlearn” biased behaviors. They experiment with two methods: one that selectively updates the parts of the model most responsible for bias, and another that “subtracts” the biased behavior from the model’s weights. They tested both on different language models and found that one method worked better than the other at reducing bias without hurting the model’s ability to understand text. This discovery could help make language models fairer and more accurate. |
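As a rough illustration of the Negation via Task Vector idea described in the medium summary, the sketch below subtracts a “bias” task vector (the parameter difference between a bias-fine-tuned model and the base model) from the base model’s weights. The model name, checkpoint path, and scaling factor `lam` are illustrative assumptions, not values taken from the paper.

```python
import torch
from transformers import AutoModelForCausalLM

# Base model and a copy fine-tuned on bias-eliciting data (path is hypothetical).
base = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
biased = AutoModelForCausalLM.from_pretrained("path/to/bias-finetuned-opt-1.3b")

lam = 1.0  # scaling factor for the negated task vector (assumed hyperparameter)

with torch.no_grad():
    for p_base, p_biased in zip(base.parameters(), biased.parameters()):
        task_vector = p_biased - p_base   # direction encoding the unwanted (biased) behavior
        p_base -= lam * task_vector       # negate it: theta_debiased = theta_base - lam * tau

base.save_pretrained("opt-1.3b-debiased")
```

The same loop works for any pair of checkpoints that share an architecture (e.g., the LLaMA-2 and OPT models mentioned above); PCGU, by contrast, is a gradient-based method and is not captured by this snippet.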
Keywords
» Artificial intelligence » Attention » Decoder » Fine tuning » Llama » Perplexity