


Mitigating Social Biases in Language Models through Unlearning

by Omkar Dige, Diljot Singh, Tsz Fung Yau, Qixuan Zhang, Borna Bolandraftar, Xiaodan Zhu, Faiza Khan Khattak

First submitted to arXiv on: 19 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
Mitigating bias in language models (LMs) is crucial because of their widespread deployment. Existing approaches, such as data pre-processing and fine-tuning, can be time-consuming and computationally demanding. Machine unlearning techniques have therefore gained attention as a way to make an existing model forget undesired behaviors at a lower computational cost. This work explores two unlearning methods: Partitioned Contrastive Gradient Unlearning (PCGU) applied to decoder models, and Negation via Task Vector (sketched in code after the summaries). Both are used to reduce social biases in state-of-the-art LMs such as LLaMA-2 and OPT, and PCGU is also implemented in a distributed setting for large models. The results show that Negation via Task Vector outperforms PCGU at debiasing, with minimal deterioration in model performance and perplexity. On LLaMA-2 7B, it reduces bias by 11.8%. These findings demonstrate the potential of machine unlearning techniques for mitigating biases in language models.

Low Difficulty Summary (original content by GrooveSquid.com)
This research paper is about making language models fairer. Right now, many language models have biases built into them, which can be harmful. To fix this problem, scientists are trying new ways to “unlearn” the biased behaviors. They experiment with two methods: one that updates only the parts of the model’s weights most responsible for bias, and another that subtracts the “biased” behavior from the model’s weights. They tested these methods on different language models and found that one method worked better than the other at reducing biases without the models losing their ability to understand text. This discovery could help make language models fairer and more accurate.

Keywords

» Artificial intelligence  » Attention  » Decoder  » Fine tuning  » Llama  » Perplexity