
Summary of On Effects of Steering Latent Representation for Large Language Model Unlearning, by Dang Huu-Tien et al.


On Effects of Steering Latent Representation for Large Language Model Unlearning

by Dang Huu-Tien, Trung-Tin Pham, Hoang Thanh-Tung, Naoya Inoue

First submitted to arXiv on: 12 Aug 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper studies Representation Misdirection for Unlearning (RMU), a technique that unlearns knowledge from large language models (LLMs) by steering their intermediate-layer representations of forget samples towards a random target direction. The authors theoretically show that this steering reduces token confidence, causing the model to generate incorrect or nonsensical responses on the forgotten content. They also investigate how the steering coefficient influences the alignment of forget-sample representations with the random direction and suggest coefficient values for effective unlearning across different network layers. Notably, RMU-unlearned models are shown to be robust against adversarial jailbreak attacks. However, the authors find that RMU is less effective when applied to the middle and later layers of LLMs; to address this, they propose Adaptive RMU, a simple yet effective alternative that improves unlearning performance while incurring no additional computational cost. (A code sketch of the steering objective follows these summaries.)

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper looks at how to make large language models “forget” some of what they have learned. It studies a technique called Representation Misdirection for Unlearning (RMU), which nudges the model’s internal representations so that it becomes unsure and gives nonsensical answers about the things it is supposed to forget, while still behaving normally on everything else. The authors also find that models unlearned this way are hard to trick into revealing the forgotten knowledge, even with so-called jailbreak attacks. However, the method does not work as well when applied to the middle and later parts of the model. To fix this, they propose an improved version called Adaptive RMU, which unlearns more effectively without extra computational cost.
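
To make the steering mechanism described above concrete, below is a minimal PyTorch sketch of an RMU-style objective: forget-sample activations at a chosen intermediate layer are pushed towards a scaled random unit vector, while retain-sample activations are anchored to those of a frozen reference model; the adaptive variant replaces the fixed coefficient with one scaled by the frozen model's forget-activation norm. The tensor names, hyperparameter values (c, alpha, beta), and helper functions are illustrative assumptions, not the authors' implementation.

```python
# Minimal PyTorch sketch of an RMU-style steering loss (illustrative assumptions,
# not the authors' released code). Random tensors stand in for real LLM hidden states.
import torch
import torch.nn.functional as F


def rmu_loss(h_forget, h_retain, h_retain_frozen, u, c=6.5, alpha=100.0):
    """RMU-style loss on intermediate-layer activations.

    h_forget:        updated model's activations on forget samples,  [batch, seq, d]
    h_retain:        updated model's activations on retain samples,  [batch, seq, d]
    h_retain_frozen: frozen model's activations on retain samples,   [batch, seq, d]
    u:               fixed random unit vector of dimension d
    c:               steering coefficient (example value)
    alpha:           retain-loss weight (example value)
    """
    # Forget term: push forget-sample activations towards the scaled random direction c * u.
    forget_loss = F.mse_loss(h_forget, (c * u).expand_as(h_forget))
    # Retain term: keep retain-sample activations close to the frozen reference model's.
    retain_loss = F.mse_loss(h_retain, h_retain_frozen)
    return forget_loss + alpha * retain_loss


def adaptive_coefficient(h_forget_frozen, beta=5.0):
    """Adaptive-RMU-style coefficient: scale the random target by the norm of the
    frozen model's forget-sample activations (beta is an example hyperparameter)."""
    return beta * h_forget_frozen.norm(dim=-1, keepdim=True)


# Toy usage with random tensors in place of real hidden states.
d = 64
u = torch.randn(d)
u = u / u.norm()

h_forget = torch.randn(2, 8, d, requires_grad=True)
h_retain = torch.randn(2, 8, d, requires_grad=True)
h_retain_frozen = torch.randn(2, 8, d)
h_forget_frozen = torch.randn(2, 8, d)

loss = rmu_loss(h_forget, h_retain, h_retain_frozen, u)
loss.backward()

# Adaptive variant: replace the fixed coefficient c with a per-token scale.
c_adaptive = adaptive_coefficient(h_forget_frozen)           # [2, 8, 1]
adaptive_forget_loss = F.mse_loss(h_forget, c_adaptive * u)  # target broadcasts to [2, 8, d]
```

In practice the activations would be taken from a chosen intermediate layer of the model during training; which parameters are updated, batching, and other training details are omitted from this sketch.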

Keywords

  • Artificial intelligence
  • Alignment
  • Token