
Summary of On Effects of Steering Latent Representation for Large Language Model Unlearning, by Dang Huu-Tien et al.


On Effects of Steering Latent Representation for Large Language Model Unlearning

by Dang Huu-Tien, Trung-Tin Pham, Hoang Thanh-Tung, Naoya Inoue

First submitted to arXiv on: 12 Aug 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper studies Representation Misdirection for Unlearning (RMU), a technique that unlearns knowledge from large language models (LLMs) by steering their intermediate-layer representations of forget samples towards a random target direction. The authors theoretically show that this steering reduces token confidence, causing the model to generate incorrect or nonsensical responses on the forgotten content. They also investigate how the steering coefficient influences the alignment of forget-sample representations with the random direction and suggest coefficient values for effective unlearning across different network layers. Notably, RMU-unlearned models are shown to be robust against adversarial jailbreak attacks. However, the authors find that RMU is less effective when applied to the middle and later layers of LLMs; to address this, they propose Adaptive RMU, a simple yet effective alternative that improves unlearning performance while incurring no additional computational cost. (A code sketch of the steering objective follows these summaries.)

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper looks at how to make large language models “forget” some of what they have learned. It studies a technique called Representation Misdirection for Unlearning (RMU), which nudges the model’s internal representations so that it becomes unsure and gives nonsensical answers about the things it is supposed to forget, while still behaving normally on everything else. The authors also find that models unlearned this way are hard to trick into revealing the forgotten knowledge, even with so-called jailbreak attacks. However, the method does not work as well when applied to the middle and later parts of the model. To fix this, they propose an improved version called Adaptive RMU, which unlearns more effectively without extra computational cost.
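
To make the steering mechanism described above concrete, below is a minimal PyTorch sketch of an RMU-style objective: forget-sample activations at a chosen intermediate layer are pushed towards a scaled random unit vector, while retain-sample activations are anchored to those of a frozen reference model; the adaptive variant replaces the fixed coefficient with one scaled by the frozen model's forget-activation norm. The tensor names, hyperparameter values (c, alpha, beta), and helper functions are illustrative assumptions, not the authors' implementation.

```python
# Minimal PyTorch sketch of an RMU-style steering loss (illustrative assumptions,
# not the authors' released code). Random tensors stand in for real LLM hidden states.
import torch
import torch.nn.functional as F


def rmu_loss(h_forget, h_retain, h_retain_frozen, u, c=6.5, alpha=100.0):
    """RMU-style loss on intermediate-layer activations.

    h_forget:        updated model's activations on forget samples,  [batch, seq, d]
    h_retain:        updated model's activations on retain samples,  [batch, seq, d]
    h_retain_frozen: frozen model's activations on retain samples,   [batch, seq, d]
    u:               fixed random unit vector of dimension d
    c:               steering coefficient (example value)
    alpha:           retain-loss weight (example value)
    """
    # Forget term: push forget-sample activations towards the scaled random direction c * u.
    forget_loss = F.mse_loss(h_forget, (c * u).expand_as(h_forget))
    # Retain term: keep retain-sample activations close to the frozen reference model's.
    retain_loss = F.mse_loss(h_retain, h_retain_frozen)
    return forget_loss + alpha * retain_loss


def adaptive_coefficient(h_forget_frozen, beta=5.0):
    """Adaptive-RMU-style coefficient: scale the random target by the norm of the
    frozen model's forget-sample activations (beta is an example hyperparameter)."""
    return beta * h_forget_frozen.norm(dim=-1, keepdim=True)


# Toy usage with random tensors in place of real hidden states.
d = 64
u = torch.randn(d)
u = u / u.norm()

h_forget = torch.randn(2, 8, d, requires_grad=True)
h_retain = torch.randn(2, 8, d, requires_grad=True)
h_retain_frozen = torch.randn(2, 8, d)
h_forget_frozen = torch.randn(2, 8, d)

loss = rmu_loss(h_forget, h_retain, h_retain_frozen, u)
loss.backward()

# Adaptive variant: replace the fixed coefficient c with a per-token scale.
c_adaptive = adaptive_coefficient(h_forget_frozen)           # [2, 8, 1]
adaptive_forget_loss = F.mse_loss(h_forget, c_adaptive * u)  # target broadcasts to [2, 8, d]
```

In practice the activations would be taken from a chosen intermediate layer of the model during training; which parameters are updated, batching, and other training details are omitted from this sketch.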

Keywords

  • Artificial intelligence
  • Alignment
  • Token