
Summary of Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data, by Xinhong Xie et al.


Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data

by Xinhong Xie, Tao Li, Quanyan Zhu

First submitted to arXiv on: 27 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The proposed fine-tuning method uses non-parallel data to turn a large language model into a detoxification rewriter. By modeling the rewriting process as a Stackelberg game between the LLM (the leader) and a toxicity screener (the follower), the paper addresses incomplete preference, the primary challenge in fine-tuning on non-parallel data. The authors introduce Stackelberg response optimization (SRO), which adapts direct preference optimization (DPO) so that the LLM can learn from the follower's response. Experimental results show that SRO-fine-tuned LLMs match state-of-the-art models in style accuracy, content similarity, and fluency, while surpassing competing methods in detoxification performance. (A short hypothetical code sketch of this idea appears after the summaries below.)

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper uses big language models to help clean up online social media by rewriting toxic text into non-toxic text. The challenge is that there are no paired examples of toxic text and its cleaned-up rewrite, so the model cannot simply copy from matched before-and-after data. To solve this, the researchers fine-tune the model with a game-like approach: the rewriter proposes a cleaned-up version, and a second player, a toxicity checker, responds by accepting or rejecting it. Learning from those responses teaches the model to generate rewrites that pass the toxicity check. The results show that this method works well, matching other approaches overall and even beating them at removing toxicity.
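
For readers who want a concrete picture of how a screener's response could stand in for a human preference label, the sketch below is a minimal, hypothetical Python/PyTorch illustration, not the authors' implementation. It assumes a made-up toxicity_score callable and scalar sequence log-probabilities; the idea is that the follower's accept/reject response on candidate rewrites supplies the "chosen" and "rejected" sides of a DPO-style objective for the leader LLM.

import torch
import torch.nn.functional as F

def dpo_style_loss(policy_logp_accepted, policy_logp_rejected,
                   ref_logp_accepted, ref_logp_rejected, beta=0.1):
    # Push the policy toward the rewrite the screener accepted and away from
    # the one it rejected, measured relative to a frozen reference model.
    accepted_margin = policy_logp_accepted - ref_logp_accepted
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (accepted_margin - rejected_margin)).mean()

def screener_response(candidate_texts, toxicity_score, threshold=0.5):
    # Follower's move: accept a candidate rewrite iff its toxicity score is
    # below the threshold.  `toxicity_score` is a hypothetical callable that
    # maps a string to a number in [0, 1].
    return [toxicity_score(text) < threshold for text in candidate_texts]

# Toy usage with made-up sequence log-probabilities for a batch of two prompts.
policy_acc = torch.tensor([-3.0, -2.5])   # log pi_theta(accepted rewrite | prompt)
policy_rej = torch.tensor([-2.8, -2.9])   # log pi_theta(rejected rewrite | prompt)
ref_acc    = torch.tensor([-3.1, -2.7])   # log pi_ref(accepted rewrite | prompt)
ref_rej    = torch.tensor([-2.7, -2.8])   # log pi_ref(rejected rewrite | prompt)
print(dpo_style_loss(policy_acc, policy_rej, ref_acc, ref_rej))

In practice the log-probabilities would be summed token log-likelihoods of each candidate rewrite under the fine-tuned policy and under a frozen reference copy of the LLM; the paper's actual SRO objective may differ in detail.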

Keywords

  • Artificial intelligence
  • Fine tuning
  • Optimization