Summary of Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data, by Xinhong Xie et al.
Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data
by Xinhong Xie, Tao Li, Quanyan Zhu
First submitted to arXiv on: 27 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The proposed fine-tuning method uses non-parallel data to turn a large language model into a detoxification rewriter. By modeling the process as a Stackelberg game between the LLM and a toxicity screener, the paper addresses incomplete preference, the primary challenge of fine-tuning on non-parallel data. It introduces Stackelberg response optimization (SRO), which adapts direct preference optimization so that the LLM learns from the follower’s response. Experimental results show that SRO-fine-tuned LLMs match state-of-the-art models in style accuracy, content similarity, and fluency, while surpassing other competing methods in detoxification performance (a hedged code sketch follows the table). |
Low | GrooveSquid.com (original content) | This paper uses big language models to help clean up online social media by rewriting toxic text into non-toxic text. The problem is that these models need to learn from examples that are not exactly like the text they’re trying to rewrite. To solve this, the researchers came up with a new way to fine-tune the models using a game-like approach where one player tries to follow the rules set by another player. This helps the model learn how to generate better rewritten text that passes a test for toxicity. The results show that this method works well, matching other methods and even beating them in some cases. |
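To make the mechanism more concrete, below is a minimal sketch of what such a response-driven training signal could look like in PyTorch. It assumes a DPO-style objective in which the accepted/rejected labels are produced by the toxicity screener (the follower) rather than by human annotators; the function names (`sro_loss`, `label_by_screener`), the screener threshold, and the exact loss form are illustrative assumptions, not the authors' published formulation.

```python
# Hypothetical sketch of the idea behind Stackelberg response optimization (SRO)
# as summarized above: a DPO-style loss in which the "accepted"/"rejected"
# labels come from a toxicity screener (the follower), not from human
# preference pairs. Names such as sro_loss, label_by_screener, and the 0.5
# threshold are illustrative assumptions.
import torch.nn.functional as F


def sro_loss(logp_accept, logp_reject, ref_logp_accept, ref_logp_reject, beta=0.1):
    """DPO-style objective where the preference signal is the follower's response."""
    # Log-ratio of the policy model vs. a frozen reference model for each rewrite.
    accept_ratio = logp_accept - ref_logp_accept
    reject_ratio = logp_reject - ref_logp_reject
    # Push the policy toward rewrites the toxicity screener accepts.
    return -F.logsigmoid(beta * (accept_ratio - reject_ratio)).mean()


def label_by_screener(rewrites, toxicity_screener, threshold=0.5):
    """Split sampled rewrites into screener-accepted and screener-rejected sets."""
    accepted, rejected = [], []
    for rewrite in rewrites:
        score = toxicity_screener(rewrite)  # assumed: higher score = more toxic
        (accepted if score < threshold else rejected).append(rewrite)
    return accepted, rejected
```

The key difference from standard DPO in this sketch is where the preference pair comes from: the screener's accept/reject response stands in for human preference data, which is what allows fine-tuning to proceed on non-parallel data.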
Keywords
» Artificial intelligence » Fine tuning » Optimization