
Summary of Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data, by Xinhong Xie et al.


Learning from Response not Preference: A Stackelberg Approach for LLM Detoxification using Non-parallel Data

by Xinhong Xie, Tao Li, Quanyan Zhu

First submitted to arXiv on: 27 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The proposed fine-tuning method uses non-parallel data to turn a large language model into a detoxification rewriter. By modeling the rewriting process as a Stackelberg game between the LLM (the leader) and a toxicity screener (the follower), the paper addresses incomplete preference, the primary challenge in fine-tuning on non-parallel data. The authors introduce Stackelberg response optimization (SRO), which adapts direct preference optimization (DPO) so that the LLM can learn from the follower's response. Experimental results show that SRO-fine-tuned LLMs match state-of-the-art models in style accuracy, content similarity, and fluency, while surpassing competing methods in detoxification performance. (A short hypothetical code sketch of this idea appears after the summaries below.)

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper uses big language models to help clean up online social media by rewriting toxic text into non-toxic text. The challenge is that there are no paired examples of toxic text and its cleaned-up rewrite, so the model cannot simply copy from matched before-and-after data. To solve this, the researchers fine-tune the model with a game-like approach: the rewriter proposes a cleaned-up version, and a second player, a toxicity checker, responds by accepting or rejecting it. Learning from those responses teaches the model to generate rewrites that pass the toxicity check. The results show that this method works well, matching other approaches overall and even beating them at removing toxicity.
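
For readers who want a concrete picture of how a screener's response could stand in for a human preference label, the sketch below is a minimal, hypothetical Python/PyTorch illustration, not the authors' implementation. It assumes a made-up toxicity_score callable and scalar sequence log-probabilities; the idea is that the follower's accept/reject response on candidate rewrites supplies the "chosen" and "rejected" sides of a DPO-style objective for the leader LLM.

import torch
import torch.nn.functional as F

def dpo_style_loss(policy_logp_accepted, policy_logp_rejected,
                   ref_logp_accepted, ref_logp_rejected, beta=0.1):
    # Push the policy toward the rewrite the screener accepted and away from
    # the one it rejected, measured relative to a frozen reference model.
    accepted_margin = policy_logp_accepted - ref_logp_accepted
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    return -F.logsigmoid(beta * (accepted_margin - rejected_margin)).mean()

def screener_response(candidate_texts, toxicity_score, threshold=0.5):
    # Follower's move: accept a candidate rewrite iff its toxicity score is
    # below the threshold.  `toxicity_score` is a hypothetical callable that
    # maps a string to a number in [0, 1].
    return [toxicity_score(text) < threshold for text in candidate_texts]

# Toy usage with made-up sequence log-probabilities for a batch of two prompts.
policy_acc = torch.tensor([-3.0, -2.5])   # log pi_theta(accepted rewrite | prompt)
policy_rej = torch.tensor([-2.8, -2.9])   # log pi_theta(rejected rewrite | prompt)
ref_acc    = torch.tensor([-3.1, -2.7])   # log pi_ref(accepted rewrite | prompt)
ref_rej    = torch.tensor([-2.7, -2.8])   # log pi_ref(rejected rewrite | prompt)
print(dpo_style_loss(policy_acc, policy_rej, ref_acc, ref_rej))

In practice the log-probabilities would be summed token log-likelihoods of each candidate rewrite under the fine-tuned policy and under a frozen reference copy of the LLM; the paper's actual SRO objective may differ in detail.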

Keywords

  • Artificial intelligence
  • Fine tuning
  • Optimization