Summary of "Disentangling Length from Quality in Direct Preference Optimization" by Ryan Park, Rafael Rafailov, Stefano Ermon, and Chelsea Finn
Disentangling Length from Quality in Direct Preference Optimization
by Ryan Park, Rafael Rafailov, Stefano Ermon, Chelsea Finn
First submitted to arXiv on: 28 Mar 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract. |
| Medium | GrooveSquid.com (original content) | In this paper, the researchers investigate how to prevent Large Language Models from exploiting biases in human feedback. They focus on Direct Preference Optimization (DPO), an alignment algorithm that is prone to producing verbose answers even when the extra length adds no value. The authors show that DPO-trained models learn to treat longer responses as better than shorter ones, which leads to poorer-quality outputs. To address this issue, they develop a regularization strategy that prevents length exploitation while still improving model quality. The approach is demonstrated on summarization and dialogue datasets, achieving up to a 20% improvement in win rates. |
| Low | GrooveSquid.com (original content) | This paper looks at how language models can be trained using human feedback, and it highlights a problem with this approach: the models can learn to produce longer answers that are not necessarily better, simply because humans tend to prefer them. This is a problem because the model might not give the best answer even when a shorter response would be more helpful. To solve this issue, the researchers come up with a new way of training the models that prevents it from happening. |
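
The summaries above describe a regularization strategy that curbs length exploitation but do not give its exact formulation. As a rough illustration only, the sketch below shows one plausible way such a penalty could sit on top of the standard DPO loss: subtracting a term proportional to the length difference between the chosen and rejected responses from the implicit reward margin, so the model gets no credit for preferring a response merely because it is longer. The function name `length_regularized_dpo_loss`, the hyperparameter `alpha`, and the specific form of the penalty are assumptions for illustration, not necessarily the authors' exact method.

```python
import torch
import torch.nn.functional as F

def length_regularized_dpo_loss(
    policy_chosen_logps,      # log pi_theta(y_w | x), shape (batch,)
    policy_rejected_logps,    # log pi_theta(y_l | x), shape (batch,)
    ref_chosen_logps,         # log pi_ref(y_w | x), shape (batch,)
    ref_rejected_logps,       # log pi_ref(y_l | x), shape (batch,)
    chosen_lengths,           # token length of y_w, shape (batch,)
    rejected_lengths,         # token length of y_l, shape (batch,)
    beta=0.1,                 # DPO temperature
    alpha=0.01,               # hypothetical length-penalty weight (assumed)
):
    # Standard DPO implicit rewards for the chosen and rejected responses.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    margin = chosen_rewards - rejected_rewards

    # Assumed length regularization: penalize the margin by the length
    # difference, removing the incentive to win preferences via verbosity.
    margin = margin - alpha * (chosen_lengths - rejected_lengths).float()

    # Negative log-sigmoid of the regularized margin, as in DPO.
    return -F.logsigmoid(margin).mean()
```

With `alpha` set to zero this reduces to the ordinary DPO objective; increasing it trades off preference fit against the tendency to favor longer completions.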
Keywords
* Artificial intelligence
* Optimization
* Regularization
* Summarization