Summary of Geometric-Averaged Preference Optimization for Soft Preference Labels, by Hiroki Furuta et al.
Geometric-Averaged Preference Optimization for Soft Preference Labels
by Hiroki Furuta, Kuang-Huei Lee, Shixiang Shane Gu, Yutaka Matsuo, Aleksandra Faust, Heiga Zen, Izzeddin Gur
First submitted to arXiv on: 10 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary The paper’s original abstract, available on its arXiv page. |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper addresses a limitation in current methods for aligning large language models (LLMs) with human preferences: rather than assuming preferences are binary and deterministic, it represents them distributionally. The authors introduce “distributional soft preference labels” and modify Direct Preference Optimization (DPO) to incorporate these labels into the loss function via a weighted geometric average of the LLM output likelihoods. This scales the learning loss according to the soft labels, letting the model make better use of response pairs that are closer to equally preferred (see the sketch after this table). In experiments simulating AI feedback from LLMs, the method achieves improved performance on standard benchmarks for alignment research. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper is about helping computers figure out what people like and dislike. Right now, many computer programs assume that people’s preferences are simple “yes” or “no” answers. But people have different opinions, and those opinions should be represented in a way that shows they’re not just one way or the other. The authors of this paper came up with a new idea: make computer models represent preferences as a range instead of just two options. They tested this method and found it worked really well, especially when people were only somewhat sure about their opinions. |
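
To make the geometric-averaging idea concrete, here is a minimal PyTorch sketch of how a soft label p̂ could be folded into a DPO-style loss. The function name, argument names, and the β default are illustrative assumptions, not the authors’ released code; the key observation is that, in log space, the weighted geometric average of the two response likelihoods reduces to scaling the usual DPO log-ratio margin by (2·p̂ − 1).

```python
import torch
import torch.nn.functional as F

def geometric_averaged_dpo_loss(policy_logp_1, policy_logp_2,
                                ref_logp_1, ref_logp_2,
                                soft_label, beta=0.1):
    """DPO-style loss with distributional soft preference labels (sketch).

    soft_label is p_hat = P(y1 preferred over y2). Weighting the response
    likelihoods geometrically, pi(y1)^p_hat * pi(y2)^(1 - p_hat) (and the
    mirror image for the dispreferred side), reduces in log space to
    multiplying the standard DPO log-ratio margin by (2 * p_hat - 1).
    """
    # Standard DPO margin: difference of policy-vs-reference log ratios.
    margin = (policy_logp_1 - ref_logp_1) - (policy_logp_2 - ref_logp_2)
    # Soft-label scale: 0 when p_hat = 0.5 (a tie), 1 when p_hat = 1.
    scale = 2.0 * soft_label - 1.0
    return -F.logsigmoid(beta * scale * margin).mean()

# Toy usage with per-sequence log-likelihoods for a batch of 4 pairs.
logp1, logp2 = torch.randn(4), torch.randn(4)
ref1, ref2 = torch.randn(4), torch.randn(4)
p_hat = torch.tensor([0.95, 0.7, 0.55, 0.5])  # soft labels from a judge
loss = geometric_averaged_dpo_loss(logp1, logp2, ref1, ref2, p_hat)
```

Under this formulation, a pair with p̂ near 0.5 contributes almost nothing to the loss, so the model is not forced to sharply rank nearly tied responses; this is the loss-adjustment behavior the medium summary describes.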
Keywords
» Artificial intelligence » Alignment » Likelihood » Loss function » Optimization