Summary of Preference Learning Algorithms Do Not Learn Preference Rankings, by Angelica Chen et al.


Preference Learning Algorithms Do Not Learn Preference Rankings

by Angelica Chen, Sadhika Malladi, Lily H. Zhang, Xinyi Chen, Qiuyi Zhang, Rajesh Ranganath, Kyunghyun Cho

First submitted to arXiv on: 29 May 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper’s original abstract, which you can read on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
This research investigates how effectively preference learning algorithms steer language models toward preferred outputs. Specifically, it examines whether these algorithms train models to assign higher likelihoods to more preferred outputs than to less preferred ones, a quantity called ranking accuracy (a minimal computation sketch follows these summaries). Surprisingly, the study finds that most state-of-the-art preference-tuned models achieve a ranking accuracy of less than 60% on common preference datasets. The researchers also derive the idealized ranking accuracy that a perfectly optimized model would attain and demonstrate a significant alignment gap between observed and idealized ranking accuracies. They attribute this gap to limitations of the DPO objective, which struggles to fix even mild ranking errors. Additionally, they propose a simple formula for quantifying how difficult a given preference datapoint is to learn.

Low Difficulty Summary (original content by GrooveSquid.com)
This study looks at how well certain algorithms can teach language models to produce the outputs people prefer. It found that even strong models don’t actually get very good at this, ranking the preferred answer higher less than 60% of the time on typical tests. The researchers also worked out how accurate these models would be if training were perfect, and it turns out current models miss that mark by a lot. They think this is because of how one of those training algorithms works. Overall, this study helps us understand what’s going on when we try to teach language models what humans like.

Keywords

  • Artificial intelligence
  • Alignment
  • Optimization