Summary of Aligning Crowd Feedback via Distributional Preference Reward Modeling, by Dexun Li et al.
Aligning Crowd Feedback via Distributional Preference Reward Modeling
by Dexun Li, Cong Zhang, Kuicai Dong, Derrick Goh Xin Deik, Ruiming Tang, Yong Liu
First submitted to arXiv on: 15 Feb 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Deep reinforcement learning is a crucial technique for aligning large language models (LLMs) with human preferences. However, traditional reward modeling relies heavily on human annotations provided by a select group of individuals. This dependence can inadvertently produce biased models that reflect the inclinations of these annotators rather than the expectations of the wider population. To address this issue, we propose the Distributional Preference Reward Model (DPRM), a framework that characterizes multiple preferences with a categorical distribution and incorporates a Bayesian updater to adapt to changing or new preferences. We also develop an optimal-transport-based loss function to calibrate DPRM to the preference distribution. Finally, we use the expected rewards to fine-tune LLM policies to generate responses favored by the population. Our experiments show that DPRM significantly enhances the alignment of LLMs with population preferences, yielding more accurate, unbiased, and contextually appropriate responses. (An illustrative code sketch of these ingredients follows the table.) |
| Low | GrooveSquid.com (original content) | Imagine a machine learning model that can understand what people like and dislike. Right now, these models are often biased because they are trained on data from just a few people, which isn’t fair to the rest of us! To fix this problem, we created a new way to train these models using a “preference reward” system. It’s like a game where the model tries different responses and gets rewarded when it does something that people will like. We tested our system with large language models (LLMs) and found that it works really well! Our LLMs are now more accurate, fair, and contextually appropriate in their responses. |
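
To make the moving parts of DPRM more concrete, here is a minimal sketch in Python/NumPy of the three ingredients the medium summary mentions: a categorical distribution over preference levels with a Dirichlet-style Bayesian update, the expected reward used to guide the policy, and a 1-D optimal-transport (Wasserstein-1) calibration loss. The five ordinal levels, vote counts, and function names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical ordinal preference levels (1 = worst ... 5 = best).
# The number of levels and their values are illustrative assumptions.
LEVELS = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def bayesian_update(prior_counts, new_votes):
    """Dirichlet-style update: add observed annotator votes per level to the
    running pseudo-counts, then renormalize into a categorical distribution
    over preference levels."""
    posterior_counts = prior_counts + new_votes
    return posterior_counts, posterior_counts / posterior_counts.sum()

def expected_reward(pref_dist):
    """Expected reward of a response under a categorical preference
    distribution (the mean preference level)."""
    return float(np.dot(pref_dist, LEVELS))

def ot_calibration_loss(pred_dist, target_dist):
    """1-D optimal-transport (Wasserstein-1) distance between the predicted
    and the crowd preference distributions, computed from the absolute
    difference of their CDFs over the ordered levels."""
    cdf_gap = np.cumsum(pred_dist) - np.cumsum(target_dist)
    return float(np.sum(np.abs(cdf_gap)))

# Example: a batch of annotator votes on one response (counts per level).
prior = np.ones_like(LEVELS)           # uniform Dirichlet prior
votes = np.array([0.0, 1.0, 4.0, 8.0, 2.0])
prior, crowd_dist = bayesian_update(prior, votes)

model_dist = np.array([0.05, 0.10, 0.30, 0.40, 0.15])  # reward model output
print("expected reward:", expected_reward(model_dist))
print("OT calibration loss:", ot_calibration_loss(model_dist, crowd_dist))
```

Roughly, the calibrated expected reward would then stand in for the usual single scalar reward signal when fine-tuning the LLM policy with reinforcement learning.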
Keywords
» Artificial intelligence » Alignment » Loss function » Machine learning » Reinforcement learning