
Aligning Crowd Feedback via Distributional Preference Reward Modeling

by Dexun Li, Cong Zhang, Kuicai Dong, Derrick Goh Xin Deik, Ruiming Tang, Yong Liu

First submitted to arXiv on: 15 Feb 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on the paper's arXiv page.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Deep reinforcement learning is a crucial technique for aligning large language models (LLMs) with human preferences. However, traditional reward modeling heavily relies on human annotations provided by a select group of individuals. This dependence can inadvertently lead to biased models that reflect the inclinations of these annotators rather than the wider population’s expectations. To address this issue, we propose the Distributional Preference Reward Model (DPRM), a framework that characterizes multiple preferences using a categorical distribution and incorporates a Bayesian updater to adapt to changing or new preferences. Additionally, we develop an optimal-transportation-based loss function to calibrate DPRM to align with the preference distribution. Finally, we utilize expected rewards to fine-tune LLM policies to generate responses favored by the population. Our experiments demonstrate that DPRM significantly enhances the alignment of LLMs with population preference, resulting in more accurate, unbiased, and contextually appropriate responses.
Low Difficulty Summary (written by GrooveSquid.com, original content)
Imagine a machine learning model that can understand what people like and dislike. Right now, these models are often biased because they’re trained using data from just a few people. This isn’t fair to the rest of us! To fix this problem, we created a new way to train these models using a “preference reward” system. It’s like a game where the model tries different responses and gets rewarded when it produces something that people will like. We tested our system with large language models (LLMs) and found that it works really well! Our LLMs are now more accurate, fair, and contextually appropriate in their responses.
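
The medium difficulty summary above names four technical ingredients: a categorical distribution over preference levels, a Bayesian update as new crowd annotations arrive, an optimal-transport loss that calibrates the reward model to the preference distribution, and an expected reward used to fine-tune the LLM policy. The paper's own code is not reproduced here; the short PyTorch sketch below is only an illustration of how those pieces could fit together. The number of preference levels, the uniform Dirichlet prior, the toy vote counts, and all names (DistributionalRewardHead, bayesian_update, ot_loss_1d, expected_reward) are assumptions chosen for this example, not the authors' implementation.

import torch
import torch.nn as nn

K = 5  # assumed number of ordered preference levels (e.g., ratings 1..5)

class DistributionalRewardHead(nn.Module):
    """Maps a response embedding to a categorical distribution over K levels."""
    def __init__(self, hidden_dim: int, num_levels: int = K):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, num_levels)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_dim) -> (batch, K) probabilities over preference levels
        return torch.softmax(self.proj(h), dim=-1)

def bayesian_update(prior_alpha: torch.Tensor, vote_counts: torch.Tensor) -> torch.Tensor:
    # Dirichlet-multinomial update: add new annotator votes to the pseudo-counts
    # and return the posterior mean as the target preference distribution.
    posterior_alpha = prior_alpha + vote_counts
    return posterior_alpha / posterior_alpha.sum()

def ot_loss_1d(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    # 1-Wasserstein (optimal transport) distance between categorical distributions
    # over ordered levels, computed from their cumulative distributions.
    return torch.abs(torch.cumsum(p, dim=-1) - torch.cumsum(q, dim=-1)).sum(dim=-1).mean()

def expected_reward(p: torch.Tensor) -> torch.Tensor:
    # Scalar reward per response: expected preference level under the predicted distribution.
    levels = torch.arange(1, p.shape[-1] + 1, dtype=p.dtype)
    return (p * levels).sum(dim=-1)

# Toy usage: calibrate the reward head toward a crowd preference distribution.
head = DistributionalRewardHead(hidden_dim=16)
optim = torch.optim.Adam(head.parameters(), lr=1e-3)

h = torch.randn(8, 16)                       # stand-in for LLM response embeddings
prior = torch.ones(K)                        # uniform Dirichlet prior
votes = torch.tensor([0., 1., 3., 4., 2.])   # hypothetical annotator votes per level
target = bayesian_update(prior, votes).expand(8, K)

pred = head(h)
loss = ot_loss_1d(pred, target)              # OT-based calibration loss
optim.zero_grad()
loss.backward()
optim.step()

print("expected reward per response:", expected_reward(pred.detach()))

For ordered preference levels, the one-dimensional optimal transport distance reduces to the absolute difference of cumulative distributions, which keeps the calibration loss differentiable and cheap to compute; the resulting expected reward is the kind of scalar signal that could then drive policy fine-tuning.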

Keywords

» Artificial intelligence  » Alignment  » Loss function  » Machine learning  » Reinforcement learning