Summary of Aligning Crowd Feedback via Distributional Preference Reward Modeling, by Dexun Li et al.
Aligning Crowd Feedback via Distributional Preference Reward Modeling
by Dexun Li, Cong Zhang, Kuicai Dong, Derrick Goh Xin Deik, Ruiming Tang, Yong Liu
First submitted to arXiv on: 15 Feb 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Deep reinforcement learning is a crucial technique for aligning large language models (LLMs) with human preferences. However, traditional reward modeling relies heavily on human annotations provided by a select group of individuals. This dependence can inadvertently produce biased models that reflect the inclinations of these annotators rather than the expectations of the wider population. To address this issue, we propose the Distributional Preference Reward Model (DPRM), a framework that characterizes multiple preferences with a categorical distribution and incorporates a Bayesian updater to adapt to changing or new preferences. We also develop an optimal-transport-based loss function to calibrate DPRM to the preference distribution. Finally, we use the expected rewards to fine-tune LLM policies to generate responses favored by the population. Our experiments show that DPRM significantly enhances the alignment of LLMs with population preferences, yielding more accurate, unbiased, and contextually appropriate responses. (An illustrative code sketch of these ingredients follows the table.) |
| Low | GrooveSquid.com (original content) | Imagine a machine learning model that can understand what people like and dislike. Right now, these models are often biased because they are trained on data from just a few people, which isn’t fair to the rest of us! To fix this problem, we created a new way to train these models using a “preference reward” system. It’s like a game where the model tries different responses and gets rewarded when it does something that people will like. We tested our system with large language models (LLMs) and found that it works really well! Our LLMs are now more accurate, fair, and contextually appropriate in their responses. |
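
To make the moving parts of DPRM more concrete, here is a minimal sketch in Python/NumPy of the three ingredients the medium summary mentions: a categorical distribution over preference levels with a Dirichlet-style Bayesian update, the expected reward used to guide the policy, and a 1-D optimal-transport (Wasserstein-1) calibration loss. The five ordinal levels, vote counts, and function names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical ordinal preference levels (1 = worst ... 5 = best).
# The number of levels and their values are illustrative assumptions.
LEVELS = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

def bayesian_update(prior_counts, new_votes):
    """Dirichlet-style update: add observed annotator votes per level to the
    running pseudo-counts, then renormalize into a categorical distribution
    over preference levels."""
    posterior_counts = prior_counts + new_votes
    return posterior_counts, posterior_counts / posterior_counts.sum()

def expected_reward(pref_dist):
    """Expected reward of a response under a categorical preference
    distribution (the mean preference level)."""
    return float(np.dot(pref_dist, LEVELS))

def ot_calibration_loss(pred_dist, target_dist):
    """1-D optimal-transport (Wasserstein-1) distance between the predicted
    and the crowd preference distributions, computed from the absolute
    difference of their CDFs over the ordered levels."""
    cdf_gap = np.cumsum(pred_dist) - np.cumsum(target_dist)
    return float(np.sum(np.abs(cdf_gap)))

# Example: a batch of annotator votes on one response (counts per level).
prior = np.ones_like(LEVELS)           # uniform Dirichlet prior
votes = np.array([0.0, 1.0, 4.0, 8.0, 2.0])
prior, crowd_dist = bayesian_update(prior, votes)

model_dist = np.array([0.05, 0.10, 0.30, 0.40, 0.15])  # reward model output
print("expected reward:", expected_reward(model_dist))
print("OT calibration loss:", ot_calibration_loss(model_dist, crowd_dist))
```

Roughly, the calibrated expected reward would then stand in for the usual single scalar reward signal when fine-tuning the LLM policy with reinforcement learning.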
Keywords
» Artificial intelligence » Alignment » Loss function » Machine learning » Reinforcement learning