Summary of MaxMin-RLHF: Alignment with Diverse Human Preferences, by Souradip Chakraborty et al.


MaxMin-RLHF: Alignment with Diverse Human Preferences

by Souradip Chakraborty, Jiahao Qiu, Hui Yuan, Alec Koppel, Furong Huang, Dinesh Manocha, Amrit Singh Bedi, Mengdi Wang

First submitted to arXiv on: 14 Feb 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Robotics (cs.RO)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com; original content)
This paper addresses a key limitation of Reinforcement Learning from Human Feedback (RLHF): its reliance on a single reward model to represent diverse human preferences. The authors prove an impossibility result showing that single-reward RLHF cannot capture this diversity. To overcome the limitation, they propose a MaxMin alignment objective and learn a mixture of preference distributions via an expectation-maximization (EM) algorithm. The approach is connected to distributionally robust optimization and general-utility RL, underscoring its generality and robustness. Experiments on a small-scale language model (GPT-2) and a large-scale one (Tulu2-7B) show win-rate improvements of over 16% compared to conventional RLHF algorithms, along with better performance for minority preference groups without compromising majority-group performance.
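To make the mechanics described above more concrete, here is a rough sketch of the general recipe: fit one reward model per latent preference group with an EM-style loop over pairwise preference data, then reward the policy with the minimum reward across groups so that no group's preferences are sacrificed. This is an illustrative reconstruction under a Bradley-Terry preference model, not the authors' code; the reward-model callables, the optimizer setup, and the PPO hookup mentioned in the comments are assumptions made for the sake of the example.

```python
import torch
import torch.nn.functional as F

def e_step(reward_models, prompts, chosen, rejected):
    """E-step: soft-assign each preference pair to latent user groups.
    Group k's responsibility for pair i is proportional to the Bradley-Terry
    likelihood of the observed preference under reward model k."""
    with torch.no_grad():
        logliks = []
        for rm in reward_models:
            margin = rm(prompts, chosen) - rm(prompts, rejected)  # shape (N,)
            logliks.append(F.logsigmoid(margin))                  # log P(chosen preferred)
        logliks = torch.stack(logliks, dim=0)                     # (K, N)
        return torch.softmax(logliks, dim=0)                      # responsibilities, (K, N)

def m_step(reward_models, optimizers, prompts, chosen, rejected, resp):
    """M-step: refit each reward model on pairs weighted by its responsibilities."""
    for k, (rm, opt) in enumerate(zip(reward_models, optimizers)):
        margin = rm(prompts, chosen) - rm(prompts, rejected)
        loss = -(resp[k] * F.logsigmoid(margin)).mean()           # weighted Bradley-Terry loss
        opt.zero_grad()
        loss.backward()
        opt.step()

def maxmin_reward(reward_models, prompts, responses):
    """MaxMin objective: score each response by the worst-off group's reward."""
    with torch.no_grad():
        group_rewards = torch.stack([rm(prompts, responses) for rm in reward_models], dim=0)
        return group_rewards.min(dim=0).values
```

In a full pipeline, the scalar returned by maxmin_reward would simply stand in for the single reward signal in a standard RLHF policy-optimization step (e.g., PPO).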

Low Difficulty Summary (written by GrooveSquid.com; original content)
This paper is about teaching computers to learn from people's preferences. Humans have different tastes and opinions, so it is not fair to expect one approach to work for everyone. The researchers found that the current way of doing this (called RLHF) does not take these differences into account. To fix the problem, they came up with a new method that combines many people's preferences. The new method works better and is fairer: it even improves results for groups whose preferences usually carry less weight, without making things worse for everyone else.

Keywords

  • Artificial intelligence
  • Alignment
  • GPT
  • Optimization
  • Reinforcement learning from human feedback
  • RLHF