Summary of Reinforcement Learning from Human Feedback with Active Queries, by Kaixuan Ji, Jiafan He, and Quanquan Gu
Reinforcement Learning from Human Feedback with Active Queries
by Kaixuan Ji, Jiafan He, Quanquan Gu
First submitted to arXiv on: 14 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Optimization and Control (math.OC); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper proposes query-efficient reinforcement learning from human feedback (RLHF) methods to align large language models with human preferences. Current RLHF approaches require a significant amount of human-labelled preference data, which is expensive to collect. The authors formalize the alignment problem as a contextual dueling bandit problem and design an active-query-based proximal policy optimization (APPO) algorithm with instance-dependent regret bounds and query complexity. They also propose ADPO, a practical version of APPO based on direct preference optimization (DPO), and apply it to fine-tuning large language models. The results show that ADPO matches the performance of state-of-the-art DPO methods while making only about half as many human preference queries (see the illustrative sketch below this table). |
Low | GrooveSquid.com (original content) | This paper is about teaching computers to understand what humans like or dislike. Right now, we need a lot of labeled data from humans to make this happen, which can be expensive and time-consuming. The researchers came up with a new way to do this that is more efficient. They framed the problem as a game where the computer tries different things and asks humans for feedback only when it is unsure what they would prefer. This helps the computer learn faster and make better choices. They tested their method on large language models and found it worked just as well as other methods while needing only about half as much human feedback. |
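The medium-difficulty summary describes the core active-query idea: ask a human for a preference label only when the model is uncertain which of two responses is better. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' APPO or ADPO algorithm. The helpers `implicit_reward` and `human_label` are hypothetical stand-ins for a DPO-style implicit reward and a human annotator, and the uncertainty rule here is a simple margin threshold.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def active_query_labels(pairs, implicit_reward, human_label, margin=0.1):
    """Collect preference labels for (prompt, response_a, response_b) triples.

    Queries the human annotator only when the model's own preference is close
    to a coin flip; otherwise keeps the model's pseudo-label. Returns the label
    list (0 = response_a preferred, 1 = response_b preferred) and the number of
    human queries actually spent.
    """
    labels, num_queries = [], 0
    for prompt, resp_a, resp_b in pairs:
        # Probability that resp_a beats resp_b under the model's implicit reward
        # (in DPO this would come from log-likelihood ratios against a reference model).
        p_a = sigmoid(implicit_reward(prompt, resp_a) - implicit_reward(prompt, resp_b))
        if abs(p_a - 0.5) < margin:
            # Uncertain pair: spend a human query.
            labels.append(human_label(prompt, resp_a, resp_b))
            num_queries += 1
        else:
            # Confident pair: trust the model's own pseudo-label.
            labels.append(0 if p_a > 0.5 else 1)
    return labels, num_queries

if __name__ == "__main__":
    random.seed(0)
    # Toy stand-ins: random rewards and random annotator answers, for illustration only.
    toy_pairs = [(f"prompt {i}", "response A", "response B") for i in range(20)]
    toy_reward = lambda prompt, resp: random.uniform(-1.0, 1.0)
    toy_annotator = lambda prompt, a, b: random.randint(0, 1)
    labels, used = active_query_labels(toy_pairs, toy_reward, toy_annotator, margin=0.2)
    print(f"collected {len(labels)} labels using {used} human queries")
```

Tightening or loosening the margin trades off label quality against the number of human queries; the paper reports that its DPO-based variant, ADPO, reaches performance comparable to state-of-the-art DPO methods with roughly half the preference queries.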
Keywords
* Artificial intelligence * Alignment * Fine-tuning * Optimization * Reinforcement learning from human feedback * RLHF