Summary of Active Preference Learning for Large Language Models, by William Muldrew et al.
Active Preference Learning for Large Language Models
by William Muldrew, Peter Hayes, Mingtian Zhang, David Barber
First submitted to arXiv on: 12 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract on arXiv. |
| Medium | GrooveSquid.com (original content) | This paper focuses on developing more effective techniques for aligning large language models (LLMs) with human intent, including when LLMs themselves are used as oracles that provide preference labels. The dominant approach, Reinforcement Learning from Human or AI Preferences (RLHF/RLAIF), can be complex and unstable; Direct Preference Optimization (DPO) is a simpler and more stable alternative. Building on DPO, the authors develop an active learning strategy to make better use of preference labels, proposing a practical acquisition function that combines the predictive entropy of the language model with a measure of certainty of DPO's implicit preference model. The approach is shown to improve both the rate of learning and the final performance of fine-tuning on pairwise preference data (a rough code sketch of these ideas follows the table). |
| Low | GrooveSquid.com (original content) | Large language models are getting better at understanding us, but we still need ways to help them learn what we actually mean. This paper looks at how to make large language models work better with people. One method, Reinforcement Learning from Human or AI Preferences, helps but can be tricky and doesn't always work well. A simpler method called Direct Preference Optimization avoids some of that complexity, and the authors try to use it more effectively by asking the most informative questions about what people prefer. Their approach helps the models learn faster and end up better at understanding what we mean. |
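
To make the comparison in the medium summary concrete, here is a minimal sketch of the DPO objective it refers to: the policy is fine-tuned directly on preference pairs, with no separate reward model or RL loop. This is not the authors' code; the function name, tensor shapes, and the beta value are illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of the DPO objective: fine-tune the
# policy directly on preference pairs, with no separate reward model or RL
# loop. The function name, tensor shapes, and beta value are illustrative.
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen: torch.Tensor,
             policy_logp_rejected: torch.Tensor,
             ref_logp_chosen: torch.Tensor,
             ref_logp_rejected: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss for a batch of preference pairs.

    Each argument has shape (batch,) and holds the total log-probability of
    the chosen / rejected completion under the policy being trained or under
    the frozen reference model.
    """
    # Implicit rewards: scaled log-ratios between policy and reference.
    chosen_reward = beta * (policy_logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (policy_logp_rejected - ref_logp_rejected)
    # Binary logistic loss on the reward margin: push the chosen completion's
    # implicit reward above the rejected one's.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```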
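
Building on those implicit rewards, the sketch below shows how the two acquisition signals the medium summary mentions, the predictive entropy of the language model and the certainty of DPO's implicit preference model, could be scored for a prompt and a pair of sampled completions. The paper's exact scoring and selection procedure may differ; the particular combination used here (a simple sum) and all names are assumptions.

```python
# Rough sketch (not the paper's code) of acquisition scoring for active
# preference learning: combine the policy's predictive entropy over sampled
# completions with the certainty of DPO's implicit preference model. The
# particular combination (a simple sum) and all names are assumptions.
import torch

def predictive_entropy(completion_logprobs: torch.Tensor) -> torch.Tensor:
    """Monte-Carlo estimate of the policy's predictive entropy for one prompt.

    completion_logprobs: shape (num_samples,), total log-probability of each
    completion sampled from the current policy for this prompt.
    """
    return -completion_logprobs.mean()

def preference_certainty(policy_logp_a: torch.Tensor,
                         policy_logp_b: torch.Tensor,
                         ref_logp_a: torch.Tensor,
                         ref_logp_b: torch.Tensor,
                         beta: float = 0.1) -> torch.Tensor:
    """Certainty of the DPO implicit preference model for a completion pair.

    The implicit reward of a completion is beta * (policy log-prob minus
    reference log-prob); the implicit probability that A is preferred to B is
    the sigmoid of the reward margin. Certainty is taken here as that
    probability's distance from 0.5.
    """
    reward_a = beta * (policy_logp_a - ref_logp_a)
    reward_b = beta * (policy_logp_b - ref_logp_b)
    prob_a_preferred = torch.sigmoid(reward_a - reward_b)
    return (prob_a_preferred - 0.5).abs()

def acquisition_score(completion_logprobs, policy_logp_a, policy_logp_b,
                      ref_logp_a, ref_logp_b, beta: float = 0.1) -> torch.Tensor:
    """Score a prompt / completion pair for labelling; higher means the pair
    is more worth sending to the human or AI oracle."""
    return (predictive_entropy(completion_logprobs)
            + preference_certainty(policy_logp_a, policy_logp_b,
                                   ref_logp_a, ref_logp_b, beta))
```

In an active learning loop, one would score candidate prompt/completion pairs this way, send the highest-scoring ones to the human or AI oracle for a preference label, and then run another round of DPO fine-tuning on the enlarged preference dataset.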
Keywords
* Artificial intelligence * Active learning * Fine-tuning * Optimization * Reinforcement learning * RLHF