Summary of Active Preference Learning for Large Language Models, by William Muldrew et al.


Active Preference Learning for Large Language Models

by William Muldrew, Peter Hayes, Mingtian Zhang, David Barber

First submitted to arXiv on: 12 Feb 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper focuses on developing more effective techniques for aligning large language models (LLMs) with human intent. The standard approach, Reinforcement Learning from Human or AI Feedback (RLHF/RLAIF), can be complex and unstable; Direct Preference Optimization (DPO) is a simpler and more stable alternative. Because preference labels are costly to collect, whether from human annotators or from LLMs used as oracles, the authors develop an active learning strategy on top of DPO to make better use of them. They propose a practical acquisition function for prompt/completion pairs based on the predictive entropy of the language model and a measure of certainty of the implicit preference model optimized by DPO (a rough code sketch of this idea follows the summaries below). The approach is demonstrated to improve both the rate of learning and the final performance of fine-tuning on pairwise preference data.
Low Difficulty Summary (original content by GrooveSquid.com)
Large language models are getting better at understanding us, but we still need ways to help them understand what we mean. This paper looks at how to make large language models work better with people. One method, called Reinforcement Learning from Human or AI Feedback, helps, but it can be tricky and doesn't always work well. A simpler method, called Direct Preference Optimization, avoids some of those problems. The authors try to use it more effectively by carefully choosing which preference questions to ask people (or other AI models). Their approach seems to help the language models understand what we mean, both while they are learning and at the end of training.

Keywords

  • Artificial intelligence
  • Active learning
  • Fine-tuning
  • Optimization
  • Reinforcement learning
  • RLHF