Summary of Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data, by Fahim Tajwar et al.


Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data

by Fahim Tajwar, Anikait Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar

First submitted to arXiv on: 22 Apr 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper investigates the importance of different approaches for fine-tuning large language models (LLMs) using preference labels. It compares various methods, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning, which have distinct implementation tradeoffs and performance differences. The study finds that techniques employing on-policy sampling or pushing down the likelihood on certain responses (negative gradient) outperform offline and maximum likelihood objectives (a code sketch below illustrates this). These findings unify methods under a notion of mode-seeking objectives for categorical distributions, which can rapidly relocate probability mass across bins.

Low Difficulty Summary (original content by GrooveSquid.com)
Large language models can be fine-tuned using preference labels to make them more accurate. There are different ways to do this, like teaching the model or letting it learn on its own while making choices. The best approach depends on the situation and what kind of data is being used. Researchers studied several methods to find out which ones work well for fine-tuning large language models. They found that some techniques are better than others at making the model more accurate.

Keywords

» Artificial intelligence  » Fine tuning  » Likelihood  » Probability  » Reinforcement learning  » Supervised