Summary of HelpSteer2-Preference: Complementing Ratings with Preferences, by Zhilin Wang et al.


HelpSteer2-Preference: Complementing Ratings with Preferences

by Zhilin Wang, Alexander Bukharin, Olivier Delalleau, Daniel Egert, Gerald Shen, Jiaqi Zeng, Oleksii Kuchaiev, Yi Dong

First submitted to arXiv on: 2 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Reward models are crucial for aligning AI models to follow human instructions, and they are typically trained in either a Bradley-Terry style (on preferences) or a Regression style (on ratings). However, it has been unclear which approach performs better when the two are adequately matched for data, largely because no existing dataset provided compatible annotations for both. To close this gap, the authors release preference annotations (for Bradley-Terry training) alongside the existing ratings (for Regression training) in the HelpSteer2 dataset, accompanied by human-written justifications. With this data, the study runs a head-to-head comparison of Bradley-Terry and Regression reward models (see the loss-function sketch after the summaries below) and, based on the resulting insights, proposes a novel approach that combines both styles. A Llama-3.1-70B-Instruct model tuned with this approach scores 94.1 on RewardBench, surpassing over 140 reward models as of October 2024. The resulting reward model can then be used with the REINFORCE algorithm (RLHF) to align an Instruct model, which reaches a score of 85.0 on Arena Hard, the top result at that time.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about how AI models can follow instructions better. It’s like trying to understand what someone means when they say “make me a sandwich”. The problem is that there are two main ways to train these models, but we don’t know which one works best. To fix this, the authors created new data with explanations for each item. They compared the two methods and found a way to combine them that works really well. This new method helps AI models understand instructions better and even beats many other models in tests.
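
To make the Bradley-Terry vs. Regression contrast in the medium summary concrete, here is a minimal PyTorch sketch of the two standard reward-model losses that the summary refers to: a pairwise preference loss and a rating-regression loss. This is an illustrative assumption about the common formulations, not the paper's code; the tensor names and example values are made up, and the paper's specific way of combining the two styles is not reproduced here.

    # Illustrative sketch only (not the authors' implementation).
    import torch
    import torch.nn.functional as F

    def bradley_terry_loss(r_chosen, r_rejected):
        # Pairwise preference objective: push the chosen response's scalar
        # reward above the rejected one's via -log sigmoid(r_chosen - r_rejected).
        return -F.logsigmoid(r_chosen - r_rejected).mean()

    def regression_loss(r_pred, rating):
        # Rating objective: fit the scalar reward directly to a human rating
        # (e.g. a helpfulness score) with mean squared error.
        return F.mse_loss(r_pred, rating)

    # Dummy scalar rewards for a batch of three prompt-response pairs.
    r_chosen = torch.tensor([1.3, 0.7, 2.1])
    r_rejected = torch.tensor([0.9, 1.0, 0.5])
    print(bradley_terry_loss(r_chosen, r_rejected))  # preference-style loss

    r_pred = torch.tensor([3.2, 1.8, 4.1])
    rating = torch.tensor([3.0, 2.0, 4.0])
    print(regression_loss(r_pred, rating))           # rating-style loss

In practice the scalar rewards would come from a reward-model head on top of a language model; the sketch only shows how the two training signals differ in form.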

Keywords

» Artificial intelligence  » Llama  » Regression  » RLHF