Summary of HelpSteer2-Preference: Complementing Ratings with Preferences, by Zhilin Wang et al.
HelpSteer2-Preference: Complementing Ratings with Preferences
by Zhilin Wang, Alexander Bukharin, Olivier Delalleau, Daniel Egert, Gerald Shen, Jiaqi Zeng, Oleksii Kuchaiev, Yi Dong
First submitted to arXiv on: 2 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary: Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary Reward models are crucial for aligning AI models with human instructions, and they are typically trained in either the Bradley-Terry style (on preference pairs) or the Regression style (on ratings). However, it has been unclear which approach performs better when the two are adequately matched for data, largely because no existing dataset provides compatible annotations for both. To address this, the authors release preference annotations (for Bradley-Terry training) to complement the existing ratings (for Regression training) in the HelpSteer2 dataset, accompanied by human-written justifications. With this data, the study conducts a head-to-head comparison of Bradley-Terry and Regression reward models when adequately matched for data, and based on these insights proposes a novel approach that combines both styles (a brief sketch of the two training styles is given below the table). A Llama-3.1-70B-Instruct model tuned with this approach achieves a score of 94.1 on RewardBench, surpassing over 140 reward models as of October 2024. The resulting reward model can then be used with the REINFORCE algorithm (RLHF) to align an Instruct model, which reaches a score of 85.0 on Arena Hard, making it the top-performing model at the time of writing. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper is about how AI models can follow instructions better. It’s like trying to understand what someone means when they say “make me a sandwich”. The problem is that there are two main ways to train these models, but we don’t know which one works best. To fix this, the authors created new data with explanations for each item. They compared the two methods and found a way to combine them that works really well. This new method helps AI models understand instructions better and even beats many other models in tests. |
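To make the two training styles in the medium summary concrete, here is a minimal sketch of the loss functions typically associated with each: a pairwise Bradley-Terry loss over a chosen/rejected response pair, and a regression loss that fits a scalar reward to a human rating. The function names, the weighted-sum combination, and the `alpha` parameter are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(reward_chosen: torch.Tensor,
                       reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry (pairwise) style: push the chosen response's scalar
    reward above the rejected response's reward."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

def regression_loss(predicted_rating: torch.Tensor,
                    human_rating: torch.Tensor) -> torch.Tensor:
    """Regression style: fit the scalar reward directly to a human rating
    (e.g., a helpfulness score)."""
    return F.mse_loss(predicted_rating, human_rating)

def combined_loss(reward_chosen, reward_rejected,
                  predicted_rating, human_rating,
                  alpha: float = 0.5) -> torch.Tensor:
    """Illustrative combination of the two objectives as a weighted sum.
    This is only a sketch of the idea of using both preference pairs and
    ratings; the paper's actual combined approach may differ."""
    return (alpha * bradley_terry_loss(reward_chosen, reward_rejected)
            + (1 - alpha) * regression_loss(predicted_rating, human_rating))
```

In practice, `reward_chosen`, `reward_rejected`, and `predicted_rating` would all come from the same reward model head evaluated on (prompt, response) pairs; the sketch only separates them to show which tensors each loss consumes.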
Keywords
» Artificial intelligence » Llama » Regression » RLHF