
Summary of Critique-out-Loud Reward Models, by Zachary Ankner et al.


Critique-out-Loud Reward Models

by Zachary Ankner, Mansheej Paul, Brandon Cui, Jonathan D. Chang, Prithviraj Ammanabrolu

First submitted to arXiv on: 21 Aug 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper introduces Critique-out-Loud (CLoud) reward models, which leverage the generation capabilities of large language models (LLMs) to reason explicitly about response quality. Traditional reward models are trained to predict a preference score directly, without using the underlying LLM's generation capabilities, which limits them to implicit reasoning about response quality. A CLoud reward model instead first generates a natural language critique of the assistant's response and then uses that critique to predict a scalar reward for response quality (a minimal code sketch of this procedure appears after the summaries below). The authors demonstrate the success of CLoud reward models for both Llama-3-8B and 70B base models, achieving improved pairwise preference classification accuracy on RewardBench and a Pareto improvement in win rate on ArenaHard when the models are used for scoring. Furthermore, the paper explores how to exploit the dynamic inference compute of CLoud reward models by performing self-consistency decoding for reward prediction.

Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper is about a new way to make computer programs learn from human feedback. Usually, these programs are trained to directly say whether they like or dislike something, without using their language skills to generate explanations. This limits what they can do. The researchers created a new type of program that first writes an explanation of why a response is good or bad, and then uses that explanation to decide how good the response is. They tested this new approach with two differently sized programs and found that it worked better than the old way. This could lead to better language assistants in the future.
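
To make the critique-then-score procedure concrete, below is a minimal Python sketch of how a CLoud-style reward model could be queried, including self-consistency decoding over several sampled critiques. The `model.generate` and `model.reward_head` calls, the prompt template, and all helper names are hypothetical placeholders for illustration, not the authors' released implementation.

```python
# Minimal sketch of CLoud-style reward scoring, assuming a hypothetical `model`
# object that exposes a text-generation method and a scalar reward head.
# None of these names come from the paper's released code.

from statistics import mean


def generate_critique(model, prompt: str, response: str, temperature: float = 0.8) -> str:
    """Sample a free-form natural language critique of the assistant response."""
    critique_request = (
        f"User prompt:\n{prompt}\n\n"
        f"Assistant response:\n{response}\n\n"
        "Write a critique of the response's quality:"
    )
    return model.generate(critique_request, temperature=temperature)  # hypothetical API


def score_with_critique(model, prompt: str, response: str, critique: str) -> float:
    """Predict a scalar reward conditioned on the prompt, response, and critique."""
    return model.reward_head(prompt=prompt, response=response, critique=critique)  # hypothetical API


def cloud_reward(model, prompt: str, response: str, num_samples: int = 1) -> float:
    """Self-consistency decoding: sample several critiques, score each one,
    and average the resulting scalar rewards into a single estimate."""
    rewards = [
        score_with_critique(model, prompt, response, generate_critique(model, prompt, response))
        for _ in range(num_samples)
    ]
    return mean(rewards)
```

With num_samples=1 this reduces to a single critique-then-score pass; larger values spend more inference compute to obtain a more stable reward estimate, which is the dynamic inference compute property mentioned in the summaries above.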

Keywords

» Artificial intelligence  » Classification  » Inference  » Large language model  » Llama