Summary of Improving Reinforcement Learning From Human Feedback Using Contrastive Rewards, by Wei Shen et al.


Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards

by Wei Shen, Xiaoying Zhang, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu

First submitted to arXiv on: 12 Mar 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper proposes a new approach to reinforcement learning from human feedback (RLHF) for large language models (LLMs). Standard RLHF relies heavily on an accurate and informative reward model, which is prone to errors. To address this limitation, the authors introduce a penalty term called the contrastive reward. The method involves two steps: offline sampling to obtain baseline responses, and using those baselines to compute the contrastive reward that is then optimized with Proximal Policy Optimization (PPO). The contrastive reward lets the LLM penalize reward uncertainty, improve robustness, and reduce variance in PPO (a minimal code sketch of this idea follows the summaries below). Experiments show that contrastive rewards significantly improve RLHF, outperforming strong baselines as evaluated by both GPT models and human annotators.

Low Difficulty Summary (original content by GrooveSquid.com)
This paper improves how large language models (LLMs) learn from human feedback. Right now, that process is fragile because it depends on good reward models. A reward model is like a set of instructions that tells the LLM what to do, but sometimes those instructions are wrong or unclear. The authors propose a new idea called contrastive rewards, which helps the LLM ignore bad instructions and focus on good ones. They tested their method and found that it works well, even better than some strong competitors.

Keywords

» Artificial intelligence  » Optimization  » RLHF