Summary of Learning Reward and Policy Jointly From Demonstration and Preference Improves Alignment, by Chenliang Li et al.


Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment

by Chenliang Li, Siliang Zeng, Zeyi Liao, Jiaxiang Li, Dongyeop Kang, Alfredo Garcia, Mingyi Hong

First submitted to arXiv on: 11 Jun 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Human-Computer Interaction (cs.HC); Robotics (cs.RO)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The proposed Alignment with Integrated Human Feedback (AIHF) approach integrates human preference data and demonstration data to train reward models and policies in a single stage. This addresses shortcomings of popular approaches such as RLHF, which break alignment into separate stages and thereby underutilize the available data and suffer from distribution mismatch. AIHF admits efficient algorithms that can reduce to, or leverage, existing alignment pipelines such as RLHF and Direct Preference Optimization (DPO). Extensive experiments on language models and robotic control problems show significant performance improvements over existing methods when high-quality preference data is limited. (A minimal sketch of such a combined objective follows the summaries below.)

Low Difficulty Summary (written by GrooveSquid.com, original content)
AIHF is a new way to align AI with human preferences and values. It combines two kinds of feedback: what humans prefer and how they actually behave. This makes it better than approaches that handle these two things separately. The result is more accurate alignment, which matters for building good foundation models and embodied AI. The method is tested on language models and robotic control problems and works well even with limited preference data.

Keywords

» Artificial intelligence  » Alignment  » Optimization  » RLHF