Summary of Learning Reward and Policy Jointly From Demonstration and Preference Improves Alignment, by Chenliang Li et al.
Learning Reward and Policy Jointly from Demonstration and Preference Improves Alignment
by Chenliang Li, Siliang Zeng, Zeyi Liao, Jiaxiang Li, Dongyeop Kang, Alfredo Garcia, Mingyi Hong
First submitted to arXiv on: 11 Jun 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Human-Computer Interaction (cs.HC); Robotics (cs.RO)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The proposed Alignment with Integrated Human Feedback (AIHF) approach integrates human preference and demonstration data to train reward models and policies in a single stage. This addresses a weakness of popular approaches such as RLHF, which split alignment into separate stages, leading to underutilization of data and distribution mismatch between stages. AIHF admits efficient algorithms that can reduce to, or leverage, existing alignment pipelines such as RLHF and Direct Preference Optimization (DPO). Extensive experiments on language models and robotic control problems show significant performance improvements over existing methods when high-quality preference data is limited. A minimal sketch of such a combined objective follows this table. |
Low | GrooveSquid.com (original content) | AIHF is a new way to align AI systems with human preferences and values. It combines two kinds of human input: what people prefer and how they behave (demonstrations). Using both together makes it more accurate than approaches that handle them in separate stages, which matters for building good foundation models and embodied AI. The method is tested on language models and robotic control problems and works well even when high-quality preference data is limited. |
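To make the "single stage" idea concrete, here is a minimal sketch of one way a joint objective could look, assuming the demonstration data is used through a behavior-cloning (maximum-likelihood) term and the preference data through a DPO-style Bradley-Terry term. The function name, the weighting `lambda_pref`, and the tensor shapes are illustrative assumptions, not the authors' implementation; the paper's actual AIHF formulation may differ.

```python
# Minimal sketch (not the authors' code): a single-stage objective that mixes
# a demonstration term (behavior cloning / max-likelihood) with a preference
# term (Bradley-Terry on implicit rewards, DPO-style). Names and shapes are
# illustrative assumptions.
import torch
import torch.nn.functional as F

def joint_alignment_loss(policy_logits_demo,                 # (B, T, V) policy logits on demonstration tokens
                         demo_tokens,                         # (B, T) expert tokens/actions
                         logp_chosen, logp_rejected,          # (B,) policy log-probs on preferred/rejected responses
                         ref_logp_chosen, ref_logp_rejected,  # (B,) reference-model log-probs on the same responses
                         beta=0.1, lambda_pref=1.0):
    # Demonstration term: maximize likelihood of the expert behavior.
    demo_loss = F.cross_entropy(
        policy_logits_demo.reshape(-1, policy_logits_demo.size(-1)),
        demo_tokens.reshape(-1),
    )
    # Preference term: Bradley-Terry loss on implicit rewards, as in DPO.
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    pref_loss = -F.logsigmoid(margin).mean()
    # Single-stage objective: both data sources drive the same policy update.
    return demo_loss + lambda_pref * pref_loss
```

Because both terms are minimized together, the demonstrations and the preferences shape the same policy update, rather than being consumed in separate reward-modeling and fine-tuning stages.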
Keywords
» Artificial intelligence » Alignment » Optimization » RLHF