Summary of Listwise Reward Estimation for Offline Preference-based Reinforcement Learning, by Heewoong Choi et al.
Listwise Reward Estimation for Offline Preference-based Reinforcement Learning
by Heewoong Choi, Sangwon Jung, Hongjoon Ahn, Taesup Moon
First submitted to arXiv on: 8 Aug 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Reinforcement Learning (RL) faces the challenge of designing precise reward functions that align with human intent. Preference-based RL (PbRL) addresses this by learning reward models from human feedback, but existing PbRL methods overlook second-order preference information, which indicates the relative strength of preferences. This paper proposes Listwise Reward Estimation (LiRE), a novel offline PbRL approach that exploits second-order preferences by constructing a Ranked List of Trajectories (RLT). LiRE builds the RLT efficiently using ternary feedback (a sketch of this list-building idea appears after the table). The authors also propose a new offline PbRL dataset to validate LiRE's effectiveness; experiments demonstrate its superiority over state-of-the-art baselines under modest feedback budgets, as well as its robustness to feedback noise. Code is available at https://github.com/chwoong/LiRE. The approach outperforms existing methods in a variety of scenarios, including robotic grasping, traffic signal control, and video game playing. |
Low | GrooveSquid.com (original content) | Imagine teaching a computer to make good decisions by giving it feedback. Designing the rules for that feedback is tricky, because we want them to match what humans would actually prefer. A method called Preference-based RL (PbRL) was developed for this, but it isn't perfect. This paper proposes an improvement called LiRE that also takes into account how strong our preferences are. It builds a special ranked list to keep track of which behaviors are better than others. To test the idea, the researchers created a new dataset with real-life scenarios such as robots grasping objects or controlling traffic lights. The results show that LiRE beats other methods and works well even when we don't have much feedback. |
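
To make the Ranked List of Trajectories (RLT) idea more concrete, below is a minimal Python sketch of how such a list could be assembled from ternary feedback ("worse", "tie", "better"). This is an illustration of the general idea only, not the authors' released implementation from the linked repository; the names `build_rlt`, `insert_into_rlt`, and `ternary_feedback`, and the group-of-ties data structure, are assumptions made for this sketch.

```python
"""Illustrative sketch: building a ranked list of trajectories from ternary feedback.

The RLT is represented as a list of "groups"; each group holds trajectories judged
equally preferable, and groups are ordered from least to most preferred.
"""
from typing import Callable, List

RLT = List[List[object]]


def insert_into_rlt(rlt: RLT, new_traj: object,
                    ternary_feedback: Callable[[object, object], int]) -> None:
    """Insert `new_traj` into the ranked list using ternary comparisons.

    `ternary_feedback(a, b)` is assumed to return
        -1 if a is less preferred than b,
         0 if a and b are judged equally preferable,
        +1 if a is more preferred than b.
    A binary search over groups keeps the number of queries per insertion
    roughly logarithmic in the current length of the list.
    """
    lo, hi = 0, len(rlt)
    while lo < hi:
        mid = (lo + hi) // 2
        fb = ternary_feedback(new_traj, rlt[mid][0])  # compare with a group representative
        if fb == 0:
            rlt[mid].append(new_traj)   # tie: join the existing group
            return
        elif fb < 0:
            hi = mid                    # new trajectory ranks below this group
        else:
            lo = mid + 1                # new trajectory ranks above this group
    rlt.insert(lo, [new_traj])          # no tie found: open a new group at the insertion point


def build_rlt(trajectories: List[object],
              ternary_feedback: Callable[[object, object], int]) -> RLT:
    """Build the full ranked list by inserting trajectories one at a time."""
    rlt: RLT = []
    for traj in trajectories:
        insert_into_rlt(rlt, traj, ternary_feedback)
    return rlt


if __name__ == "__main__":
    # Toy demo with a noise-free scripted teacher that compares known returns.
    returns = {"t1": 3.0, "t2": 1.0, "t3": 3.0, "t4": 5.0}
    fb = lambda a, b: (returns[a] > returns[b]) - (returns[a] < returns[b])
    print(build_rlt(list(returns), fb))  # -> [['t2'], ['t1', 't3'], ['t4']]
```

One natural way to use such a list, under these assumptions, is to read off an implicit pairwise preference for every pair of trajectories that land in different groups, so that a modest number of ternary queries yields many labeled pairs (and richer second-order structure) for reward-model training; consult the paper and repository for how LiRE actually derives its reward estimates.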
Keywords
» Artificial intelligence » Reinforcement learning