


A Unified Linear Programming Framework for Offline Reward Learning from Human Demonstrations and Feedback

by Kihyun Kim, Jiawei Zhang, Asuman Ozdaglar, Pablo A. Parrilo

First submitted to arXiv on: 20 May 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
A novel linear programming (LP) framework is introduced for offline reward learning in sequential decision-making problems. Tailored to both inverse reinforcement learning (IRL) and reinforcement learning from human feedback (RLHF), the framework estimates a feasible set of reward functions from offline data without online exploration, and it comes with an optimality guarantee and provable sample efficiency. This makes it possible to align the learned rewards with human feedback while remaining computationally tractable. Analytical examples and numerical experiments suggest that the LP framework can outperform conventional maximum likelihood estimation (MLE).
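
The paper's formulation is more general (covering both demonstrations and human feedback), but the core idea of describing a feasible reward set with linear constraints can be sketched using a classic tabular LP view of inverse RL. The example below is illustrative only: the transition model P, discount gamma, expert_action, and the margin-maximizing objective are hypothetical toy choices, not taken from the paper.

```python
# Illustrative sketch (not the paper's exact method): LP-based inverse RL on a
# small tabular MDP. Given transition probabilities and an observed expert
# policy, find a reward vector under which the expert's action is weakly
# optimal in every state, i.e. a point in the feasible reward set defined by
# linear constraints.
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 3, 2, 0.9

# Hypothetical transition model P[a][s, s'] and expert policy (one action per state).
P = [np.array([[0.8, 0.2, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.2, 0.8]]),
     np.array([[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, 0.0, 0.5]])]
expert_action = np.array([0, 0, 1])

# For state-only rewards r(s), the value of the expert policy is
# V = (I - gamma * P_pi)^(-1) r. Optimality of the expert action a* in state s
# requires (P[a*][s] - P[a][s]) @ (I - gamma * P_pi)^(-1) r >= 0 for every a.
P_pi = np.array([P[expert_action[s]][s] for s in range(n_states)])
M = np.linalg.inv(np.eye(n_states) - gamma * P_pi)

A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        if a == expert_action[s]:
            continue
        # linprog expects A_ub @ r <= b_ub, so negate the >= 0 constraint.
        A_ub.append(-(P[expert_action[s]][s] - P[a][s]) @ M)
        b_ub.append(0.0)

# Any bounded feasible point is consistent with the demonstrations; maximizing
# the total optimality margin is a simple heuristic to avoid the trivial r = 0.
c = np.sum(np.array(A_ub), axis=0)  # minimizing c @ r maximizes the summed margins
res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(-1.0, 1.0)] * n_states)
print("recovered reward vector:", res.x)
```

Any reward satisfying the constraints is consistent with the observed behavior; the paper's contribution, per the summary above, is to give such feasibility-based estimators optimality guarantees and provable sample efficiency in the fully offline setting.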

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces a new way to learn rewards in decision-making problems. It uses a special type of math problem called linear programming to figure out what rewards are best. This approach doesn’t need to try things out online, which can be hard or slow. Instead, it looks at past experiences and makes sure the rewards match what people think is good or bad. This new method works well and might even do better than other ways of doing reward learning.

Keywords

» Artificial intelligence  » Likelihood  » Reinforcement learning  » Reinforcement learning from human feedback  » Rlhf