Summary of Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching, by Arnav Kumar Jain et al.


Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching

by Arnav Kumar Jain, Harley Wiltzer, Jesse Farebrother, Irina Rish, Glen Berseth, Sanjiban Choudhury

First submitted to arXiv on: 11 Nov 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment, without access to a reward signal. Traditionally, IRL is treated as an adversarial game between an adversary searching over reward models and a learner optimizing against those rewards through repeated RL procedures. This approach is computationally expensive and difficult to stabilize. The paper's approach, direct policy optimization, exploits a linear factorization of the return as the inner product of successor features and a reward vector, and designs an IRL algorithm by policy gradient descent on the gap between the learner's and the expert's successor features (a simplified sketch of this idea follows these summaries). The resulting non-adversarial method does not require learning a reward function and can be solved seamlessly with existing actor-critic RL algorithms. It works in state-only settings without expert action labels, a setting that behavior cloning (BC) cannot solve. Empirical results show that the method learns from as few as a single expert demonstration and achieves improved performance on various control tasks.

Low Difficulty Summary (written by GrooveSquid.com, original content)
In this research paper, the authors created a new way to learn from experts by directly optimizing an agent's policy rather than trying to recover a reward function first. This makes learning faster and more stable. The algorithm does not need to know which actions the expert took, only which states the expert visited. The results show that the method can learn from just a single expert demonstration and performs well on a range of control tasks.

Keywords

» Artificial intelligence  » Gradient descent  » Optimization  » Reinforcement learning