Summary of Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching, by Arnav Kumar Jain et al.


Non-Adversarial Inverse Reinforcement Learning via Successor Feature Matching

by Arnav Kumar Jain, Harley Wiltzer, Jesse Farebrother, Irina Rish, Glen Berseth, Sanjiban Choudhury

First submitted to arXiv on: 11 Nov 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
In inverse reinforcement learning (IRL), an agent seeks to replicate expert demonstrations through interactions with the environment, without access to a reward signal. Traditionally, IRL is treated as an adversarial game between an adversary searching over reward models and a learner optimizing against those rewards through repeated RL procedures. This approach is computationally expensive and difficult to stabilize. The paper's approach, direct policy optimization, exploits a linear factorization of the return as the inner product of successor features and a reward vector, and designs an IRL algorithm by policy gradient descent on the gap between the learner's and the expert's successor features (a simplified sketch of this idea follows these summaries). The resulting non-adversarial method does not require learning a reward function and can be solved seamlessly with existing actor-critic RL algorithms. It works in state-only settings without expert action labels, a setting that behavior cloning (BC) cannot solve. Empirical results show that the method learns from as few as a single expert demonstration and achieves improved performance on various control tasks.

Low Difficulty Summary (written by GrooveSquid.com, original content)
In this research paper, the authors created a new way to learn from experts by directly optimizing an agent's policy rather than trying to recover a reward function first. This makes learning faster and more stable. The algorithm does not need to know which actions the expert took, only which states the expert visited. The results show that the method can learn from just a single expert demonstration and performs well on a range of control tasks.

Keywords

» Artificial intelligence  » Gradient descent  » Optimization  » Reinforcement learning