Robot Policy Learning with Temporal Optimal Transport Reward

by Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet

First submitted to arXiv on: 29 Oct 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Machine Learning (cs.LG); Robotics (cs.RO)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
In this paper, researchers tackle the challenging problem of reward specification in Reinforcement Learning by leveraging expert video demonstrations. They propose Temporal Optimal Transport (TemporalOT) as a novel approach to generate proxy rewards that account for temporal order information, unlike previous methods which overlook this crucial aspect. The authors demonstrate the effectiveness of their method on Meta-world benchmark tasks, showcasing improved policy learning accuracy.

Low Difficulty Summary (written by GrooveSquid.com, original content)
Reward specification in Reinforcement Learning is difficult without tedious hand engineering. One way to tackle this issue is by using expert video demonstrations to learn robot policies. Recent work has shown that reward labeling via Optimal Transport (OT) can generate a proxy reward measuring alignment between the robot trajectory and expert demos. However, OT rewards are invariant to temporal order, which adds noise to the signal. This paper introduces TemporalOT to incorporate temporal order information for a more accurate proxy reward. The authors test their method on Meta-world tasks and provide code at this URL.
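
To make the temporal-order idea concrete, here is a minimal sketch of an OT-based proxy reward whose cost matrix is restricted to temporally nearby frame pairs. This is not the authors' implementation (their code is linked from the paper); the per-frame feature inputs, the cosine-distance cost, the band-style temporal mask, the mask width, and the Sinkhorn settings are all illustrative assumptions.

```python
# Minimal sketch of a temporally masked OT proxy reward (illustrative only).
import numpy as np

def sinkhorn(a, b, C, reg=0.05, n_iters=200):
    """Entropy-regularized OT: transport plan for marginals a, b and cost matrix C."""
    K = np.exp(-C / reg)                      # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u + 1e-12)
        u = a / (K @ v + 1e-12)
    return u[:, None] * K * v[None, :]        # transport plan P

def temporal_ot_reward(agent_feats, expert_feats, mask_width=5, reg=0.05):
    """Per-timestep proxy rewards from OT alignment restricted to nearby time steps."""
    T_a, T_e = len(agent_feats), len(expert_feats)
    # Cosine-distance cost between agent and expert frame features.
    A = agent_feats / (np.linalg.norm(agent_feats, axis=1, keepdims=True) + 1e-12)
    E = expert_feats / (np.linalg.norm(expert_feats, axis=1, keepdims=True) + 1e-12)
    C = 1.0 - A @ E.T
    # Temporal mask: only allow matches between frames whose normalized time
    # indices are close; one way to inject temporal-order information into OT.
    t_a = np.arange(T_a)[:, None] / max(T_a - 1, 1)
    t_e = np.arange(T_e)[None, :] / max(T_e - 1, 1)
    allowed = np.abs(t_a - t_e) <= mask_width / max(T_e, 1)
    C_masked = np.where(allowed, C, 1e6)      # large cost blocks out-of-band matches
    a = np.full(T_a, 1.0 / T_a)
    b = np.full(T_e, 1.0 / T_e)
    P = sinkhorn(a, b, C_masked, reg)
    # Reward for each agent step: negative transported cost (higher = better aligned).
    return -(P * np.where(allowed, C, 0.0)).sum(axis=1)

# Example usage with random features standing in for visual embeddings of frames.
rng = np.random.default_rng(0)
agent_traj = rng.normal(size=(100, 32))       # 100 agent frames, 32-dim features
expert_traj = rng.normal(size=(100, 32))      # 100 expert frames, 32-dim features
step_rewards = temporal_ot_reward(agent_traj, expert_traj)   # shape (100,)
```

In a setup like this, the per-step proxy rewards would stand in for a hand-engineered reward when training a standard RL agent, which is the role the OT-based reward plays in the paper's Meta-world experiments.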

Keywords

» Artificial intelligence  » Alignment  » Reinforcement learning