Summary of Robot Policy Learning with Temporal Optimal Transport Reward, by Yuwei Fu et al.
Robot Policy Learning with Temporal Optimal Transport Reward
by Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet
First submitted to arXiv on: 29 Oct 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Machine Learning (cs.LG); Robotics (cs.RO)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | In this paper, researchers tackle the challenging problem of reward specification in Reinforcement Learning by leveraging expert video demonstrations. They propose Temporal Optimal Transport (TemporalOT) as a novel approach to generating proxy rewards that account for temporal order information, unlike previous methods, which overlook this crucial aspect. The authors demonstrate the effectiveness of their method on Meta-world benchmark tasks, showing improved policy learning. |
| Low | GrooveSquid.com (original content) | Reward specification in Reinforcement Learning is difficult without tedious hand engineering. One way to tackle this issue is to learn robot policies from expert video demonstrations. Recent work has shown that reward labeling via Optimal Transport (OT) can generate a proxy reward that measures the alignment between a robot trajectory and the expert demonstrations. However, OT rewards are invariant to temporal order, which introduces noise into the reward signal. This paper introduces TemporalOT, which incorporates temporal order information to obtain a more accurate proxy reward. The authors test their method on Meta-world tasks and provide code at this URL. |
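To make the OT-reward idea from the summaries concrete, here is a minimal Python sketch of the general technique, not the paper's implementation: it computes a cosine-distance cost between agent and expert trajectory embeddings, adds a simple banded temporal mask so the transport plan cannot freely permute time steps, solves entropic OT with Sinkhorn iterations, and reads off per-step proxy rewards. The function names, the mask design, and the hyperparameters (`band`, `eps`, `n_iters`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.05, n_iters=100):
    """Entropic-regularized OT via Sinkhorn iterations.

    cost: (T, Tp) cost matrix; a, b: marginal weights summing to 1.
    Returns a transport plan of shape (T, Tp).
    """
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # plan = diag(u) K diag(v)

def temporal_ot_reward(agent_emb, expert_emb, band=10, eps=0.05):
    """Per-step proxy rewards from temporally masked OT alignment.

    agent_emb: (T, d) agent trajectory embeddings.
    expert_emb: (Tp, d) expert trajectory embeddings.
    band: half-width of the allowed diagonal band (illustrative choice).
    """
    # Cosine-distance cost between every agent/expert step pair.
    a_n = agent_emb / np.linalg.norm(agent_emb, axis=1, keepdims=True)
    e_n = expert_emb / np.linalg.norm(expert_emb, axis=1, keepdims=True)
    cost = 1.0 - a_n @ e_n.T

    # Temporal mask: heavily penalize matches far from the time-aligned
    # diagonal, so the plan cannot freely reorder time steps.
    T, Tp = cost.shape
    i = np.arange(T)[:, None] / max(T - 1, 1)
    j = np.arange(Tp)[None, :] / max(Tp - 1, 1)
    off_band = np.abs(i - j) > band / max(T, Tp)
    masked_cost = cost + 1e3 * off_band

    # Uniform marginals over agent and expert time steps.
    a = np.full(T, 1.0 / T)
    b = np.full(Tp, 1.0 / Tp)
    plan = sinkhorn(masked_cost, a, b, eps=eps)

    # Reward for each agent step: negative transported matching cost.
    return -(plan * cost).sum(axis=1)
```

The mask is what distinguishes this sketch from plain OT reward labeling: without it, the plan could match early agent steps to late expert steps, which is exactly the temporal-order invariance the paper identifies as a source of noise in the proxy reward.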
Keywords
» Artificial intelligence » Alignment » Reinforcement learning