Summary of Robot Policy Learning with Temporal Optimal Transport Reward, by Yuwei Fu et al.
Robot Policy Learning with Temporal Optimal Transport Reward
by Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet
First submitted to arXiv on: 29 Oct 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Machine Learning (cs.LG); Robotics (cs.RO)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | In this paper, researchers tackle the challenging problem of reward specification in Reinforcement Learning by leveraging expert video demonstrations. They propose Temporal Optimal Transport (TemporalOT) as a novel approach to generating proxy rewards that account for temporal order information, unlike previous methods, which overlook this crucial aspect. The authors demonstrate the effectiveness of their method on Meta-world benchmark tasks, showing improved policy learning. |
| Low | GrooveSquid.com (original content) | Reward specification in Reinforcement Learning is difficult without tedious hand engineering. One way to tackle this issue is to learn robot policies from expert video demonstrations. Recent work has shown that reward labeling via Optimal Transport (OT) can generate a proxy reward that measures the alignment between a robot trajectory and the expert demonstrations. However, OT rewards are invariant to temporal order, which introduces noise into the reward signal. This paper introduces TemporalOT, which incorporates temporal order information to obtain a more accurate proxy reward. The authors test their method on Meta-world tasks and provide code at this URL. |
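To make the OT-reward idea from the summaries concrete, here is a minimal Python sketch of the general technique, not the paper's implementation: it computes a cosine-distance cost between agent and expert trajectory embeddings, adds a simple banded temporal mask so the transport plan cannot freely permute time steps, solves entropic OT with Sinkhorn iterations, and reads off per-step proxy rewards. The function names, the mask design, and the hyperparameters (`band`, `eps`, `n_iters`) are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.05, n_iters=100):
    """Entropic-regularized OT via Sinkhorn iterations.

    cost: (T, Tp) cost matrix; a, b: marginal weights summing to 1.
    Returns a transport plan of shape (T, Tp).
    """
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]   # plan = diag(u) K diag(v)

def temporal_ot_reward(agent_emb, expert_emb, band=10, eps=0.05):
    """Per-step proxy rewards from temporally masked OT alignment.

    agent_emb: (T, d) agent trajectory embeddings.
    expert_emb: (Tp, d) expert trajectory embeddings.
    band: half-width of the allowed diagonal band (illustrative choice).
    """
    # Cosine-distance cost between every agent/expert step pair.
    a_n = agent_emb / np.linalg.norm(agent_emb, axis=1, keepdims=True)
    e_n = expert_emb / np.linalg.norm(expert_emb, axis=1, keepdims=True)
    cost = 1.0 - a_n @ e_n.T

    # Temporal mask: heavily penalize matches far from the time-aligned
    # diagonal, so the plan cannot freely reorder time steps.
    T, Tp = cost.shape
    i = np.arange(T)[:, None] / max(T - 1, 1)
    j = np.arange(Tp)[None, :] / max(Tp - 1, 1)
    off_band = np.abs(i - j) > band / max(T, Tp)
    masked_cost = cost + 1e3 * off_band

    # Uniform marginals over agent and expert time steps.
    a = np.full(T, 1.0 / T)
    b = np.full(Tp, 1.0 / Tp)
    plan = sinkhorn(masked_cost, a, b, eps=eps)

    # Reward for each agent step: negative transported matching cost.
    return -(plan * cost).sum(axis=1)
```

The mask is what distinguishes this sketch from plain OT reward labeling: without it, the plan could match early agent steps to late expert steps, which is exactly the temporal-order invariance the paper identifies as a source of noise in the proxy reward.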
Keywords
» Artificial intelligence » Alignment » Reinforcement learning