Summary of OMPO: A Unified Framework for RL under Policy and Dynamics Shifts, by Yu Luo et al.
OMPO: A Unified Framework for RL under Policy and Dynamics Shifts
by Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan
First submitted to arXiv on: 29 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper addresses a fundamental challenge in reinforcement learning (RL): how to train policies using environment interaction data collected under varying policies or dynamics. Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, leading to suboptimal policy performance and high learning variance. The authors propose a unified strategy called transition occupancy matching, which introduces a surrogate policy learning objective that accounts for transition occupancy discrepancies and reformulates it as a tractable min-max optimization problem. This approach is implemented as the Occupancy-Matching Policy Optimization (OMPO) method, which features an actor-critic structure with a distribution discriminator and a small-size local buffer (an illustrative sketch of this structure appears after the table). The authors conduct extensive experiments on environments including OpenAI Gym, Meta-World, and Panda Robots, showcasing OMPO's effectiveness under policy shifts with stationary and non-stationary dynamics, as well as in domain adaptation. Notably, OMPO outperforms specialized baselines from different categories in all settings. |
Low | GrooveSquid.com (original content) | Imagine you’re trying to teach a robot new skills by giving it lots of different tasks to do. The problem is that the robot might learn to do these tasks in different ways depending on what task it’s doing, or how it’s being controlled. This paper figures out a way to make the robot learn more quickly and accurately by paying attention to how the tasks change over time. They call this method “transition occupancy matching,” and it helps the robot learn new skills even when the tasks are changing all the time. |
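The medium-difficulty summary describes OMPO's actor-critic structure with a distribution discriminator and a small local buffer. The sketch below is a minimal, hypothetical illustration of how such an occupancy-matching loop could be organized: a discriminator is trained to tell recent (local-buffer) transitions from older (global-buffer) ones, and its log-ratio is folded into the critic target as a reward correction, while the actor maximizes the corrected value. Names such as `disc`, `local_batch`, and the `alpha` weight are assumptions made for illustration; this is not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): an occupancy-matching style
# actor-critic update with a transition discriminator, using PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 4, 2


def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))


# Discriminator over transitions (s, a, s'): distinguishes local-buffer samples
# (current policy/dynamics) from global-buffer samples (mixed history).
disc = mlp(2 * STATE_DIM + ACTION_DIM, 1)
critic = mlp(STATE_DIM + ACTION_DIM, 1)
actor = mlp(STATE_DIM, ACTION_DIM)
opt_disc = torch.optim.Adam(disc.parameters(), lr=3e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=3e-4)
opt_actor = torch.optim.Adam(actor.parameters(), lr=3e-4)


def transition(batch):
    """Concatenate (s, a, s') into one discriminator input."""
    return torch.cat([batch["s"], batch["a"], batch["s2"]], dim=-1)


def update(local_batch, global_batch, gamma=0.99, alpha=0.1):
    """One hypothetical update step; each batch is a dict of tensors s, a, r, s2."""
    # 1) Discriminator: binary classification of where a transition came from.
    logits_local = disc(transition(local_batch))
    logits_global = disc(transition(global_batch))
    disc_loss = (
        F.binary_cross_entropy_with_logits(logits_local, torch.ones_like(logits_local))
        + F.binary_cross_entropy_with_logits(logits_global, torch.zeros_like(logits_global))
    )
    opt_disc.zero_grad()
    disc_loss.backward()
    opt_disc.step()

    # 2) Critic: the reward is augmented with the discriminator log-ratio, which
    #    down-weights transitions unlike those the current policy would generate.
    with torch.no_grad():
        log_ratio = disc(transition(global_batch)).squeeze(-1)
        next_a = torch.tanh(actor(global_batch["s2"]))
        next_q = critic(torch.cat([global_batch["s2"], next_a], dim=-1)).squeeze(-1)
        target = global_batch["r"] + alpha * log_ratio + gamma * next_q
    q = critic(torch.cat([global_batch["s"], global_batch["a"]], dim=-1)).squeeze(-1)
    critic_loss = F.mse_loss(q, target)
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # 3) Actor: maximize the corrected Q-value (the "max" side of the min-max game).
    a = torch.tanh(actor(global_batch["s"]))
    actor_loss = -critic(torch.cat([global_batch["s"], a], dim=-1)).mean()
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()
```

In this reading, the discriminator plays the "min" role and the actor-critic the "max" role of the min-max objective described in the summary; the small local buffer exists only to provide fresh samples for the discriminator's positive class.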
Keywords
» Artificial intelligence » Attention » Domain adaptation » Optimization » Reinforcement learning