Summary of Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning, by Honghao Wei et al.
Adversarially Trained Weighted Actor-Critic for Safe Offline Reinforcement Learning
by Honghao Wei, Xiyue Peng, Arnob Ghosh, Xin Liu
First submitted to arXiv on: 1 Jan 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper but are written at different levels of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv |
Medium | GrooveSquid.com (original content) | The proposed algorithm, Weighted Safe Actor-Critic (WSAC), is designed for safe offline reinforcement learning (RL) with function approximation. It robustly optimizes policies to improve upon an arbitrary reference policy even with limited data coverage. WSAC casts training as a two-player Stackelberg game over a refined objective function: the actor optimizes the policy against two adversarially trained value critics, which focus on scenarios where the actor performs worse than the reference policy. WSAC comes with several guarantees: it can outperform any reference policy while maintaining the same level of safety, it converges to the reference policy at the optimal statistical rate of 1/√N, and it guarantees safe policy improvement across a broad range of hyperparameters. The theoretical results are supported by experiments in several continuous control environments. An illustrative code sketch of this actor-critic structure appears below the table. |
Low | GrooveSquid.com (original content) | WSAC is a new way to learn from past experiences without risking harm or failure. It is designed to work with limited data and can even improve upon existing policies. WSAC uses a game-playing approach to balance the need for improvement with the need for safety. Its key benefits include being able to outperform any existing policy while keeping the same level of safety, and it learns quickly and improves consistently as it is used. |
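
To make the Stackelberg structure in the medium-difficulty summary concrete, here is a minimal, hypothetical sketch of one weighted actor-critic update on an offline batch. It is not the authors' WSAC implementation: it assumes PyTorch, a deterministic actor, separate reward and cost critics (`q_reward`, `q_cost`), a fixed reference policy `ref_actor`, and illustrative weights `lam` and `beta`; the pessimism term is a simplified stand-in for the paper's adversarial weighting.

```python
import torch
import torch.nn.functional as F


def wsac_style_update(actor, q_reward, q_cost, ref_actor, batch,
                      actor_opt, critic_opt, lam=1.0, beta=0.5, gamma=0.99):
    """One illustrative update on an offline batch (hypothetical names/weights).

    Assumes: actor/ref_actor map a state batch to actions; q_reward/q_cost map
    a (state, action) batch to shape-[B] value estimates; lam weights the cost
    critic in the actor objective; beta scales the critics' pessimism term.
    """
    s, a, r, c, s2, done = batch  # states, actions, rewards, costs, next states, done flags

    # Critic (follower) step: TD regression on the offline data plus an
    # adversarial term that is pessimistic about the learned policy relative
    # to the reference policy (reward value pushed down, cost value pushed up,
    # at the actor's proposed actions).
    with torch.no_grad():
        a2 = actor(s2)
        tgt_r = r + gamma * (1.0 - done) * q_reward(s2, a2)
        tgt_c = c + gamma * (1.0 - done) * q_cost(s2, a2)
        a_pi, a_ref = actor(s), ref_actor(s)
    critic_loss = (
        F.mse_loss(q_reward(s, a), tgt_r) + F.mse_loss(q_cost(s, a), tgt_c)
        + beta * (q_reward(s, a_pi) - q_reward(s, a_ref)).mean()
        - beta * (q_cost(s, a_pi) - q_cost(s, a_ref)).mean()
    )
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor (leader) step: maximize reward value minus a weighted cost value
    # under the adversarially trained critics.
    a_pi = actor(s)
    actor_loss = -(q_reward(s, a_pi) - lam * q_cost(s, a_pi)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
    return critic_loss.item(), actor_loss.item()
```

The intuition behind this sketch: the critics (followers) are fit to the data while being adversarially pessimistic about the learned policy relative to the reference, so the actor (leader) can only improve its weighted reward-minus-cost objective by finding behavior that genuinely beats the reference without sacrificing safety.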
Keywords
* Artificial intelligence
* Objective function
* Reinforcement learning