Summary of Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation, by Ignat Georgiev et al.
Adaptive Horizon Actor-Critic for Policy Learning in Contact-Rich Differentiable Simulation
by Ignat Georgiev, Krishnan Srinivasan, Jie Xu, Eric Heiden, Animesh Garg
First submitted to arXiv on: 28 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper examines Model-Free Reinforcement Learning (MFRL) and its limitations in continuous control tasks. MFRL has achieved success through the policy gradient theorem, but it suffers from high gradient variance because it relies on zeroth-order gradient estimates. In contrast, First-Order Model-Based Reinforcement Learning (FO-MBRL) methods obtain lower-variance gradients from a differentiable simulator, yet incur large gradient error when the dynamics become stiff, as during physical contact. The paper analyzes this source of error and proposes Adaptive Horizon Actor-Critic (AHAC), an FO-MBRL algorithm that adapts the model-based rollout horizon to avoid stiff dynamics. Experiments show AHAC outperforming MFRL baselines, achieving 40% more reward across locomotion tasks while scaling to high-dimensional control environments with better wall-clock efficiency (an illustrative sketch of the adaptive-horizon idea follows this table). |
Low | GrooveSquid.com (original content) | This paper looks at two ways for machines to learn to make decisions when outcomes are uncertain. The first approach, Model-Free Reinforcement Learning (MFRL), works well for many problems, but it can be slow and imprecise because it has to try lots of different actions to figure out what works. The second approach, First-Order Model-Based Reinforcement Learning (FO-MBRL), is better at estimating the right action, but it struggles in situations where the rules change suddenly, such as when objects make contact. This paper addresses that problem with a new algorithm called Adaptive Horizon Actor-Critic (AHAC), which adapts how far ahead it plans so it can avoid those tricky situations. The results show that AHAC earns more reward than MFRL methods and learns more quickly. |
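The medium-difficulty summary describes AHAC's core mechanism: taking first-order gradients through a differentiable simulator, but cutting the rollout horizon short before the dynamics become stiff (for example, at contact). The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of that idea under assumed stand-ins (`toy_sim_step`, the contact-force proxy, the force threshold, and the small policy network are all hypothetical), and it omits the learned critic that an actor-critic method would use to estimate value beyond the truncated horizon.

```python
# Illustrative sketch only (not the paper's implementation): roll out a
# differentiable simulator, truncate the horizon when dynamics look stiff,
# and back-propagate a first-order policy gradient through the rollout.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def toy_sim_step(state, action):
    """Hypothetical differentiable dynamics step (a real simulator would go here)."""
    next_state = state + 0.05 * torch.tanh(action).repeat(1, state.shape[1])
    reward = -(next_state ** 2).sum(dim=1)
    contact_force = next_state.abs().max(dim=1).values  # crude proxy for stiffness
    return next_state, reward, contact_force

max_horizon, force_threshold = 32, 1.0   # assumed values, not from the paper
state = torch.zeros(8, 4)                # batch of 8 environments, 4-dim state
total_reward = torch.zeros(8)

for t in range(max_horizon):
    action = policy(state)
    state, reward, force = toy_sim_step(state, action)
    total_reward = total_reward + reward
    # Adaptive horizon: stop unrolling (and stop back-propagating) once the
    # dynamics look stiff, instead of pushing gradients through contact.
    if (force > force_threshold).any():
        break

# First-order policy gradient through the (possibly truncated) rollout.
loss = -total_reward.mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In the actual method, truncating earlier trades off horizon length against gradient quality; the sketch uses a single hard force threshold purely to make that trade-off concrete.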
Keywords
» Artificial intelligence » Reinforcement learning