Summary of Dissecting Deep RL with High Update Ratios: Combatting Value Divergence, by Marcel Hussing et al.


Dissecting Deep RL with High Update Ratios: Combatting Value Divergence

by Marcel Hussing, Claas Voelcker, Igor Gilitschenski, Amir-massoud Farahmand, Eric Eaton

First submitted to arXiv on: 9 Mar 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper investigates the primacy bias in deep reinforcement learning, where agents overfit to early interactions and discount later experience, impeding their ability to learn. The authors identify value function divergence as a fundamental challenge underlying this issue. They find that overinflated Q-values appear not only on out-of-distribution data but also on in-distribution data, and link them to overestimation for unseen actions, propelled by optimizer momentum. To combat this, the authors propose a simple unit-ball normalization technique that enables learning under large update ratios (a hypothetical sketch appears after the summaries below). The method is tested on the dm_control suite and achieves strong performance on the challenging dog tasks, comparable to model-based approaches. This research calls into question the prior explanation that sub-optimal learning stems from overfitting to early data.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper helps solve a problem in artificial intelligence called the primacy bias. It’s when machines learn too much from their first experiences and forget to learn from later ones. The researchers found that this happens because the machine’s “values” (its estimates of future rewards) grow out of control. They developed a new way to fix this by normalizing those values, which allows the machine to keep learning even when it makes many updates between seeing new data. This method works well on challenging tasks and is comparable to more complex approaches.
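
To make the unit-ball normalization idea mentioned in the medium-difficulty summary concrete, here is a minimal, hypothetical PyTorch sketch. It assumes the technique rescales the critic's penultimate-layer features to unit L2 norm before the final linear layer; the class name, layer sizes, and dimensions are illustrative and not taken from the paper's implementation.

```python
# A minimal sketch of unit-ball (L2) feature normalization for a Q-network.
# Assumption: the penultimate-layer features are projected onto the unit ball
# before the final linear layer; all names and sizes here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnitBallQNetwork(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.q_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        features = self.encoder(torch.cat([obs, act], dim=-1))
        # Rescale features to unit L2 norm so their magnitude cannot grow
        # without bound, which bounds the scale of the Q-value estimate.
        features = F.normalize(features, p=2, dim=-1)
        return self.q_head(features)


# Usage example with random inputs (hypothetical dimensions).
if __name__ == "__main__":
    q_net = UnitBallQNetwork(obs_dim=17, act_dim=6)
    obs = torch.randn(32, 17)
    act = torch.randn(32, 6)
    print(q_net(obs, act).shape)  # torch.Size([32, 1])
```

The design intuition, as described in the summaries, is that constraining feature magnitude keeps Q-value estimates from diverging even when many gradient updates are performed per environment interaction.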

Keywords

  • Artificial intelligence
  • Overfitting
  • Reinforcement learning