Summary of Dissecting Deep RL with High Update Ratios: Combatting Value Divergence, by Marcel Hussing et al.


Dissecting Deep RL with High Update Ratios: Combatting Value Divergence

by Marcel Hussing, Claas Voelcker, Igor Gilitschenski, Amir-massoud Farahmand, Eric Eaton

First submitted to arXiv on: 9 Mar 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper investigates the primacy bias in deep reinforcement learning, where agents overfit to early interactions and discount later experience, impeding their ability to learn. The authors identify value function divergence as a fundamental challenge underlying this issue. They find that overinflated Q-values appear not only on out-of-distribution data but also on in-distribution data, and link them to overestimation for unseen actions, propelled by optimizer momentum. To combat this, the authors propose a simple unit-ball normalization technique that enables learning under large update ratios (a hypothetical sketch appears after the summaries below). The method is tested on the dm_control suite and achieves strong performance on the challenging dog tasks, comparable to model-based approaches. This research calls into question the prior explanation that sub-optimal learning stems from overfitting to early data.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper helps solve a problem in artificial intelligence called the primacy bias. It’s when machines learn too much from their first experiences and forget to learn from later ones. The researchers found that this happens because the machine’s “values” (its estimates of future rewards) grow out of control. They developed a new way to fix this by normalizing those values, which allows the machine to keep learning even when it makes many updates between seeing new data. This method works well on challenging tasks and is comparable to more complex approaches.
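
To make the unit-ball normalization idea mentioned in the medium-difficulty summary concrete, here is a minimal, hypothetical PyTorch sketch. It assumes the technique rescales the critic's penultimate-layer features to unit L2 norm before the final linear layer; the class name, layer sizes, and dimensions are illustrative and not taken from the paper's implementation.

```python
# A minimal sketch of unit-ball (L2) feature normalization for a Q-network.
# Assumption: the penultimate-layer features are projected onto the unit ball
# before the final linear layer; all names and sizes here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class UnitBallQNetwork(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
        )
        self.q_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        features = self.encoder(torch.cat([obs, act], dim=-1))
        # Rescale features to unit L2 norm so their magnitude cannot grow
        # without bound, which bounds the scale of the Q-value estimate.
        features = F.normalize(features, p=2, dim=-1)
        return self.q_head(features)


# Usage example with random inputs (hypothetical dimensions).
if __name__ == "__main__":
    q_net = UnitBallQNetwork(obs_dim=17, act_dim=6)
    obs = torch.randn(32, 17)
    act = torch.randn(32, 6)
    print(q_net(obs, act).shape)  # torch.Size([32, 1])
```

The design intuition, as described in the summaries, is that constraining feature magnitude keeps Q-value estimates from diverging even when many gradient updates are performed per environment interaction.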

Keywords

  • Artificial intelligence
  • Overfitting
  • Reinforcement learning