
Summary of Stop Regressing: Training Value Functions via Classification for Scalable Deep RL, by Jesse Farebrother et al.


Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

by Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal

First submitted to arXiv on: 6 Mar 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper explores the benefits of using classification instead of regression for training value functions in deep reinforcement learning (RL). The authors observe that while supervised learning has scaled reliably to massive networks by leveraging the cross-entropy classification loss, the same has not held for value-based RL, which typically regresses onto scalar targets. They demonstrate that training value functions with categorical cross-entropy significantly improves performance and scalability across domains including Atari 2600 games, robotic manipulation, chess, and a language-agent Wordle task, achieving state-of-the-art results. Their analysis suggests these gains stem primarily from categorical cross-entropy's ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity.
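
To make the regression-to-classification switch concrete, below is a minimal PyTorch sketch (not the authors' code) of one such scheme: the value network outputs logits over a fixed discrete support, the scalar TD target is projected onto its two nearest bins (the "two-hot" encoding, one of the target representations studied in the paper; the paper's preferred HL-Gauss variant instead smooths the target with a Gaussian), and the loss is categorical cross-entropy rather than mean squared error. All hyperparameters, shapes, and variable names here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Illustrative hyperparameters (not taken from the paper).
num_bins = 51
v_min, v_max = -10.0, 10.0
support = torch.linspace(v_min, v_max, num_bins)  # fixed bin centers

def two_hot(target, support):
    """Project scalar targets onto the two nearest support bins,
    weighted by proximity, so the projection preserves the mean."""
    v_lo, v_hi = support[0].item(), support[-1].item()
    bin_size = (v_hi - v_lo) / (len(support) - 1)
    target = target.clamp(v_lo, v_hi)
    idx = ((target - v_lo) / bin_size).floor().long().clamp(0, len(support) - 2)
    upper_w = (target - support[idx]) / bin_size  # weight on the upper bin
    probs = torch.zeros(*target.shape, len(support))
    probs.scatter_(-1, idx.unsqueeze(-1), (1.0 - upper_w).unsqueeze(-1))
    probs.scatter_(-1, (idx + 1).unsqueeze(-1), upper_w.unsqueeze(-1))
    return probs

# Stand-ins for a value network's output and bootstrapped TD targets.
logits = torch.randn(32, num_bins)   # network now emits logits, not a scalar
td_target = torch.randn(32) * 5.0

# Regression baseline would be: F.mse_loss(scalar_value, td_target).
# Classification alternative: cross-entropy against the two-hot projection.
target_probs = two_hot(td_target, support)
loss = -(target_probs * F.log_softmax(logits, dim=-1)).sum(-1).mean()

# The scalar value is recovered as the mean of the categorical distribution.
value = (F.softmax(logits, dim=-1) * support).sum(-1)
```

Recovering the scalar value as the mean of the predicted distribution keeps the rest of the RL pipeline, such as computing bootstrapped TD targets, unchanged; only the loss and the network's output head differ from the regression setup.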
Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper looks at how to make deep reinforcement learning (RL) work better by changing the way value functions are trained. Most RL methods use regression, which is like trying to predict an exact number. This often works poorly because the targets are noisy and change over time. The authors show that using classification instead of regression makes training more reliable and easier to scale to bigger networks. They tested the idea in several areas, including games like chess and Wordle, and found that it works really well.

Keywords

  • Artificial intelligence
  • Classification
  • Cross entropy
  • Regression
  • Reinforcement learning
  • Supervised