Summary of REValueD: Regularised Ensemble Value-Decomposition for Factorisable Markov Decision Processes, by David Ireland and Giovanni Montana
REValueD: Regularised Ensemble Value-Decomposition for Factorisable Markov Decision Processes
by David Ireland, Giovanni Montana
First submitted to arXiv on: 16 Jan 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract. |
Medium | GrooveSquid.com (original content) | This paper proposes a novel approach to discrete-action reinforcement learning in high-dimensional, factorisable action spaces. Building on value-decomposition, a concept borrowed from multi-agent reinforcement learning, the authors show that decomposition curbs the over-estimation bias of Q-learning but amplifies target variance, so they introduce an ensemble of critics to mitigate that variance. They also add a regularisation loss that limits the effect exploratory actions in one dimension can have on the values of optimal actions in other dimensions. The resulting algorithm, REValueD, outperforms existing methods on discretised versions of DeepMind Control Suite tasks, particularly the challenging humanoid and dog tasks (a minimal code sketch follows this table). |
Low | GrooveSquid.com (original content) | In this paper, researchers find ways to improve reinforcement learning algorithms that struggle when there are a huge number of possible actions. They use a technique called value-decomposition, which splits a big action into smaller parts and helps reduce over-estimation mistakes. However, it also makes the learning targets noisier, so the authors build an “ensemble” of critics that work together to make more reliable estimates. They also add a regularisation trick that stops a random, exploratory choice in one part of the action from distorting the learned value of good choices in the other parts. The new algorithm, called REValueD, does very well on some challenging tasks. |
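To make the value-decomposition and critic-ensemble ideas in the medium-difficulty summary concrete, here is a minimal PyTorch-style sketch. It is not the authors' code: names such as `UtilityNet`, `decomposed_q`, and `ensemble_target`, and all sizes, are invented for illustration, and the regularisation loss that counters cross-dimension effects of exploratory actions is omitted because its exact form is defined in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UtilityNet(nn.Module):
    """One critic: a utility value for every discrete bin of every action dimension."""
    def __init__(self, state_dim, n_dims, n_bins, hidden=256):
        super().__init__()
        self.n_dims, self.n_bins = n_dims, n_bins
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_dims * n_bins),
        )

    def forward(self, state):
        # (batch, n_dims, n_bins): one utility per (action dimension, bin).
        return self.net(state).view(-1, self.n_dims, self.n_bins)

def decomposed_q(utilities, actions):
    """Value-decomposition: Q(s, a) is the mean of per-dimension utilities of the chosen bins."""
    chosen = utilities.gather(2, actions.unsqueeze(-1)).squeeze(-1)  # (batch, n_dims)
    return chosen.mean(dim=1)                                        # (batch,)

def ensemble_target(critics, next_state, reward, done, gamma=0.99):
    """Average greedy bootstrapped targets over the ensemble to reduce target variance."""
    with torch.no_grad():
        targets = []
        for critic in critics:
            greedy = critic(next_state).max(dim=2).values.mean(dim=1)
            targets.append(reward + gamma * (1.0 - done) * greedy)
        return torch.stack(targets).mean(dim=0)

# Hypothetical usage with made-up sizes (the paper's regularisation loss is not shown).
state_dim, n_dims, n_bins, batch = 24, 6, 3, 32
critics = [UtilityNet(state_dim, n_dims, n_bins) for _ in range(5)]
state, next_state = torch.randn(batch, state_dim), torch.randn(batch, state_dim)
actions = torch.randint(0, n_bins, (batch, n_dims))
reward, done = torch.randn(batch), torch.zeros(batch)

target = ensemble_target(critics, next_state, reward, done)
td_loss = sum(F.mse_loss(decomposed_q(c(state), actions), target) for c in critics)
```

The sketch only illustrates the two ideas named in the summary: per-dimension utilities averaged into a single Q-value, and an ensemble-averaged bootstrap target.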
Keywords
* Artificial intelligence
* Regularization
* Reinforcement learning