Stabilizing Extreme Q-learning by Maclaurin Expansion
by Motoki Omura, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada
First submitted to arXiv on: 7 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper’s original abstract, available on its arXiv page. |
| Medium | GrooveSquid.com (original content) | The proposed Maclaurin Expanded Extreme Q-learning (ME-XQL) enhances the original Extreme Q-learning (XQL) method for offline reinforcement learning. XQL uses a loss function based on the assumption that the Bellman error follows a Gumbel distribution, which lets it model the soft optimal value function in an in-sample manner. ME-XQL applies a Maclaurin expansion to this loss function to improve stability against large errors (a code sketch of the expansion appears below the table). Depending on the order of the expansion, the modeled value function is adjusted between the value function under the behavior policy and the soft optimal value function, trading off stability against optimality. The expansion likewise shifts the assumed error distribution from a normal distribution toward a Gumbel distribution. The method is evaluated on online RL tasks from DM Control, where XQL was previously unstable, and on offline RL tasks from D4RL. |
| Low | GrooveSquid.com (original content) | Maclaurin Expanded Extreme Q-learning (ME-XQL) is a new way to make reinforcement learning more stable. The original Extreme Q-learning (XQL) had stability problems caused by its loss function. ME-XQL addresses this by reworking that loss function with a Maclaurin expansion, which makes learning less sensitive to large errors and improves overall performance. The method is tested on two types of tasks: online RL from DM Control, where it is more stable than XQL, and offline RL from D4RL, where it also improves results. |
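To make the expansion concrete, below is a minimal PyTorch-style sketch of what a Maclaurin-expanded loss of this kind could look like. It assumes the linex form exp(z/beta) - z/beta - 1 commonly quoted for XQL's Gumbel-based value loss; the function name `maclaurin_xql_loss`, the temperature `beta`, and the `order` argument are illustrative assumptions, not taken from the paper.

```python
import torch


def maclaurin_xql_loss(bellman_error: torch.Tensor, beta: float = 1.0, order: int = 2) -> torch.Tensor:
    """Sketch of a Maclaurin-expanded, XQL-style value loss (illustrative only).

    The XQL Gumbel-based loss is often written as L(z) = exp(z/beta) - z/beta - 1,
    whose Maclaurin series is sum_{k>=2} (z/beta)^k / k!.  Truncating the series
    at `order` gives a loss that is quadratic (normal-error assumption) for
    order=2 and approaches the original exponential loss as the order grows.
    """
    x = bellman_error / beta
    loss = torch.zeros_like(x)
    factorial = 1.0
    for k in range(2, order + 1):
        factorial *= k                      # running value of k!
        loss = loss + x.pow(k) / factorial  # add the k-th Maclaurin term
    return loss.mean()


# Illustrative usage on random Bellman errors: order=2 behaves like a scaled MSE,
# while higher orders penalize positive errors more sharply, as XQL does.
errors = torch.randn(256)
print(maclaurin_xql_loss(errors, beta=2.0, order=2))
print(maclaurin_xql_loss(errors, beta=2.0, order=4))
```

Truncating at order 2 reduces to a squared-error (normal-distribution) loss, while higher orders progressively recover the asymmetric, Gumbel-style penalty; this is the stability-optimality trade-off described in the medium summary.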
Keywords
- Artificial intelligence
- Loss function
- Reinforcement learning