Stabilizing Extreme Q-learning by Maclaurin Expansion
by Motoki Omura, Takayuki Osa, Yusuke Mukuta, Tatsuya Harada
First submitted to arXiv on: 7 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper’s original abstract, available on its arXiv page. |
| Medium | GrooveSquid.com (original content) | The proposed Maclaurin Expanded Extreme Q-learning (ME-XQL) enhances the original Extreme Q-learning (XQL) method for offline reinforcement learning. XQL uses a loss function based on the assumption that the Bellman error follows a Gumbel distribution, which lets it model the soft optimal value function in an in-sample manner. ME-XQL applies a Maclaurin expansion to this loss function to improve stability against large errors (a code sketch of the expansion appears below the table). Depending on the order of the expansion, the modeled value function is adjusted between the value function under the behavior policy and the soft optimal value function, trading off stability against optimality. The expansion likewise shifts the assumed error distribution from a normal distribution toward a Gumbel distribution. The method is evaluated on online RL tasks from DM Control, where XQL was previously unstable, and on offline RL tasks from D4RL. |
| Low | GrooveSquid.com (original content) | Maclaurin Expanded Extreme Q-learning (ME-XQL) is a new way to make reinforcement learning more stable. The original Extreme Q-learning (XQL) had stability problems caused by its loss function. ME-XQL addresses this by reworking that loss function with a Maclaurin expansion, which makes learning less sensitive to large errors and improves overall performance. The method is tested on two types of tasks: online RL from DM Control, where it is more stable than XQL, and offline RL from D4RL, where it also improves results. |
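To make the expansion concrete, below is a minimal PyTorch-style sketch of what a Maclaurin-expanded loss of this kind could look like. It assumes the linex form exp(z/beta) - z/beta - 1 commonly quoted for XQL's Gumbel-based value loss; the function name `maclaurin_xql_loss`, the temperature `beta`, and the `order` argument are illustrative assumptions, not taken from the paper.

```python
import torch


def maclaurin_xql_loss(bellman_error: torch.Tensor, beta: float = 1.0, order: int = 2) -> torch.Tensor:
    """Sketch of a Maclaurin-expanded, XQL-style value loss (illustrative only).

    The XQL Gumbel-based loss is often written as L(z) = exp(z/beta) - z/beta - 1,
    whose Maclaurin series is sum_{k>=2} (z/beta)^k / k!.  Truncating the series
    at `order` gives a loss that is quadratic (normal-error assumption) for
    order=2 and approaches the original exponential loss as the order grows.
    """
    x = bellman_error / beta
    loss = torch.zeros_like(x)
    factorial = 1.0
    for k in range(2, order + 1):
        factorial *= k                      # running value of k!
        loss = loss + x.pow(k) / factorial  # add the k-th Maclaurin term
    return loss.mean()


# Illustrative usage on random Bellman errors: order=2 behaves like a scaled MSE,
# while higher orders penalize positive errors more sharply, as XQL does.
errors = torch.randn(256)
print(maclaurin_xql_loss(errors, beta=2.0, order=2))
print(maclaurin_xql_loss(errors, beta=2.0, order=4))
```

Truncating at order 2 reduces to a squared-error (normal-distribution) loss, while higher orders progressively recover the asymmetric, Gumbel-style penalty; this is the stability-optimality trade-off described in the medium summary.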
Keywords
- Artificial intelligence
- Loss function
- Reinforcement learning