Summary of Regularized Q-learning with Linear Function Approximation, by Jiachen Xi et al.
Regularized Q-Learning with Linear Function Approximation
by Jiachen Xi, Alfredo Garcia, Petar Momcilovic
First submitted to arXiv on: 26 Jan 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary: the paper's original abstract, available on its arXiv listing. |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary: Regularized Markov Decision Processes (RMDPs) model sequential decision making under uncertainty when the decision maker has limited information-processing capacity and/or an aversion to model ambiguity. However, the convergence properties of learning algorithms for RMDPs, such as soft Q-learning, are not well understood, because these algorithms compose the regularized Bellman operator with a projection onto the span of the basis vectors. This paper presents a bi-level optimization formulation of regularized Q-learning with linear function approximation, which motivates a single-loop algorithm with finite-time convergence guarantees. The proposed algorithm operates on two time-scales: slow updates (smaller step size) for projecting state-action values and faster updates (larger step size) for solving Bellman's recursive optimality equation. Under certain assumptions, the algorithm converges to a stationary point in the presence of Markovian noise, and the paper provides a performance guarantee for the policies derived from it. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary: This paper is about how a decision maker can choose well when it doesn't have all the information or is unsure about what will happen. It's like trying to navigate through a maze without a map. The researchers study this problem using a framework called regularized Markov Decision Processes (RMDPs). They came up with an algorithm that learns from experience and makes good decisions even when there is uncertainty. The algorithm makes slow updates to adjust its understanding of the world and faster updates to improve its decisions. The researchers proved that, under certain assumptions, the algorithm settles on a stable solution even with noisy data, and that the decisions it produces come with a performance guarantee. |
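
To make the two time-scale idea in the medium summary more concrete, here is a rough Python sketch. It is not the authors' algorithm: it assumes entropy regularization with a temperature `tau` (as in soft Q-learning), a target-style scheme in which a fast iterate `w` chases the regularized Bellman backup and a slow iterate `theta` tracks `w`, and a small random MDP with one-hot features. All function names, parameter names, and step sizes (`alpha` < `beta`) are hypothetical choices made for illustration.

```python
import numpy as np

def soft_value(q, tau):
    """Entropy-regularized value tau * log(sum_a exp(q[a] / tau)), computed stably."""
    m = np.max(q)
    return m + tau * np.log(np.sum(np.exp((q - m) / tau)))

def soft_policy(q, tau):
    """Softmax (Boltzmann) policy induced by the action values q."""
    z = np.exp((q - np.max(q)) / tau)
    return z / np.sum(z)

def two_timescale_soft_q(P, R, Phi, gamma=0.9, tau=0.5,
                         alpha=0.01, beta=0.1, steps=5000, seed=0):
    """Illustrative two time-scale soft Q-learning with linear features.

    P:   transition probabilities, shape (S, A, S)
    R:   rewards, shape (S, A)
    Phi: feature vectors phi(s, a), shape (S, A, d)
    alpha: slow step size, beta: fast step size (alpha < beta).
    """
    rng = np.random.default_rng(seed)
    S, A, d = Phi.shape
    theta = np.zeros(d)  # slow iterate: defines the backup target and the policy
    w = np.zeros(d)      # fast iterate: regresses onto the regularized backup
    s = 0
    for _ in range(steps):
        # Sample along a single trajectory, so the noise is Markovian.
        a = rng.choice(A, p=soft_policy(Phi[s] @ theta, tau))
        s_next = rng.choice(S, p=P[s, a])
        # Regularized Bellman backup evaluated with the slow iterate.
        target = R[s, a] + gamma * soft_value(Phi[s_next] @ theta, tau)
        # Fast update: stochastic gradient step on the squared backup error.
        w = w + beta * (target - Phi[s, a] @ w) * Phi[s, a]
        # Slow update: move theta a small step toward w.
        theta = theta + alpha * (w - theta)
        s = s_next
    return theta, w

# Toy problem (assumed for illustration): 4 states, 2 actions, one-hot features.
toy_rng = np.random.default_rng(1)
S, A = 4, 2
P = toy_rng.dirichlet(np.ones(S), size=(S, A))  # shape (S, A, S), rows sum to 1
R = toy_rng.uniform(0.0, 1.0, size=(S, A))
Phi = np.eye(S * A).reshape(S, A, S * A)
theta, w = two_timescale_soft_q(P, R, Phi)
policy = np.array([soft_policy(Phi[s] @ theta, tau=0.5) for s in range(S)])
```

In this sketch the slow iterate `theta` changes little between steps, so the fast iterate `w` sees a nearly fixed regression target. That separation is the usual motivation for running two step sizes in a single loop rather than nesting one loop inside the other.
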
Keywords
* Artificial intelligence
* Optimization