Summary of MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning, by Mao Hong et al.
MoMA: Model-based Mirror Ascent for Offline Reinforcement Learning
by Mao Hong, Zhiyue Zhang, Yue Wu, Yanxun Xu
First submitted to arXiv on: 21 Jan 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Statistics Theory (math.ST); Methodology (stat.ME); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | Model-based offline reinforcement learning methods have achieved state-of-the-art performance in many decision-making problems due to their sample efficiency and generalizability. However, existing approaches either focus on theoretical studies without developing practical algorithms or rely on a restricted parametric policy space, limiting the full potential of model-based methods. To address this limitation, we introduce MoMA, a model-based mirror ascent algorithm with general function approximations under partial coverage of offline data. MoMA distinguishes itself by employing an unrestricted policy class and leveraging confidence sets of transition models for value function estimation. We establish theoretical guarantees by proving an upper bound on the suboptimality of the returned policy. A practically implementable, approximate version is also provided. Our numerical studies demonstrate the effectiveness of MoMA. (A minimal illustrative sketch of this style of update appears below the table.)
Low | GrooveSquid.com (original content) | Model-based offline reinforcement learning methods are used to make decisions in many situations. They are good at making predictions and are often useful when little data is available. However, current approaches have limitations: some focus on theory without providing practical algorithms, while others restrict the set of policies they can learn. To overcome these limitations, we developed MoMA, an algorithm that is not limited to a particular family of policies, unlike many other methods. We proved that MoMA works well and demonstrated its effectiveness in numerical experiments.
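To make the mirror-ascent idea described above more concrete, here is a minimal, hypothetical sketch of one policy update that combines a pessimistic value estimate over a confidence set of transition models with an exponentiated-gradient (KL mirror map) step. This is not the paper's algorithm, notation, or code: all names (`pessimistic_q`, `mirror_ascent_step`, `model_set`, `eta`, and so on) and the tabular, finite-model-set setting are simplifying assumptions made purely for illustration.

```python
import numpy as np

def pessimistic_q(reward, model_set, policy, gamma=0.99, n_iters=50):
    """Worst-case Q over a finite confidence set of transition models.
    Assumptions: tabular MDP, reward[s, a], each model P[s, a, s'] is a transition kernel."""
    q_candidates = []
    for P in model_set:
        q = np.zeros_like(reward)
        for _ in range(n_iters):                 # approximate policy evaluation
            v = (policy * q).sum(axis=1)         # V(s) = sum_a pi(a|s) Q(s, a)
            q = reward + gamma * (P @ v)         # Bellman backup under model P
        q_candidates.append(q)
    return np.min(np.stack(q_candidates), axis=0)  # pessimism: take the worst model

def mirror_ascent_step(policy, q_values, eta=0.1):
    """One KL-mirror-map (exponentiated-gradient) update: pi_new proportional to pi * exp(eta * Q)."""
    logits = np.log(policy + 1e-12) + eta * q_values
    logits -= logits.max(axis=1, keepdims=True)  # subtract max for numerical stability
    new_policy = np.exp(logits)
    return new_policy / new_policy.sum(axis=1, keepdims=True)

# Toy usage with random quantities (purely illustrative, not the paper's experiments).
rng = np.random.default_rng(0)
n_states, n_actions = 4, 2
reward = rng.uniform(size=(n_states, n_actions))
model_set = [rng.dirichlet(np.ones(n_states), size=(n_states, n_actions)) for _ in range(3)]
policy = np.full((n_states, n_actions), 1.0 / n_actions)
for _ in range(10):
    q = pessimistic_q(reward, model_set, policy)
    policy = mirror_ascent_step(policy, q)
```

The multiplicative form of the update corresponds to mirror ascent with a negative-entropy mirror map, which is why the sketch rescales the current policy by exp(eta * Q) and renormalizes rather than taking a plain gradient step.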
Keywords
* Artificial intelligence
* Reinforcement learning