Summary of Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path, by Qiwei Di et al.
Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path
by Qiwei Di, Jiafan He, Dongruo Zhou, Quanquan Gu
First submitted to arXiv on: 14 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The proposed algorithm removes the restrictive assumptions that prior work placed on the cost function and on the expected length of the optimal policy in Stochastic Shortest Path (SSP) problems with linear mixture transition kernels. It combines extended value iteration with a fine-grained, variance-aware confidence set, where the variance is estimated recursively from high-order moments (a minimal sketch of this estimation step appears after the table). This yields a regret bound of $\tilde{O}(dB_*\sqrt{K})$, where $d$ is the feature dimension, $B_*$ bounds the expected cost of the optimal policy, and $K$ is the number of episodes, matching the lower bound of Min et al. (2022) and making the algorithm nearly minimax optimal for linear mixture SSPs. |
Low | GrooveSquid.com (original content) | This paper tackles a challenging problem in which an agent tries to reach a goal state while paying as little cost as possible. Existing methods restrict how small the costs can be or how long the agent may take to reach the goal. The new algorithm removes these limitations by using a carefully designed way of building confidence intervals, and it performs nearly as well as any algorithm possibly could. |
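The estimation step described in the medium summary (variance-weighted regression over a linear mixture model, with a confidence ellipsoid used for optimistic value backups) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `VarianceWeightedEstimator`, the variance floor `var_floor`, and the parameters `phi` and `beta` are names chosen for this example, and the variance estimate is simply passed in rather than computed recursively from higher moments as in the paper.

```python
import numpy as np

class VarianceWeightedEstimator:
    """Minimal sketch: weighted ridge regression for a linear mixture model,
    keeping a confidence ellipsoid {theta : ||theta - theta_hat||_Sigma <= beta}."""

    def __init__(self, dim, lam=1.0):
        self.Sigma = lam * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)           # weighted feature-response sum

    def update(self, phi, target, variance_est, var_floor=1e-2):
        # Each sample is weighted by the inverse of its estimated variance,
        # clipped below by var_floor so near-zero variances do not dominate.
        w = 1.0 / max(variance_est, var_floor)
        self.Sigma += w * np.outer(phi, phi)
        self.b += w * phi * target

    def theta_hat(self):
        # Weighted ridge-regression point estimate of theta*.
        return np.linalg.solve(self.Sigma, self.b)

    def optimistic_estimate(self, phi, beta):
        # Optimistic (lower) estimate of <phi, theta*>: since SSP costs are
        # minimized, optimism subtracts an exploration bonus proportional to
        # the ellipsoid width in the direction of phi.
        bonus = beta * np.sqrt(phi @ np.linalg.solve(self.Sigma, phi))
        return float(phi @ self.theta_hat() - bonus)


# Tiny usage example with synthetic data (purely illustrative).
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 4
    est = VarianceWeightedEstimator(d)
    theta_star = rng.normal(size=d)
    for _ in range(200):
        phi = rng.normal(size=d)
        noise_var = 0.5
        target = phi @ theta_star + rng.normal(scale=np.sqrt(noise_var))
        est.update(phi, target, variance_est=noise_var)
    print("estimation error:", np.linalg.norm(est.theta_hat() - theta_star))
```

In the full algorithm, an estimator of this kind would feed extended value iteration: the optimistic estimates define cost-to-go values, which in turn produce the next policy, while the confidence radius `beta` controls how aggressively the agent explores.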