Summary of Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs, by Philips George John et al.
Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs
by Philips George John, Arnab Bhattacharyya, Silviu Maniu, Dimitrios Myrisiotis, Zhenan Wu
First submitted to arXiv on: 16 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Data Structures and Algorithms (cs.DS)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary The authors propose two modifications to the LSVI-UCB algorithm, a polynomial-time reinforcement learning method for linear Markov decision processes. The original algorithm comes with theoretical guarantees on running time and regret, but its linear regression step leads to high space usage. The modified algorithms alternate between periods of learning and not learning, which reduces space and time usage while keeping regret sublinear (see the illustrative sketch after this table). Experiments on synthetic data and real-world benchmarks show that the modifications achieve low space usage and running time without significantly sacrificing regret. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary The paper improves an existing reinforcement learning algorithm, LSVI-UCB, which learns how to act by trying actions and collecting rewards. The original algorithm works well but uses a lot of memory, so the authors tried to make it more efficient. They came up with two new versions that switch between doing and not doing some calculations to save memory and time while still getting good results. The new algorithms were tested on made-up data and real-life situations, and they saved memory and time without giving up much performance. |
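To make the medium-difficulty summary concrete, here is a minimal Python sketch of the alternating learn/freeze idea it describes. This is not the paper's algorithm: the environment API, the fixed phase lengths (`learn_len`, `freeze_len`), and the simplified regression target (immediate reward instead of LSVI's backed-up value targets) are all illustrative assumptions.

```python
import numpy as np

def lsvi_ucb_alternating(env, d, H, K, learn_len=10, freeze_len=40,
                         lam=1.0, beta=1.0):
    """Sketch of an LSVI-UCB-style loop that alternates between a learning
    phase (regression statistics updated, linear Q-weights refit) and a
    frozen phase (the current policy is reused untouched), trading a little
    regret for lower time and space usage.

    Assumed, illustrative environment API:
      env.reset() -> state
      env.step(action) -> (next_state, reward)
      env.phi(state, action) -> d-dimensional feature vector (np.ndarray)
      env.actions -> iterable of actions
    """
    Lambda = lam * np.eye(d)   # regularized Gram matrix of observed features
    xi = np.zeros(d)           # accumulated feature-weighted targets
    w = np.zeros(d)            # current linear Q-value weights

    def optimistic_q(s, a):
        phi = env.phi(s, a)
        bonus = beta * np.sqrt(phi @ np.linalg.solve(Lambda, phi))  # UCB bonus
        return phi @ w + bonus

    learning, phase = True, 0
    for _ in range(K):                         # K episodes
        s = env.reset()
        for _ in range(H):                     # horizon H
            a = max(env.actions, key=lambda act: optimistic_q(s, act))
            s_next, r = env.step(a)
            if learning:                       # only touch statistics in learning phases
                phi = env.phi(s, a)
                Lambda += np.outer(phi, phi)   # update Gram matrix
                xi += r * phi                  # simplified target: immediate reward only
                w = np.linalg.solve(Lambda, xi)  # refit linear Q-weights
            s = s_next
        phase += 1
        if learning and phase >= learn_len:    # pause learning for a while
            learning, phase = False, 0
        elif not learning and phase >= freeze_len:
            learning, phase = True, 0          # resume learning
    return w
```

Roughly, the savings come from the frozen phases: no new features are accumulated and no regression is refit there, whereas the standard LSVI-UCB refits its regression over all past data in every episode.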
Keywords
» Artificial intelligence » Linear regression » Reinforcement learning » Synthetic data