Summary of Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs, by Philips George John et al.
Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs
by Philips George John, Arnab Bhattacharyya, Silviu Maniu, Dimitrios Myrisiotis, Zhenan Wu
First submitted to arXiv on: 16 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Data Structures and Algorithms (cs.DS)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary The authors propose two modifications to the LSVI-UCB algorithm, a polynomial-time reinforcement learning method for linear Markov decision processes. The original algorithm comes with theoretical guarantees on running time and regret, but its linear regression step leads to high space usage. The modified algorithms alternate between periods of learning and not learning, which reduces space and time usage while keeping regret sublinear (see the illustrative sketch after this table). Experiments on synthetic data and real-world benchmarks show that the modifications achieve low space usage and running time without significantly sacrificing regret. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary The paper improves an existing reinforcement learning algorithm, LSVI-UCB, which learns how to act by trying actions and collecting rewards. The original algorithm works well but uses a lot of memory, so the authors tried to make it more efficient. They came up with two new versions that switch between doing and not doing some calculations to save memory and time while still getting good results. The new algorithms were tested on made-up data and real-life situations, and they saved memory and time without giving up much performance. |
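To make the medium-difficulty summary concrete, here is a minimal Python sketch of the alternating learn/freeze idea it describes. This is not the paper's algorithm: the environment API, the fixed phase lengths (`learn_len`, `freeze_len`), and the simplified regression target (immediate reward instead of LSVI's backed-up value targets) are all illustrative assumptions.

```python
import numpy as np

def lsvi_ucb_alternating(env, d, H, K, learn_len=10, freeze_len=40,
                         lam=1.0, beta=1.0):
    """Sketch of an LSVI-UCB-style loop that alternates between a learning
    phase (regression statistics updated, linear Q-weights refit) and a
    frozen phase (the current policy is reused untouched), trading a little
    regret for lower time and space usage.

    Assumed, illustrative environment API:
      env.reset() -> state
      env.step(action) -> (next_state, reward)
      env.phi(state, action) -> d-dimensional feature vector (np.ndarray)
      env.actions -> iterable of actions
    """
    Lambda = lam * np.eye(d)   # regularized Gram matrix of observed features
    xi = np.zeros(d)           # accumulated feature-weighted targets
    w = np.zeros(d)            # current linear Q-value weights

    def optimistic_q(s, a):
        phi = env.phi(s, a)
        bonus = beta * np.sqrt(phi @ np.linalg.solve(Lambda, phi))  # UCB bonus
        return phi @ w + bonus

    learning, phase = True, 0
    for _ in range(K):                         # K episodes
        s = env.reset()
        for _ in range(H):                     # horizon H
            a = max(env.actions, key=lambda act: optimistic_q(s, act))
            s_next, r = env.step(a)
            if learning:                       # only touch statistics in learning phases
                phi = env.phi(s, a)
                Lambda += np.outer(phi, phi)   # update Gram matrix
                xi += r * phi                  # simplified target: immediate reward only
                w = np.linalg.solve(Lambda, xi)  # refit linear Q-weights
            s = s_next
        phase += 1
        if learning and phase >= learn_len:    # pause learning for a while
            learning, phase = False, 0
        elif not learning and phase >= freeze_len:
            learning, phase = True, 0          # resume learning
    return w
```

Roughly, the savings come from the frozen phases: no new features are accumulated and no regression is refit there, whereas the standard LSVI-UCB refits its regression over all past data in every episode.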
Keywords
» Artificial intelligence » Linear regression » Reinforcement learning » Synthetic data