
Summary of Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs, by Philips George John et al.


Efficient, Low-Regret, Online Reinforcement Learning for Linear MDPs

by Philips George John, Arnab Bhattacharyya, Silviu Maniu, Dimitrios Myrisiotis, Zhenan Wu

First submitted to arXiv on: 16 Nov 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Data Structures and Algorithms (cs.DS)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
The authors propose two modifications to LSVI-UCB, a polynomial-time reinforcement learning algorithm for linear Markov decision processes. The original algorithm comes with theoretical guarantees on its running time and regret, but its space usage is high because of a linear regression step. The modified algorithms alternate between periods of learning and not-learning, which reduces space and time usage while preserving sublinear regret (an illustrative sketch of this idea follows the summaries). Experiments on synthetic data and real-world benchmarks show that the modifications achieve low space usage and running time without significantly sacrificing regret.

Low Difficulty Summary (original content by GrooveSquid.com)
The paper improves an existing reinforcement learning algorithm called LSVI-UCB, which learns by trying actions and collecting rewards. The original algorithm works well but uses a lot of memory, so the authors set out to make it more efficient. They came up with two new versions that switch between doing and skipping some calculations to save memory and time while still getting good results. The new algorithms were tested on made-up data and real-world benchmarks, and they worked as expected.

Keywords

» Artificial intelligence  » Linear regression  » Reinforcement learning  » Synthetic data