
Summary of BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings, by Karine Karine, Susan A. Murphy, and Benjamin M. Marlin


BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings

by Karine Karine, Susan A. Murphy, Benjamin M. Marlin

First submitted to arXiv on: 30 Nov 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper proposes a new approach to reinforcement learning (RL) in settings where real-world trials are limited by cost or time constraints. The linear Thompson sampling bandit is extended to select actions based on a state-action utility function, which combines the expected immediate reward with an action bias term. The proposed method uses batch Bayesian optimization to learn the action bias terms and can learn optimal policies for a broader class of Markov decision processes (MDPs) than standard Thompson sampling. In simulations, this approach significantly outperforms standard methods in terms of total return while requiring fewer episodes.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper helps us learn better by finding ways to make decisions when we only have limited time or money to test them. It’s like trying to find the best way to help people get healthier, but we can’t do as many tests as we’d like. The new approach combines two ideas: how good an action will be right away and a special adjustment for each action. This helps us make decisions that work better and use fewer tests. The results show that this new method is much better at making the best choices than other ways of learning, and it only needs to test things a few times.

Keywords

» Artificial intelligence  » Optimization  » Reinforcement learning