
Summary of BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings, by Karine Karine, Susan A. Murphy, and Benjamin M. Marlin


BOTS: Batch Bayesian Optimization of Extended Thompson Sampling for Severely Episode-Limited RL Settings

by Karine Karine, Susan A. Murphy, Benjamin M. Marlin

First submitted to arXiv on: 30 Nov 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper proposes a new approach to reinforcement learning (RL) in settings where real-world trials are limited by cost or time constraints. The linear Thompson sampling bandit is extended to select actions based on a state-action utility function, which combines the expected immediate reward with an action bias term. The proposed method uses batch Bayesian optimization to learn the action bias terms and can learn optimal policies for a broader class of Markov decision processes (MDPs) than standard Thompson sampling. In simulations, this approach significantly outperforms standard methods in terms of total return while requiring fewer episodes.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper helps us learn better by finding ways to make decisions when we only have limited time or money to test them. It’s like trying to find the best way to help people get healthier, but we can’t do as many tests as we’d like. The new approach combines two ideas: how good an action will be right away and a special adjustment for each action. This helps us make decisions that work better and use fewer tests. The results show that this new method is much better at making the best choices than other ways of learning, and it only needs to test things a few times.

Keywords

» Artificial intelligence  » Optimization  » Reinforcement learning