Summary of An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits, by Amaury Gouverneur et al.


An Information-Theoretic Analysis of Thompson Sampling for Logistic Bandits

by Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

First submitted to arXiv on: 3 Dec 2024

Categories

  • Main: Machine Learning (stat.ML)
  • Secondary: Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper studies the performance of the Thompson Sampling algorithm for logistic bandit problems, in which the learner receives binary rewards with success probabilities determined by a logistic function of the chosen action. The authors analyze the information ratio, which quantifies the trade-off between the immediate regret an action incurs and the information it gains about the optimal action. Improving on previous results, they establish that the information ratio is bounded by (9/2)·d·α⁻², where d is the parameter dimension and α measures the alignment between the action space and the parameter space. From this they derive a bound on the Bayesian expected regret of order O((d/α)·√(T·log(βT/d))), where β is the slope parameter of the logistic function and T is the time horizon. This is the first regret bound for logistic bandits that depends on β only logarithmically while being independent of the number of actions. (A minimal illustrative sketch of the algorithm appears after the summaries below.)
Low Difficulty Summary (original content by GrooveSquid.com)
Imagine an algorithm called Thompson Sampling that helps make decisions based on rewards. In this case, each reward is either good or bad, and the algorithm uses a simple rule to decide what to try next. The researchers studied how well this algorithm works when it is searching for the best action in a problem where the possible actions are related to an unknown underlying quantity. They showed that the algorithm learns the best action quickly, and that this guarantee holds no matter how many possible actions there are. This matters because it helps us understand how to make good decisions in situations like this.
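The medium-difficulty summary describes Thompson Sampling acting on binary rewards drawn from a logistic model. The code below is a minimal, illustrative Python sketch of that idea, not the authors' implementation: the finite unit-norm action set, the standard Gaussian prior, and the Laplace (Gaussian) approximation of the posterior are all assumptions made here for the example.

```python
# Minimal sketch: Thompson Sampling for a logistic bandit with a Gaussian
# (Laplace-approximation) posterior. Illustrative only; the action set,
# prior, and posterior approximation are assumptions of this example.
import numpy as np

rng = np.random.default_rng(0)

d, K, T = 5, 20, 1000             # parameter dimension, number of actions, horizon
actions = rng.normal(size=(K, d))
actions /= np.linalg.norm(actions, axis=1, keepdims=True)  # unit-norm action set
theta_star = rng.normal(size=d)   # unknown parameter (drawn from the prior)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def laplace_fit(X, y, n_iter=25):
    """Newton's method for the MAP estimate of a logistic model with N(0, I) prior;
    returns the mode and the Hessian used as the Gaussian posterior precision."""
    theta = np.zeros(d)
    Xa, ya = np.asarray(X), np.asarray(y)
    for _ in range(n_iter):
        p = sigmoid(Xa @ theta)
        grad = Xa.T @ (p - ya) + theta
        hess = Xa.T @ (Xa * (p * (1 - p))[:, None]) + np.eye(d)
        theta = theta - np.linalg.solve(hess, grad)
    return theta, hess

theta_hat, H = np.zeros(d), np.eye(d)   # prior: N(0, I)
X, y = [], []                           # observed action features and binary rewards
regret = 0.0
best_mean = sigmoid(actions @ theta_star).max()

for t in range(T):
    # Thompson Sampling: sample a parameter from the (approximate) posterior
    # and play the action that is optimal for the sampled parameter.
    theta_tilde = rng.multivariate_normal(theta_hat, np.linalg.inv(H))
    a = int(np.argmax(actions @ theta_tilde))
    p_a = sigmoid(actions[a] @ theta_star)
    r = rng.binomial(1, p_a)            # binary reward from the logistic model
    X.append(actions[a]); y.append(r)
    theta_hat, H = laplace_fit(X, y)    # refresh the Laplace posterior
    regret += best_mean - p_a

print(f"cumulative regret after {T} rounds: {regret:.1f}")
```

The sketch keeps the Bayesian structure the analysis relies on: actions are drawn according to their posterior probability of being optimal, which is what the information-ratio argument in the paper bounds.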

Keywords

  • Artificial intelligence
  • Alignment