Thompson Sampling in Partially Observable Contextual Bandits

by Hongju Park, Mohamad Kazem Shirani Faradonbeh

First submitted to arxiv on: 15 Feb 2024

Categories

  • Main: Machine Learning (stat.ML)
  • Secondary: Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary versions

High Difficulty Summary (written by the paper authors)
Read the original abstract on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
In this paper, the researchers study contextual bandits, a classical framework for decision-making under uncertainty. The goal is to learn which options (arms) are most rewarding given the current context, while simultaneously estimating each arm's unknown parameters through experimentation. Prior work typically assumes the context is observed perfectly; the more general and versatile setting, where the learner only sees noisy observations that are linearly related to unobserved contexts, has been largely overlooked. The paper presents a new approach for selecting optimal arms based on these noisy observations that successfully balances exploration and exploitation. Theoretical analysis shows that Thompson sampling achieves poly-logarithmic regret bounds and square-root consistency of parameter estimation, and characterizes how regret scales with the dimension and the number of arms. Numerical experiments corroborate the efficacy of Thompson sampling.
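To make the exploration–exploitation idea concrete, here is a minimal sketch of Thompson sampling for a linear contextual bandit in which the context is only observed through noise. This is a generic illustration, not the paper's exact algorithm: the arm count, dimensions, noise levels, and the simple observation model `y = x + noise` are all assumptions chosen for the example.

```python
import numpy as np

# Hypothetical setup: each arm i has an unknown parameter theta_i, the
# expected reward of arm i in context x is x @ theta_i, and the learner
# only sees a noisy observation y of the true context x (a stand-in for
# the paper's partially observable setting).
rng = np.random.default_rng(0)
d, n_arms, T = 3, 4, 500
theta = rng.normal(size=(n_arms, d))   # true (unknown) arm parameters

# Gaussian posterior per arm, as in Bayesian linear regression:
# precision matrix B and vector f, so the posterior mean is B^{-1} f.
B = np.stack([np.eye(d) for _ in range(n_arms)])
f = np.zeros((n_arms, d))

regret = 0.0
for t in range(T):
    x = rng.normal(size=d)              # true context (never seen directly)
    y = x + 0.1 * rng.normal(size=d)    # noisy observation of the context

    # Thompson sampling: draw one parameter sample from each arm's
    # posterior and play the arm whose sample predicts the highest reward.
    samples = np.array([
        rng.multivariate_normal(np.linalg.solve(B[i], f[i]),
                                np.linalg.inv(B[i]))
        for i in range(n_arms)
    ])
    arm = int(np.argmax(samples @ y))

    # The reward comes from the true context; update the chosen arm's
    # posterior using the observation y (all the learner has access to).
    r = x @ theta[arm] + 0.1 * rng.normal()
    B[arm] += np.outer(y, y)
    f[arm] += r * y
    regret += np.max(theta @ x) - theta[arm] @ x

avg_regret = regret / T
print(round(avg_regret, 3))
```

The random posterior draw is what balances exploration and exploitation: arms with uncertain parameters occasionally produce optimistic samples and get tried, while well-estimated arms are played when they are reliably best, so the average per-round regret shrinks over time.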
Low Difficulty Summary (original content by GrooveSquid.com)
In this paper, scientists investigate how to make good decisions when there’s uncertainty involved. They’re looking at something called contextual bandits, where you need to figure out which options are best based on some extra information, while also learning what makes each option work. Most previous studies assume you can see that extra information perfectly, but what if it is missing or noisy? The researchers come up with a new way to make decisions in these situations and show that it works really well. They use math to prove their method is good and back it up with numerical experiments.

Keywords

* Artificial intelligence