Summary of Feel-Good Thompson Sampling for Contextual Dueling Bandits, by Xuheng Li et al.
Feel-Good Thompson Sampling for Contextual Dueling Bandits
by Xuheng Li, Heyang Zhao, Quanquan Gu
First submitted to arXiv on: 9 Apr 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Optimization and Control (math.OC); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract, available on arXiv. |
Medium | GrooveSquid.com (original content) | This paper proposes a Thompson sampling algorithm for linear contextual dueling bandits, which extend classic dueling bandits to incorporate contextual information. The proposed algorithm, named FGTS.CDB (Feel-Good Thompson Sampling for Contextual Dueling Bandits), leverages the independence of the two selected arms and incorporates a new exploration term tailored for dueling bandits. The algorithm achieves nearly minimax-optimal regret, with a bound of $\tilde{O}(d\sqrt{T})$, where $d$ is the model dimension and $T$ is the time horizon. Experimental results on synthetic data show that FGTS.CDB outperforms existing algorithms by a significant margin. |
Low | GrooveSquid.com (original content) | This paper introduces a new way to make decisions that uses information about the context. The setting is called contextual dueling bandits: at each step, two options are compared against each other, and the choice can depend on what's happening around us. The researchers developed an algorithm that helps choose between these options in a smart way, taking into account what has been learned so far. They tested it on synthetic (computer-generated) data and found that it works much better than previous methods. |
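To make the medium-difficulty description more concrete, here is a minimal Python sketch of a Thompson-sampling loop for a linear contextual dueling bandit on synthetic data. It is not the authors' FGTS.CDB. It assumes a Bradley-Terry (logistic) preference model, P(a beats b) = σ(θᵀ(φ_a − φ_b)), draws the two parameter samples independently with a few steps of unadjusted Langevin dynamics, and adds a generic feel-good bonus λ·max_a θᵀφ(x, a) in the spirit of earlier Feel-Good Thompson Sampling (without its clipping) instead of the paper's dueling-specific exploration term. All function names, step sizes, and the environment are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def grad_log_post(theta, history, lam, prior_var):
    """Gradient of an unnormalised log-posterior: Gaussian prior
    + Bradley-Terry log-likelihood + a hypothetical feel-good bonus."""
    g = [-t / prior_var for t in theta]  # prior N(0, prior_var * I)
    for arms, win, lose in history:
        diff = [w - l for w, l in zip(win, lose)]
        # d/dtheta log sigmoid(theta . diff) = (1 - sigmoid(theta . diff)) * diff
        s = 1.0 - sigmoid(dot(theta, diff))
        # Feel-good bonus lam * max_a theta . phi(a); (sub)gradient is lam * phi(best arm)
        best = max(arms, key=lambda phi: dot(theta, phi))
        g = [gi + s * di + lam * bi for gi, di, bi in zip(g, diff, best)]
    return g


def langevin_sample(history, d, lam, rng, prior_var=1.0, steps=20, base_step=0.5):
    """Approximate posterior draw via unadjusted Langevin dynamics (assumed sampler)."""
    step = base_step / (1 + len(history))  # shrink the step as the posterior sharpens
    theta = [rng.gauss(0.0, math.sqrt(prior_var)) for _ in range(d)]
    for _ in range(steps):
        g = grad_log_post(theta, history, lam, prior_var)
        theta = [t + step * gi + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
                 for t, gi in zip(theta, g)]
    return theta


def greedy_arm(arms, theta):
    """Index of the arm with the highest estimated utility theta . phi."""
    return max(range(len(arms)), key=lambda i: dot(theta, arms[i]))


def run(T=60, d=3, K=5, lam=0.05, seed=0):
    """Simulate T rounds on synthetic data; return cumulative average regret."""
    rng = random.Random(seed)
    theta_star = [rng.gauss(0.0, 1.0) for _ in range(d)]  # hidden true parameter
    history, regret = [], 0.0
    for _ in range(T):
        arms = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(K)]
        # Two independent posterior samples, one per dueling arm
        i = greedy_arm(arms, langevin_sample(history, d, lam, rng))
        j = greedy_arm(arms, langevin_sample(history, d, lam, rng))
        u_i, u_j = dot(theta_star, arms[i]), dot(theta_star, arms[j])
        # Environment: arm i wins with Bradley-Terry probability sigmoid(u_i - u_j)
        if rng.random() < sigmoid(u_i - u_j):
            history.append((arms, arms[i], arms[j]))
        else:
            history.append((arms, arms[j], arms[i]))
        best = max(dot(theta_star, a) for a in arms)
        regret += best - (u_i + u_j) / 2
    return regret


if __name__ == "__main__":
    print(f"cumulative average regret over 60 rounds: {run():.2f}")
```

Regret here is the average of the two selected arms' utility gaps to the best arm, one common convention for dueling bandits; the paper's exact regret definition and its $\tilde{O}(d\sqrt{T})$ analysis may differ from this toy setup.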
Keywords
* Artificial intelligence
* Synthetic data