Variance-Dependent Regret Bounds for Non-stationary Linear Bandits
by Zhiyong Wang, Jize Xie, Yi Chen, John C.S. Lui, Dongruo Zhou
First submitted to arXiv on: 15 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper tackles the non-stationary stochastic linear bandit problem, where the reward distribution evolves from round to round. Existing methods consider only the total variation budget B_K, which measures non-stationarity solely through changes in the expected reward, and this leads to sub-optimal performance in general non-stationary settings. This work proposes two algorithms, Restarted Weighted OFUL+ and Restarted SAVE+, which exploit both the reward variance and B_K to achieve tighter regret upper bounds, covering the cases where variance information is known and unknown, respectively (a simplified illustrative sketch of this restart-plus-weighting idea follows the table). The paper shows that when the total variance V_K is much smaller than the number of rounds K, the proposed algorithms outperform the state of the art for non-stationary stochastic linear bandits. Experimental evaluations validate their superior performance. |
| Low | GrooveSquid.com (original content) | The researchers studied a machine learning problem where the reward, or outcome, changes over time, and looked at how to make better decisions when this happens. Current methods for handling changing outcomes are limited. The authors propose two new methods that account for changes in both the average and the spread of the rewards. These methods work better than existing ones in certain situations, and experiments show that they perform well. |
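To make the medium summary concrete, here is a minimal Python sketch of the general recipe it describes: periodically restart a variance-weighted ridge-regression estimator and pick actions optimistically. The restart window, weighting rule, confidence radius, and all function names below are illustrative assumptions for this sketch, not the authors' Restarted Weighted OFUL+ or Restarted SAVE+ as specified in the paper.

```python
# Minimal sketch of a restarted, variance-weighted optimistic linear bandit.
# The window length, weights, and exploration bonus are simplified assumptions.
import numpy as np

def run_restarted_weighted_oful(arms, theta_seq, sigma_seq, window=200,
                                lam=1.0, beta=1.0, rng=None):
    """Play K rounds; restart the weighted ridge-regression estimator every
    `window` rounds to cope with a drifting reward parameter theta_t.

    arms      : (n_arms, d) fixed action set
    theta_seq : (K, d) true (drifting) reward parameters, one per round
    sigma_seq : (K,) known noise standard deviations, one per round
    """
    rng = np.random.default_rng() if rng is None else rng
    n_arms, d = arms.shape
    K = len(theta_seq)
    regret = np.zeros(K)

    A = lam * np.eye(d)          # weighted Gram matrix
    b = np.zeros(d)              # weighted response vector

    for t in range(K):
        if t % window == 0:      # restart: forget stale data
            A = lam * np.eye(d)
            b = np.zeros(d)

        theta_hat = np.linalg.solve(A, b)
        A_inv = np.linalg.inv(A)
        # Optimistic (UCB-style) scores: estimated reward + exploration bonus.
        bonus = beta * np.sqrt(np.einsum('ij,jk,ik->i', arms, A_inv, arms))
        a = int(np.argmax(arms @ theta_hat + bonus))

        # Observe a noisy reward with round-dependent variance.
        mean = arms @ theta_seq[t]
        r = mean[a] + sigma_seq[t] * rng.standard_normal()
        regret[t] = mean.max() - mean[a]

        # Variance-dependent weight: low-noise rounds count more
        # (assumed weighting rule, for illustration only).
        w = 1.0 / max(sigma_seq[t] ** 2, 1e-2)
        A += w * np.outer(arms[a], arms[a])
        b += w * r * arms[a]

    return regret.cumsum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, d = 2000, 5
    arms = rng.standard_normal((20, d))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)
    # Piecewise-constant theta (abrupt changes) and small noise variance.
    theta_seq = np.repeat(rng.standard_normal((4, d)), K // 4, axis=0)
    sigma_seq = np.full(K, 0.1)
    print("cumulative regret:",
          run_restarted_weighted_oful(arms, theta_seq, sigma_seq, rng=rng)[-1])
```

The restart discards stale data when the underlying parameter drifts, while the inverse-variance weights let low-noise rounds contribute more, which is the intuition behind variance-dependent regret bounds when the total variance V_K is small relative to K.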
Keywords
* Artificial intelligence * Machine learning