Variance-Dependent Regret Bounds for Non-stationary Linear Bandits
by Zhiyong Wang, Jize Xie, Yi Chen, John C.S. Lui, Dongruo Zhou
First submitted to arXiv on: 15 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper tackles the non-stationary stochastic linear bandit problem, where the reward distribution evolves from round to round. Existing methods consider only the total variation budget B_K, which measures non-stationarity solely through changes in the expected reward, and this leads to sub-optimal performance in general non-stationary settings. This work proposes two algorithms, Restarted Weighted OFUL+ and Restarted SAVE+, which exploit both the reward variance and B_K to achieve tighter regret upper bounds, covering the cases where variance information is known and unknown, respectively (a simplified illustrative sketch of this restart-plus-weighting idea follows the table). The paper shows that when the total variance V_K is much smaller than the number of rounds K, the proposed algorithms outperform the state of the art for non-stationary stochastic linear bandits. Experimental evaluations validate their superior performance. |
| Low | GrooveSquid.com (original content) | The researchers studied a machine learning problem where the reward, or outcome, changes over time, and looked at how to make better decisions when this happens. Current methods for handling changing outcomes are limited. The authors propose two new methods that account for changes in both the average and the spread of the rewards. These methods work better than existing ones in certain situations, and experiments show that they perform well. |
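To make the medium summary concrete, here is a minimal Python sketch of the general recipe it describes: periodically restart a variance-weighted ridge-regression estimator and pick actions optimistically. The restart window, weighting rule, confidence radius, and all function names below are illustrative assumptions for this sketch, not the authors' Restarted Weighted OFUL+ or Restarted SAVE+ as specified in the paper.

```python
# Minimal sketch of a restarted, variance-weighted optimistic linear bandit.
# The window length, weights, and exploration bonus are simplified assumptions.
import numpy as np

def run_restarted_weighted_oful(arms, theta_seq, sigma_seq, window=200,
                                lam=1.0, beta=1.0, rng=None):
    """Play K rounds; restart the weighted ridge-regression estimator every
    `window` rounds to cope with a drifting reward parameter theta_t.

    arms      : (n_arms, d) fixed action set
    theta_seq : (K, d) true (drifting) reward parameters, one per round
    sigma_seq : (K,) known noise standard deviations, one per round
    """
    rng = np.random.default_rng() if rng is None else rng
    n_arms, d = arms.shape
    K = len(theta_seq)
    regret = np.zeros(K)

    A = lam * np.eye(d)          # weighted Gram matrix
    b = np.zeros(d)              # weighted response vector

    for t in range(K):
        if t % window == 0:      # restart: forget stale data
            A = lam * np.eye(d)
            b = np.zeros(d)

        theta_hat = np.linalg.solve(A, b)
        A_inv = np.linalg.inv(A)
        # Optimistic (UCB-style) scores: estimated reward + exploration bonus.
        bonus = beta * np.sqrt(np.einsum('ij,jk,ik->i', arms, A_inv, arms))
        a = int(np.argmax(arms @ theta_hat + bonus))

        # Observe a noisy reward with round-dependent variance.
        mean = arms @ theta_seq[t]
        r = mean[a] + sigma_seq[t] * rng.standard_normal()
        regret[t] = mean.max() - mean[a]

        # Variance-dependent weight: low-noise rounds count more
        # (assumed weighting rule, for illustration only).
        w = 1.0 / max(sigma_seq[t] ** 2, 1e-2)
        A += w * np.outer(arms[a], arms[a])
        b += w * r * arms[a]

    return regret.cumsum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, d = 2000, 5
    arms = rng.standard_normal((20, d))
    arms /= np.linalg.norm(arms, axis=1, keepdims=True)
    # Piecewise-constant theta (abrupt changes) and small noise variance.
    theta_seq = np.repeat(rng.standard_normal((4, d)), K // 4, axis=0)
    sigma_seq = np.full(K, 0.1)
    print("cumulative regret:",
          run_restarted_weighted_oful(arms, theta_seq, sigma_seq, rng=rng)[-1])
```

The restart discards stale data when the underlying parameter drifts, while the inverse-variance weights let low-noise rounds contribute more, which is the intuition behind variance-dependent regret bounds when the total variance V_K is small relative to K.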
Keywords
* Artificial intelligence * Machine learning