Summary of Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret, by Youqing Hua et al.
Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret
by Youqing Hua, Shuai Liu, Yiguang Hong, Karl Henrik Johansson, Guangchen Wang
First submitted to arXiv on: 24 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Optimization and Control (math.OC)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper proposes a distributed online algorithm with one-point residual feedback for online bandit optimization with nonconvex loss functions over a time-varying digraph. The algorithm estimates the gradient from the residual between the loss values observed at two consecutive queries, so only one new point is sampled per iteration (O(1) sampling complexity) while the regret bound is reduced. Performance is measured by dynamic regret, and the expected dynamic regret is shown to be comparable to that of existing two-point feedback algorithms, provided the deviation of the objective function sequence and the path length of the minimizer sequence grow sublinearly; a minimal sketch of the gradient estimator appears after this table. |
Low | GrooveSquid.com (original content) | The paper studies a setting where online players make decisions and an adversary assigns nonconvex loss functions; the players want to minimize the global loss while observing only a single loss value per round. Because only one point is revealed, traditional multi-point bandit algorithms cannot be applied, so the proposed algorithm instead estimates gradients from the residual between the current and previous observed loss values. This reduces the regret bound while keeping sampling efficient. The authors evaluate the algorithm with dynamic regret and show that it performs comparably to existing methods that require more feedback per round. |
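The residual-feedback idea described in the summaries can be illustrated with a short, single-learner sketch. This is not the paper's distributed algorithm: the consensus step over the time-varying digraph and the projection onto the feasible set are omitted, and the names `loss_fn`, `delta`, and `eta` are illustrative assumptions, with `loss_fn(t, x)` standing in for the adversary's nonconvex loss at round t.

```python
import numpy as np

def one_point_residual_bandit(loss_fn, x0, T, delta=0.1, eta=0.01, rng=None):
    """Sketch of a one-point residual feedback bandit update (single learner)."""
    rng = np.random.default_rng() if rng is None else rng
    d = x0.size
    x = x0.astype(float).copy()
    prev_obs = 0.0        # loss value observed at the previous round's query
    iterates = []
    for t in range(T):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)            # random direction on the unit sphere
        obs = loss_fn(t, x + delta * u)   # the single new query this round
        # Residual feedback: the estimate uses the difference between the
        # current and the previous observation, so only one point is sampled
        # per iteration (O(1) sampling complexity). At t = 0 this reduces to
        # the classical one-point estimator.
        grad_est = (d / delta) * (obs - prev_obs) * u
        x = x - eta * grad_est            # projection onto the feasible set omitted
        prev_obs = obs
        iterates.append(x.copy())
    return iterates

def dynamic_regret(loss_fn, iterates, minimizers):
    """Offline evaluation only: gap between the player's cumulative loss and
    that of the per-round minimizers x_t^* (not observable during play)."""
    return sum(loss_fn(t, x) - loss_fn(t, x_star)
               for t, (x, x_star) in enumerate(zip(iterates, minimizers)))
```

The key point is that `grad_est` reuses `prev_obs` from the previous round instead of querying a second point, which is what keeps the per-round sampling cost at a single query while the estimator behaves like a two-point scheme on average.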
Keywords
» Artificial intelligence » Objective function » Optimization