Summary of Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret, by Youqing Hua et al.
Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret
by Youqing Hua, Shuai Liu, Yiguang Hong, Karl Henrik Johansson, Guangchen Wang
First submitted to arXiv on: 24 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Optimization and Control (math.OC)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper proposes a distributed online algorithm with one-point residual feedback for online bandit optimization with nonconvex loss functions over a time-varying digraph. The algorithm estimates the gradient from the residual between the loss values observed at two consecutive queries, so only one new point is sampled per iteration (O(1) sampling complexity) while the regret bound is reduced. Performance is measured by dynamic regret, and the expected dynamic regret is shown to be comparable to that of existing two-point feedback algorithms, provided the deviation of the objective function sequence and the path length of the minimizer sequence grow sublinearly; a minimal sketch of the gradient estimator appears after this table. |
Low | GrooveSquid.com (original content) | The paper studies a setting where online players make decisions and an adversary assigns nonconvex loss functions; the players want to minimize the global loss while observing only a single loss value per round. Because only one point is revealed, traditional multi-point bandit algorithms cannot be applied, so the proposed algorithm instead estimates gradients from the residual between the current and previous observed loss values. This reduces the regret bound while keeping sampling efficient. The authors evaluate the algorithm with dynamic regret and show that it performs comparably to existing methods that require more feedback per round. |
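The residual-feedback idea described in the summaries can be illustrated with a short, single-learner sketch. This is not the paper's distributed algorithm: the consensus step over the time-varying digraph and the projection onto the feasible set are omitted, and the names `loss_fn`, `delta`, and `eta` are illustrative assumptions, with `loss_fn(t, x)` standing in for the adversary's nonconvex loss at round t.

```python
import numpy as np

def one_point_residual_bandit(loss_fn, x0, T, delta=0.1, eta=0.01, rng=None):
    """Sketch of a one-point residual feedback bandit update (single learner)."""
    rng = np.random.default_rng() if rng is None else rng
    d = x0.size
    x = x0.astype(float).copy()
    prev_obs = 0.0        # loss value observed at the previous round's query
    iterates = []
    for t in range(T):
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)            # random direction on the unit sphere
        obs = loss_fn(t, x + delta * u)   # the single new query this round
        # Residual feedback: the estimate uses the difference between the
        # current and the previous observation, so only one point is sampled
        # per iteration (O(1) sampling complexity). At t = 0 this reduces to
        # the classical one-point estimator.
        grad_est = (d / delta) * (obs - prev_obs) * u
        x = x - eta * grad_est            # projection onto the feasible set omitted
        prev_obs = obs
        iterates.append(x.copy())
    return iterates

def dynamic_regret(loss_fn, iterates, minimizers):
    """Offline evaluation only: gap between the player's cumulative loss and
    that of the per-round minimizers x_t^* (not observable during play)."""
    return sum(loss_fn(t, x) - loss_fn(t, x_star)
               for t, (x, x_star) in enumerate(zip(iterates, minimizers)))
```

The key point is that `grad_est` reuses `prev_obs` from the previous round instead of querying a second point, which is what keeps the per-round sampling cost at a single query while the estimator behaves like a two-point scheme on average.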
Keywords
» Artificial intelligence » Objective function » Optimization