Summary of On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond, by Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, and Wei Chen
On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond
by Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen
First submitted to arXiv on: 22 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Optimization and Control (math.OC)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract, available on the arXiv page. |
| Medium | GrooveSquid.com (original content) | The paper compares the convergence rates of Adam and Stochastic Gradient Descent with Momentum (SGDM) under non-uniform smoothness, showing that Adam converges faster under these conditions. In the deterministic setting, Adam attains the known lower bound for deterministic first-order optimizers, whereas Gradient Descent with Momentum (GDM) has a higher-order dependence on the initial function value. In the stochastic setting, Adam's convergence-rate upper bound matches the lower bound for stochastic first-order optimizers, accounting for both the initial function value and the final error. The paper also introduces a novel stopping-time-based technique to prove that Adam's convergence rate can match the lower bound across all problem hyperparameters (a sketch of the two update rules appears after this table). |
| Low | GrooveSquid.com (original content) | Adam is compared with Stochastic Gradient Descent with Momentum (SGDM) in terms of how quickly each converges. The researchers found that Adam works better than SGDM under certain conditions: Adam can reach a known speed limit for first-order optimizers, while SGDM takes longer and depends more heavily on the function value at the starting point. The study also examines how the two algorithms behave when there is randomness in the gradients. Overall, this helps us understand how these two important optimization methods compare. |
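
For readers who want to see concretely what is being compared, here is a minimal NumPy sketch of one update step of each optimizer. The function names, default hyperparameters, and the toy quadratic at the end are illustrative assumptions, not taken from the paper; the paper's contribution is the convergence analysis, not the update rules themselves. Adam's per-coordinate denominator `sqrt(v_hat) + eps` is what lets it shrink its effective step size where gradients are large, which is the regime targeted by non-uniform smoothness (often formalized as a bound of the form ||∇²f(x)|| ≤ L0 + L1 ||∇f(x)||).

```python
import numpy as np

def sgdm_step(w, grad, m, lr=0.01, beta=0.9):
    """One heavy-ball SGDM step: accumulate momentum, then move against it."""
    m = beta * m + grad
    return w - lr * m, m

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: momentum plus a per-coordinate adaptive step size."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections for the zero init
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w_sgdm = w_adam = np.array([5.0, -3.0])
m_sgdm = np.zeros(2)
m_adam, v_adam = np.zeros(2), np.zeros(2)
for t in range(1, 101):
    w_sgdm, m_sgdm = sgdm_step(w_sgdm, w_sgdm, m_sgdm)
    w_adam, m_adam, v_adam = adam_step(w_adam, w_adam, m_adam, v_adam, t)
print(w_sgdm, w_adam)  # both iterates move toward the minimizer at the origin
```

The only structural difference between the two updates is Adam's division by a running root-mean-square of past gradients; the paper asks how much that difference matters for convergence when the local smoothness can grow with the gradient norm.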
Keywords
- Artificial intelligence
- Optimization
- Stochastic gradient descent




