On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond

by Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen

First submitted to arXiv on: 22 Mar 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Optimization and Control (math.OC)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper compares the convergence rates of Stochastic Gradient Descent with Momentum (SGDM) and Adam under non-uniform smoothness, showing that Adam converges faster under certain conditions. Specifically, in deterministic environments Adam attains the known lower bound on the convergence rate of deterministic first-order optimizers, whereas GDM (the deterministic counterpart of SGDM) has a higher-order dependence on the initial function value. In stochastic settings, Adam’s convergence-rate upper bound matches the lower bound for stochastic first-order optimizers with respect to both the initial function value and the final error. The paper also introduces a novel stopping-time-based technique to prove that Adam’s convergence rate can match these lower bounds across all problem hyperparameters.
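
For readers who want the two update rules being compared side by side, the sketch below shows standard SGDM and Adam steps. This is a minimal, textbook-style illustration rather than the exact setting analyzed in the paper: the function names, the hyperparameter defaults (lr, beta, beta1, beta2, eps), and the bias-correction form are common conventions, not values taken from the paper. (In this line of work, “non-uniform smoothness” usually refers to conditions in which the local smoothness constant can grow with the gradient norm rather than being bounded by a single constant; the paper’s precise assumption may differ in its details.)

    import numpy as np

    def sgdm_step(w, grad, m, lr=0.01, beta=0.9):
        """One step of (stochastic) gradient descent with heavy-ball momentum."""
        m = beta * m + grad            # momentum buffer accumulates past gradients
        w = w - lr * m                 # move along the momentum direction
        return w, m

    def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        """One step of Adam with the usual bias correction (t counts steps from 1)."""
        m = beta1 * m + (1 - beta1) * grad           # first moment (momentum)
        v = beta2 * v + (1 - beta2) * grad ** 2      # second moment (per-coordinate scale)
        m_hat = m / (1 - beta1 ** t)                 # bias-corrected first moment
        v_hat = v / (1 - beta2 ** t)                 # bias-corrected second moment
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # adaptive, per-coordinate step
        return w, m, v

A caller would initialize m and v to zeros of the same shape as w and start the step counter t at 1. The division by sqrt(v_hat) gives Adam a per-coordinate, adaptive effective step size; this adaptivity is the informal intuition often given for why Adam copes better than (S)GDM when smoothness varies with the gradient norm.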

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper compares Adam with Stochastic Gradient Descent with Momentum (SGDM) in terms of how quickly they converge. The researchers found that Adam works better than SGDM under certain conditions: Adam can reach a known limit for first-order optimizers, while SGDM takes longer and depends more on the function’s starting point. The study also looks at how these algorithms behave in situations with some randomness. Overall, this helps us understand how these two important optimization methods perform.

Keywords

  • Artificial intelligence
  • Optimization
  • Stochastic gradient descent