Summary of On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond, by Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, and Wei Chen
On the Convergence of Adam under Non-uniform Smoothness: Separability from SGDM and Beyond
by Bohan Wang, Huishuai Zhang, Qi Meng, Ruoyu Sun, Zhi-Ming Ma, Wei Chen
First submitted to arXiv on: 22 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Optimization and Control (math.OC)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper's original abstract, available on the arXiv page. |
| Medium | GrooveSquid.com (original content) | The paper compares the convergence rates of Adam and Stochastic Gradient Descent with Momentum (SGDM) under non-uniform smoothness, showing that Adam converges faster under these conditions. In the deterministic setting, Adam attains the known lower bound for deterministic first-order optimizers, whereas Gradient Descent with Momentum (GDM) has a higher-order dependence on the initial function value. In the stochastic setting, Adam's convergence-rate upper bound matches the lower bound for stochastic first-order optimizers, accounting for both the initial function value and the final error. The paper also introduces a novel stopping-time-based technique to prove that Adam's convergence rate can match the lower bound across all problem hyperparameters (a sketch of the two update rules appears after this table). |
| Low | GrooveSquid.com (original content) | Adam is compared with Stochastic Gradient Descent with Momentum (SGDM) in terms of how quickly each converges. The researchers found that Adam works better than SGDM under certain conditions: Adam can reach a known speed limit for first-order optimizers, while SGDM takes longer and depends more heavily on the function value at the starting point. The study also examines how the two algorithms behave when there is randomness in the gradients. Overall, this helps us understand how these two important optimization methods compare. |
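
For readers who want to see concretely what is being compared, here is a minimal NumPy sketch of one update step of each optimizer. The function names, default hyperparameters, and the toy quadratic at the end are illustrative assumptions, not taken from the paper; the paper's contribution is the convergence analysis, not the update rules themselves. Adam's per-coordinate denominator `sqrt(v_hat) + eps` is what lets it shrink its effective step size where gradients are large, which is the regime targeted by non-uniform smoothness (often formalized as a bound of the form ||∇²f(x)|| ≤ L0 + L1 ||∇f(x)||).

```python
import numpy as np

def sgdm_step(w, grad, m, lr=0.01, beta=0.9):
    """One heavy-ball SGDM step: accumulate momentum, then move against it."""
    m = beta * m + grad
    return w - lr * m, m

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: momentum plus a per-coordinate adaptive step size."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias corrections for the zero init
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w_sgdm = w_adam = np.array([5.0, -3.0])
m_sgdm = np.zeros(2)
m_adam, v_adam = np.zeros(2), np.zeros(2)
for t in range(1, 101):
    w_sgdm, m_sgdm = sgdm_step(w_sgdm, w_sgdm, m_sgdm)
    w_adam, m_adam, v_adam = adam_step(w_adam, w_adam, m_adam, v_adam, t)
print(w_sgdm, w_adam)  # both iterates move toward the minimizer at the origin
```

The only structural difference between the two updates is Adam's division by a running root-mean-square of past gradients; the paper asks how much that difference matters for convergence when the local smoothness can grow with the gradient norm.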
Keywords
- Artificial intelligence
- Optimization
- Stochastic gradient descent




