
Summary of Understanding Adam Requires Better Rotation Dependent Assumptions, by Lucas Maes et al.


Understanding Adam Requires Better Rotation Dependent Assumptions

by Lucas Maes, Tianyue H. Zhang, Alexia Jolicoeur-Martineau, Ioannis Mitliagkas, Damien Scieur, Simon Lacoste-Julien, Charles Guille-Escuret

First submitted to arXiv on: 25 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper investigates Adam’s advantage over Stochastic Gradient Descent (SGD) by analyzing its sensitivity to rotations of the parameter space. The authors demonstrate that Adam’s performance degrades under random rotations, indicating a crucial dependence on the choice of basis. This suggests that conventional rotation-invariant assumptions are insufficient to capture Adam’s advantages theoretically. The paper also identifies structured rotations that preserve or enhance Adam’s empirical performance, and evaluates how well existing rotation-dependent assumptions explain Adam’s behavior across various rotation types. By highlighting the need for new, rotation-dependent theoretical frameworks, this work aims to provide a more complete understanding of Adam’s empirical success in modern machine learning tasks. (A short code sketch after these summaries illustrates the basis dependence in question.)

Low Difficulty Summary (original content by GrooveSquid.com)
This paper looks at why Adam is better than Stochastic Gradient Descent (SGD) by studying how Adam behaves when the parameter space is rotated. The authors found that Adam gets worse when the parameters are rotated randomly, which means its behavior depends on the choice of basis. This shows that we can’t just assume Adam works the same way in every direction. The paper also finds special, structured rotations that keep Adam working well or even improve it. Finally, the authors check whether the assumptions currently used to explain Adam actually match how it behaves under these rotations.

Keywords

  • Artificial intelligence
  • Machine learning
  • Stochastic gradient descent