Summary of "Understanding Adam Requires Better Rotation Dependent Assumptions" by Lucas Maes et al.
Understanding Adam Requires Better Rotation Dependent Assumptions
by Lucas Maes, Tianyue H. Zhang, Alexia Jolicoeur-Martineau, Ioannis Mitliagkas, Damien Scieur, Simon Lacoste-Julien, Charles Guille-Escuret
First submitted to arXiv on: 25 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper investigates Adam’s advantage over Stochastic Gradient Descent (SGD) by analyzing its sensitivity to rotations of the parameter space. The authors demonstrate that Adam’s performance degrades under random rotations, indicating a crucial dependence on the choice of basis. This suggests that conventional rotation-invariant assumptions are insufficient to capture Adam’s advantages theoretically. The paper also identifies structured rotations that preserve or enhance Adam’s empirical performance and evaluates the adequacy of existing rotation-dependent assumptions in explaining Adam’s behavior across various rotation types. By highlighting the need for new, rotation-dependent theoretical frameworks, this work aims to provide a comprehensive understanding of Adam’s empirical success in modern machine learning tasks. (A toy illustration of this rotation sensitivity follows the table.) |
Low | GrooveSquid.com (original content) | This paper looks at why Adam often works better than Stochastic Gradient Descent (SGD) by studying what happens when the parameter axes are rotated. The authors found that Adam gets worse when the parameters are rotated randomly, which means it depends on the choice of basis. This shows we cannot explain Adam’s success with theories that treat all directions the same. The paper also finds special, structured rotations under which Adam works just as well or even better. Finally, it checks whether existing ideas about why Adam works can explain this behavior. |
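The rotation sensitivity described above can be sketched in a few lines of NumPy. The snippet below is purely illustrative and is not the paper’s experimental setup: the quadratic loss, dimension, random rotation, and hyperparameters are arbitrary choices made for this example. It shows that expressing the same loss in a randomly rotated basis leaves plain gradient descent’s final loss unchanged (its updates are rotation-equivariant), while Adam’s coordinate-wise scaling makes its result depend on the basis.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50

# Ill-conditioned quadratic loss L(x) = 0.5 * x^T H x with eigenvalues from 1 to 1000.
h = np.logspace(0, 3, d)
H_diag = np.diag(h)

# Random rotation of the parameter space (orthogonal matrix via QR decomposition).
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
H_rot = Q @ H_diag @ Q.T          # the same loss expressed in a rotated basis

def loss(H, x):
    return 0.5 * x @ H @ x

def grad(H, x):
    return H @ x

def gradient_descent(H, x0, lr=1e-3, steps=2000):
    """Plain gradient descent: its iterates are equivariant to rotations of the basis."""
    x = x0.copy()
    for _ in range(steps):
        x -= lr * grad(H, x)
    return loss(H, x)

def adam(H, x0, lr=1e-2, beta1=0.9, beta2=0.999, eps=1e-8, steps=2000):
    """Textbook Adam with bias correction; its per-coordinate scaling is basis-dependent."""
    x = x0.copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(H, x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)
        v_hat = v / (1 - beta2**t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return loss(H, x)

x0 = rng.standard_normal(d)
x0_rot = Q @ x0                   # the same starting point, expressed in the rotated basis

print("GD,   original basis:", gradient_descent(H_diag, x0))
print("GD,   rotated basis :", gradient_descent(H_rot, x0_rot))  # identical up to float error
print("Adam, original basis:", adam(H_diag, x0))
print("Adam, rotated basis :", adam(H_rot, x0_rot))              # typically a different value
```

In this toy run the two gradient descent values agree to floating-point precision, while the two Adam values generally differ; how large the gap is, and in which direction, depends on the chosen loss and hyperparameters. This basis dependence on real networks is what the paper studies.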
Keywords
» Artificial intelligence » Machine learning » Stochastic gradient descent