
Summary of Understanding Adam Requires Better Rotation Dependent Assumptions, by Lucas Maes et al.


Understanding Adam Requires Better Rotation Dependent Assumptions

by Lucas Maes, Tianyue H. Zhang, Alexia Jolicoeur-Martineau, Ioannis Mitliagkas, Damien Scieur, Simon Lacoste-Julien, Charles Guille-Escuret

First submitted to arXiv on: 25 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper investigates Adam’s advantage over Stochastic Gradient Descent (SGD) by analyzing its sensitivity to rotations of the parameter space. The authors demonstrate that Adam’s performance degrades under random rotations, indicating a crucial dependence on the choice of basis. This suggests that conventional rotation-invariant assumptions are insufficient to capture Adam’s advantages theoretically. The paper also identifies structured rotations that preserve or enhance Adam’s empirical performance, and evaluates how well existing rotation-dependent assumptions explain Adam’s behavior across various rotation types. By highlighting the need for new, rotation-dependent theoretical frameworks, this work aims to provide a more complete understanding of Adam’s empirical success in modern machine learning tasks. (A short code sketch after these summaries illustrates the basis dependence in question.)

Low Difficulty Summary (original content by GrooveSquid.com)
This paper looks at why Adam is better than Stochastic Gradient Descent (SGD) by studying how Adam behaves when the parameter space is rotated. The authors found that Adam gets worse when the parameters are rotated randomly, which means its behavior depends on the choice of basis. This shows that we can’t just assume Adam works the same way in every direction. The paper also finds special, structured rotations that keep Adam working well or even improve it. Finally, the authors check whether the assumptions currently used to explain Adam actually match how it behaves under these rotations.

Keywords

  • Artificial intelligence
  • Machine learning
  • Stochastic gradient descent