Momentum Does Not Reduce Stochastic Noise in Stochastic Gradient Descent

by Naoki Sato, Hideaki Iiduka

First submitted to arXiv on: 4 Feb 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Optimization and Control (math.OC)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper and are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to read the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper’s original abstract; see the arXiv listing above.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
This paper investigates the effectiveness of stochastic gradient descent (SGD) with momentum for optimizing non-convex objective functions, particularly in deep neural network training. The authors analyze whether momentum reduces stochastic noise and improves generalization. Using convergence analysis and an optimal batch size estimation formula, they estimate the magnitude of gradient noise and find that momentum does not significantly reduce it. They also examine search direction noise, which inherently smooths the objective function, and conclude that momentum offers no significant advantage over plain SGD. (A minimal sketch of the two update rules appears after these summaries.)

Low Difficulty Summary (written by GrooveSquid.com; original content)
In this study, researchers explore whether adding momentum to stochastic gradient descent (SGD) helps when training deep neural networks with non-convex objectives. They check whether momentum really reduces noisy gradients. Using formulas from convergence analysis and for optimal batch sizes, they find that momentum doesn’t make a big difference. They also look at “search direction noise” and find that this kind of noise actually makes the optimization problem easier. So adding momentum to SGD doesn’t bring many benefits.

Keywords

  • Artificial intelligence
  • Neural network
  • Objective function
  • Optimization
  • Stochastic gradient descent