
Summary of Ordered Momentum for Asynchronous SGD, by Chang-Wei Shi et al.


Ordered Momentum for Asynchronous SGD

by Chang-Wei Shi, Yi-Rui Yang, Wu-Jun Li

First submitted to arXiv on: 27 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Distributed, Parallel, and Cluster Computing (cs.DC)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper's original abstract, written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The proposed ordered momentum (OrMo) method for Asynchronous Stochastic Gradient Descent (ASGD) is a novel approach to distributed learning, addressing the challenge of incorporating momentum into ASGD without hindering convergence. The paper theoretically proves the convergence of OrMo with constant and delay-adaptive learning rates for non-convex problems, building upon existing works that have shown the benefits of momentum in deep model training. Experimental results demonstrate improved convergence performance compared to ASGD and other asynchronous methods with momentum.

Low Difficulty Summary (original content by GrooveSquid.com)
Distributed learning is important for training big artificial intelligence models. A common way to do this is by using a method called Asynchronous Stochastic Gradient Descent (ASGD). However, when we add another technique called momentum to ASGD, it can actually make things worse. In this paper, the authors suggest a new approach called ordered momentum (OrMo) that helps solve this problem. They show mathematically that OrMo works well for training models and even outperforms other methods in some cases.

Keywords

  • Artificial intelligence
  • Stochastic gradient descent