
Summary of Online Learning and Information Exponents: On the Importance of Batch Size, and Time/Complexity Tradeoffs, by Luca Arnaboldi et al.


Online Learning and Information Exponents: On the Importance of Batch Size, and Time/Complexity Tradeoffs

by Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan

First submitted to arXiv on: 4 Jun 2024

Categories

  • Main: Machine Learning (stat.ML)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper’s original abstract, available on arXiv.

Medium Difficulty Summary (original content by GrooveSquid.com)
This study investigates how the batch size affects the time it takes to train neural networks with stochastic gradient descent (SGD) on specific classes of target functions. The optimal batch size is found to depend on the hardness of the target function: choosing it accordingly minimizes training time without increasing the overall sample complexity. Excessively large batches, however, are shown to be detrimental to training efficiency. To overcome this limitation, a new training protocol called Correlation loss SGD is introduced, which suppresses the autocorrelation terms in the loss function (a small illustrative sketch follows the summaries below). The study further shows that the training dynamics can be tracked by a system of ordinary differential equations (ODEs), and the theoretical results are validated through numerical experiments.

Low Difficulty Summary (original content by GrooveSquid.com)
This research explores how changing the batch size affects how long it takes to train neural networks. It finds that the best batch size depends on how hard the target is to learn. Bigger batches can help up to a point, but making them too large hurts rather than helps. To get around this problem, a new way of training called Correlation loss SGD is developed, which removes certain unhelpful terms from the loss function. The study also shows how the training process can be tracked with a simple set of equations.

Keywords

» Artificial intelligence  » Loss function  » Stochastic gradient descent