
Summary of 4+3 Phases of Compute-Optimal Neural Scaling Laws, by Elliot Paquette et al.


4+3 Phases of Compute-Optimal Neural Scaling Laws

by Elliot Paquette, Courtney Paquette, Lechao Xiao, Jeffrey Pennington

First submitted to arXiv on: 23 May 2024

Categories

  • Main: Machine Learning (stat.ML)
  • Secondary: Machine Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR); Statistics Theory (math.ST)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The authors propose a neural scaling model and use it to study the compute-limited, infinite-data scaling-law regime. The model has three parameters: data complexity, target complexity, and model parameter count. Training is performed with one-pass stochastic gradient descent on a mean-squared loss. The authors derive a representation of the loss curves that holds over all iteration counts and becomes more accurate as the model parameter count grows, and from it they obtain the compute-optimal model parameter count as a function of the floating-point operation budget (a toy illustration follows the summaries below).
Low Difficulty Summary (written by GrooveSquid.com, original content)
The neural scaling model helps researchers understand how to make the best use of a limited compute budget when data is plentiful. The model has three main factors: how complex the data is, how complex the target (what we are trying to learn) is, and how many parameters the model uses. The model is trained with a simple algorithm called one-pass stochastic gradient descent, and the authors' description of how the loss falls during training becomes more accurate as the number of model parameters grows. The study also shows how to pick the best number of parameters for a given compute budget, finds four main phases in how the model behaves depending on these factors, and supports this with mathematical proofs and examples.
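
As a rough illustration of the setup described above, the sketch below trains a toy power-law linear model with one-pass stochastic gradient descent on a mean-squared loss, then sweeps the parameter count under a fixed floating-point operation budget and picks the size with the lowest loss. The power-law exponents, learning rate, budget, and FLOP accounting are illustrative assumptions, not the paper's actual model, constants, or results.

# Illustrative sketch only: a toy power-law linear model trained with one-pass SGD
# on a mean-squared loss, then swept over parameter counts under a fixed FLOP budget.
# Exponents (alpha, beta), learning rate, budget, and cost model are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def run_one_pass_sgd(d, n_steps, alpha=1.2, beta=0.8, lr=0.05):
    """Train a d-parameter linear model on a fresh sample each step (one pass)."""
    # Assumed power-law feature spectrum ("data complexity") and target decay
    # ("target complexity").
    spectrum = np.arange(1, d + 1, dtype=float) ** (-alpha)
    target = np.arange(1, d + 1, dtype=float) ** (-beta)
    w = np.zeros(d)
    for _ in range(n_steps):
        x = rng.standard_normal(d) * np.sqrt(spectrum)  # one fresh sample
        y = target @ x
        grad = (w @ x - y) * x                          # gradient of 0.5*(w@x - y)**2
        w -= lr * grad
    # Population mean-squared loss under the assumed Gaussian feature model.
    return 0.5 * np.sum(spectrum * (w - target) ** 2)

# Compute-optimal sweep: with a FLOP budget f and per-step cost proportional to d,
# a d-parameter model gets roughly f / d steps of training.
flop_budget = 2_000_000
results = {}
for d in [16, 32, 64, 128, 256, 512]:
    n_steps = flop_budget // d
    results[d] = run_one_pass_sgd(d, n_steps)

best_d = min(results, key=results.get)
for d, loss in results.items():
    print(f"d = {d:4d}  steps = {flop_budget // d:6d}  loss = {loss:.4f}")
print(f"compute-optimal parameter count under this budget: d* = {best_d}")

In this toy sweep, small models run for many steps but cannot represent the target well, while large models are expressive but undertrained at a fixed budget; the compute-optimal size sits between the two, which is the kind of trade-off the paper analyzes exactly.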

Keywords

» Artificial intelligence  » Stochastic gradient descent