Super Consistency of Neural Network Landscapes and Learning Rate Transfer

by Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto

First submitted to arXiv on: 27 Feb 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract on the paper’s arXiv page.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
This paper investigates how the size of a neural network relates to the properties of its optimization landscape. The authors find that when width and depth are scaled towards the “rich feature learning limit,” certain hyperparameters, such as the learning rate, transfer from small to large models. Analyzing the largest eigenvalue of the loss Hessian (the sharpness), they discover a Super Consistency of the landscape in this regime: the sharpness dynamics remain stable as the model scales. In contrast, sharpness evolves differently in the Neural Tangent Kernel (NTK) and other scaling regimes, a difference the authors attribute to the presence or absence of feature learning. They corroborate their claims with experiments on a range of datasets and architectures.
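
The sharpness the medium summary refers to is the largest eigenvalue of the loss Hessian. As a rough illustration of how such a quantity is typically estimated without materializing the full Hessian, below is a minimal sketch using power iteration on Hessian-vector products; the PyTorch model, data, and iteration count are hypothetical stand-ins, not the paper’s experimental setup.

```python
import torch

# Minimal sketch: estimate the sharpness (the loss Hessian's largest
# eigenvalue) via power iteration on Hessian-vector products.
torch.manual_seed(0)
model = torch.nn.Sequential(
    torch.nn.Linear(10, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

params = [p for p in model.parameters() if p.requires_grad]
# create_graph=True keeps the graph so we can differentiate through the grads.
grads = torch.autograd.grad(loss, params, create_graph=True)

def hvp(vec):
    # Hessian-vector product: differentiate <grad(loss), vec> w.r.t. params.
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params, retain_graph=True)

# Power iteration: repeatedly apply the Hessian to a random unit vector.
v = [torch.randn_like(p) for p in params]
for _ in range(50):
    hv = hvp(v)
    norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
    v = [h / norm for h in hv]

# Rayleigh quotient <v, Hv> approximates the largest-magnitude eigenvalue.
sharpness = sum((h * u).sum() for h, u in zip(hvp(v), v)).item()
print(f"estimated sharpness: {sharpness:.4f}")
```

Tracking an estimate like this over training, at several model widths and depths, is in spirit how one would probe the consistency of sharpness dynamics that the paper reports.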
Low Difficulty Summary (written by GrooveSquid.com; original content)
This paper looks at how the size of a neural network is connected to how well it learns. The researchers found that if you make the network bigger in certain ways, some settings, like the learning rate, don’t need to change much even as the network gets much bigger. This is surprising, because we might expect small and huge models to behave very differently. They studied this by looking at a special matrix (the Hessian) that helps us understand how the model learns. In one way of scaling the network, this matrix stays similar even as the network gets bigger; in other ways, it changes a lot. The difference comes from how features are learned in each type of network. The researchers tested their ideas on many different datasets and types of networks.
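
To make the learning rate transfer claim concrete, here is a hypothetical sketch of the kind of width scaling (a muP-style, feature-learning parameterization) under which a learning rate tuned on a small model can be reused on a larger one. The per-layer multipliers follow the common muP recipe for SGD and are illustrative assumptions, not the paper’s code.

```python
import torch

# Hypothetical muP-style width scaling for a bias-free three-layer MLP
# trained with SGD. Under this parameterization, feature updates stay O(1)
# as the width grows, which is what enables learning rate transfer.
def make_mup_mlp(width: int, d_in: int = 10, d_out: int = 1):
    w_in = torch.nn.Linear(d_in, width, bias=False)
    w_hid = torch.nn.Linear(width, width, bias=False)
    w_out = torch.nn.Linear(width, d_out, bias=False)
    # muP init: variance 1/fan_in for input and hidden weights, and
    # 1/fan_in^2 for output weights (the output shrinks with width at init).
    torch.nn.init.normal_(w_in.weight, std=d_in ** -0.5)
    torch.nn.init.normal_(w_hid.weight, std=width ** -0.5)
    torch.nn.init.normal_(w_out.weight, std=1.0 / width)
    model = torch.nn.Sequential(w_in, torch.nn.ReLU(), w_hid, torch.nn.ReLU(), w_out)
    return model, (w_in, w_hid, w_out)

def mup_sgd(layers, base_lr: float, width: int, base_width: int = 64):
    w_in, w_hid, w_out = layers
    ratio = width / base_width
    # Per-layer SGD learning rates relative to the base width: the input
    # layer's lr scales up with width, the hidden layer's lr stays fixed,
    # and the output layer's lr scales down.
    return torch.optim.SGD([
        {"params": w_in.parameters(), "lr": base_lr * ratio},
        {"params": w_hid.parameters(), "lr": base_lr},
        {"params": w_out.parameters(), "lr": base_lr / ratio},
    ])

# The same base_lr tuned at base_width can then be reused at a larger width.
model, layers = make_mup_mlp(width=256)
optimizer = mup_sgd(layers, base_lr=0.1, width=256)
```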

Keywords

* Artificial intelligence
* Neural network
* Optimization