Summary of Super Consistency of Neural Network Landscapes and Learning Rate Transfer, by Lorenzo Noci et al.
Super Consistency of Neural Network Landscapes and Learning Rate Transfer
by Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto
First submitted to arXiv on: 27 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper investigates the relationship between neural network size and the properties of the optimization landscape. The authors find that when width and depth are scaled towards the "rich feature learning limit," certain hyperparameters, such as the learning rate, transfer from small to large models. They analyze the largest eigenvalue of the loss Hessian (the sharpness) and discover Super Consistency of the landscape in this regime. In contrast, they show different sharpness dynamics in the Neural Tangent Kernel (NTK) and other scaling regimes, attributing these differences to the presence or absence of feature learning. The authors corroborate their claims with experiments on various datasets and architectures. |
| Low | GrooveSquid.com (original content) | This paper looks at how the size of a neural network relates to how well it learns. The authors found that if you make the network bigger in certain ways, some settings don't need to change much even as the network gets much bigger. This is surprising, because we might expect small and huge models to behave very differently. The researchers studied this by looking at a special matrix (the Hessian) that helps describe how the model learns. They found that in one case the Hessian stays similar as the network gets bigger, but in other cases it changes a lot. The difference comes from whether each type of network learns features. The researchers tested their ideas on many different datasets and types of networks. |
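The "sharpness" the summaries refer to is the largest eigenvalue of the loss Hessian. As a minimal sketch (not the paper's actual setup), it can be estimated by power iteration using only Hessian-vector products, so the full Hessian is never materialized; here a toy quadratic loss with a known Hessian stands in for a neural network loss:

```python
import numpy as np

# Toy Hessian with known top eigenvalue 3.0 (a stand-in assumption;
# in practice the Hessian-vector product would come from autodiff).
A = np.diag([3.0, 1.0, 0.5])

def hvp(w, v):
    # For the quadratic loss L(w) = 0.5 * w @ A @ w, the
    # Hessian-vector product is simply A @ v, independent of w.
    return A @ v

def sharpness(w, n_iters=100, seed=0):
    """Estimate the largest Hessian eigenvalue by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        hv = hvp(w, v)
        v = hv / np.linalg.norm(hv)
    # Rayleigh quotient at the converged direction.
    return v @ hvp(w, v)

w = np.zeros(3)
print(round(sharpness(w), 4))  # ≈ 3.0, the top eigenvalue of A
```

Tracking this quantity across model widths and depths is what reveals whether the landscape (and hence the best learning rate) stays consistent as the network is scaled up.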
Keywords
* Artificial intelligence * Neural network * Optimization