Summary of Super Consistency of Neural Network Landscapes and Learning Rate Transfer, by Lorenzo Noci et al.
Super Consistency of Neural Network Landscapes and Learning Rate Transfer
by Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto
First submitted to arXiv on: 27 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper investigates the relationship between neural network size and the properties of the optimization landscape. The authors find that when width and depth are scaled towards the "rich feature learning limit," certain hyperparameters, such as the learning rate, transfer from small to large models. They analyze the largest eigenvalue of the loss Hessian (the sharpness) and discover Super Consistency of the landscape in this regime. In contrast, they show different sharpness dynamics in the Neural Tangent Kernel (NTK) and other scaling regimes, attributing these differences to the presence or absence of feature learning. The authors corroborate their claims with experiments on various datasets and architectures. |
| Low | GrooveSquid.com (original content) | This paper looks at how the size of a neural network relates to how well it learns. The authors found that if you make the network bigger in certain ways, some settings don't need to change much even as the network gets much bigger. This is surprising, because we might expect small and huge models to behave very differently. The researchers studied this by looking at a special matrix (the Hessian) that helps describe how the model learns. They found that in one case the Hessian stays similar as the network gets bigger, but in other cases it changes a lot. The difference comes from whether each type of network learns features. The researchers tested their ideas on many different datasets and types of networks. |
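The "sharpness" the summaries refer to is the largest eigenvalue of the loss Hessian. As a minimal sketch (not the paper's actual setup), it can be estimated by power iteration using only Hessian-vector products, so the full Hessian is never materialized; here a toy quadratic loss with a known Hessian stands in for a neural network loss:

```python
import numpy as np

# Toy Hessian with known top eigenvalue 3.0 (a stand-in assumption;
# in practice the Hessian-vector product would come from autodiff).
A = np.diag([3.0, 1.0, 0.5])

def hvp(w, v):
    # For the quadratic loss L(w) = 0.5 * w @ A @ w, the
    # Hessian-vector product is simply A @ v, independent of w.
    return A @ v

def sharpness(w, n_iters=100, seed=0):
    """Estimate the largest Hessian eigenvalue by power iteration."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(w.shape)
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        hv = hvp(w, v)
        v = hv / np.linalg.norm(hv)
    # Rayleigh quotient at the converged direction.
    return v @ hvp(w, v)

w = np.zeros(3)
print(round(sharpness(w), 4))  # ≈ 3.0, the top eigenvalue of A
```

Tracking this quantity across model widths and depths is what reveals whether the landscape (and hence the best learning rate) stays consistent as the network is scaled up.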
Keywords
* Artificial intelligence * Neural network * Optimization