
Summary of Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse, by Arthur Jacot et al.


Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse

by Arthur Jacot, Peter Súkeník, Zihan Wang, Marco Mondelli

First submitted to arXiv on: 7 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Optimization and Control (math.OC); Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper's original abstract, written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (GrooveSquid.com original content)
Deep neural networks (DNNs) exhibit a phenomenon called “neural collapse” at convergence, where they represent the training data via a highly symmetric geometric structure. This has led researchers to investigate how the collapse emerges, primarily through the unconstrained features model. However, this model treats the penultimate layer’s features as free variables, which makes it data-agnostic and calls into question its ability to capture actual DNN training. To address this issue, we shift our focus from unconstrained features to DNNs ending with at least two linear layers. We prove generic guarantees on neural collapse under certain assumptions: low training error, balancedness of the linear layers, and bounded conditioning of the features before the linear part. We then show that these assumptions hold for gradient descent training with weight decay: for networks with a wide first layer, and for solutions that are either nearly optimal or stable under large learning rates. This work is significant because it establishes neural collapse in the end-to-end training of DNNs.
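For readers who want a bit more precision, here is a rough sketch, in standard neural-collapse notation rather than the paper’s exact formulation, of what the collapse and the assumptions above look like. In it, $h_{c,i}$ is the penultimate-layer feature of sample $i$ in class $c$, $\mu_c$ and $\mu_G$ are the class and global means, $C$ is the number of classes, $W_l$ is the $l$-th linear layer, $H$ stacks the features entering the linear part, and $\kappa$ is a constant.

```latex
% Sketch only: standard neural-collapse conditions plus idealized, exact versions
% of the assumptions described above; the paper works with approximate and more
% carefully quantified variants. Requires amsmath.
\begin{align*}
  h_{c,i} &= \mu_c
    && \text{NC1: within-class variability collapse,} \\
  \frac{\langle \mu_c - \mu_G,\, \mu_{c'} - \mu_G \rangle}
       {\lVert \mu_c - \mu_G \rVert \, \lVert \mu_{c'} - \mu_G \rVert}
    &= -\frac{1}{C-1}, \quad c \neq c'
    && \text{NC2: class means form a simplex ETF,} \\
  W_{l+1}^{\top} W_{l+1} &= W_l W_l^{\top}
    && \text{balancedness of consecutive linear layers,} \\
  \frac{\sigma_{\max}(H)}{\sigma_{\min}(H)} &\le \kappa
    && \text{bounded conditioning of the pre-linear features.}
\end{align*}
```

Roughly, low training error together with balancedness is tied to the within-class collapse, while bounded conditioning is what additionally yields the symmetric arrangement of the class means.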
Low Difficulty Summary (GrooveSquid.com original content)
Neural networks show a curious behavior once they are fully trained: they tend to represent the data in a very special, highly symmetric way called “neural collapse.” This has sparked curiosity among researchers who want to understand why it happens. The catch is that most earlier explanations relied on a simplified model that ignores the training data entirely, so it was unclear whether those explanations apply to real networks. To get closer to the real thing, the authors studied networks that end with at least two simple layers, called linear layers. They proved that such networks will indeed exhibit neural collapse under certain conditions, and that these conditions are met when the networks are trained in the usual way with gradient descent and weight decay. This discovery is important because it shows neural collapse arising from the end-to-end training of a whole network, not just from a simplified stand-in.

Keywords

» Artificial intelligence  » Gradient descent