
Summary of Simplicity Bias and Optimization Threshold in Two-layer ReLU Networks, by Etienne Boursier and Nicolas Flammarion


Simplicity bias and optimization threshold in two-layer ReLU networks

by Etienne Boursier, Nicolas Flammarion

First submitted to arXiv on: 3 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The paper's original abstract serves as the high difficulty summary.
Medium Difficulty Summary (original content by GrooveSquid.com)
Understanding the generalization of overparametrized neural networks remains a crucial challenge in machine learning. Most studies focus on interpolation, assuming convergence towards a global minimum of the training loss. However, as the number of training samples increases, this paradigm no longer holds for complex tasks such as in-context learning or diffusion. Empirically, it has been observed that trained models transition from global minima to spurious local minima of the training loss, which correspond to minimizers of the true population loss. This paper theoretically explores this phenomenon in two-layer ReLU networks, demonstrating that overparametrized networks often converge towards simpler solutions rather than interpolating the training data. This simplicity bias leads to a drastic improvement in test loss compared with interpolating solutions. The analysis relies on the early alignment phase, during which neurons align towards specific directions, resulting in an optimization threshold beyond which interpolation is not reached.
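To make the early alignment idea concrete, here is a minimal numerical sketch (not taken from the paper): a two-layer ReLU network f(x) = Σ_j a_j max(0, ⟨w_j, x⟩) trained by gradient descent from a small initialization, where we track how the neuron directions w_j / ||w_j|| align with a single teacher direction. The single-neuron teacher, squared loss, and all numerical choices below are illustrative assumptions, not the paper's actual setting.

```python
# Hypothetical sketch of the early alignment phase in a two-layer ReLU network.
# Teacher labels, loss, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 20, 5, 50                    # samples, input dimension, hidden neurons
X = rng.standard_normal((n, d))
teacher = rng.standard_normal(d)
teacher /= np.linalg.norm(teacher)
y = np.maximum(X @ teacher, 0.0)       # labels from a single-ReLU teacher (assumption)

scale = 1e-3                           # small initialization, typical in alignment analyses
W = scale * rng.standard_normal((m, d))
a = scale * rng.standard_normal(m)
lr = 0.5

def forward(W, a, X):
    H = np.maximum(X @ W.T, 0.0)       # hidden activations, shape (n, m)
    return H, H @ a                    # predictions, shape (n,)

for step in range(2001):
    H, pred = forward(W, a, X)
    r = pred - y                       # residuals
    # gradients of the mean squared error 0.5/n * ||pred - y||^2
    grad_a = H.T @ r / n
    grad_W = ((r[:, None] * (H > 0)) * a[None, :]).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 500 == 0:
        dirs = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
        align = np.abs(dirs @ teacher)  # |cosine| between each neuron and the teacher direction
        print(f"step {step:5d}  loss {0.5 * np.mean(r**2):.2e}  mean |cos| {align.mean():.3f}")
```

With a small enough initialization, the neuron norms barely move at first while their directions rotate, so the mean |cosine| printed above tends to grow before the loss drops substantially; this is the kind of direction-first dynamics the early alignment phase refers to.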
Low Difficulty Summary (original content by GrooveSquid.com)
This paper explores why overparametrized neural networks work well for some tasks but not others. Typically, these networks do a good job of "filling in" the gaps in training data, but this doesn't always translate to real-world performance. The authors found that when there is a lot of training data, the network actually starts to simplify its solutions rather than trying to fit every detail. This makes it better at generalizing to new situations. The paper shows how this happens using a specific type of neural network and provides insights into why overparametrization can be helpful.

Keywords

  • Artificial intelligence
  • Alignment
  • Diffusion
  • Generalization
  • Machine learning
  • Neural network
  • Optimization
  • ReLU