
Summary of Simplicity Bias and Optimization Threshold in Two-layer ReLU Networks, by Etienne Boursier and Nicolas Flammarion


Simplicity bias and optimization threshold in two-layer ReLU networks

by Etienne Boursier, Nicolas Flammarion

First submitted to arXiv on: 3 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The paper's original abstract serves as the high difficulty summary.
Medium Difficulty Summary (original content by GrooveSquid.com)
Understanding the generalization of overparametrized neural networks remains a crucial challenge in machine learning. Most studies focus on interpolation, assuming convergence towards a global minimum of the training loss. However, as the number of training samples increases, this paradigm no longer holds for complex tasks such as in-context learning or diffusion. Empirically, it has been observed that trained models transition from global minima to spurious local minima of the training loss, which correspond to minimizers of the true population loss. This paper theoretically explores this phenomenon in two-layer ReLU networks, demonstrating that overparametrized networks often converge towards simpler solutions rather than interpolating the training data. This simplicity bias leads to a drastic improvement in test loss compared with interpolating solutions. The analysis relies on the early alignment phase, during which neurons align towards specific directions, resulting in an optimization threshold beyond which interpolation is not reached.
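To make the early alignment idea concrete, here is a minimal numerical sketch (not taken from the paper): a two-layer ReLU network f(x) = Σ_j a_j max(0, ⟨w_j, x⟩) trained by gradient descent from a small initialization, where we track how the neuron directions w_j / ||w_j|| align with a single teacher direction. The single-neuron teacher, squared loss, and all numerical choices below are illustrative assumptions, not the paper's actual setting.

```python
# Hypothetical sketch of the early alignment phase in a two-layer ReLU network.
# Teacher labels, loss, and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 20, 5, 50                    # samples, input dimension, hidden neurons
X = rng.standard_normal((n, d))
teacher = rng.standard_normal(d)
teacher /= np.linalg.norm(teacher)
y = np.maximum(X @ teacher, 0.0)       # labels from a single-ReLU teacher (assumption)

scale = 1e-3                           # small initialization, typical in alignment analyses
W = scale * rng.standard_normal((m, d))
a = scale * rng.standard_normal(m)
lr = 0.5

def forward(W, a, X):
    H = np.maximum(X @ W.T, 0.0)       # hidden activations, shape (n, m)
    return H, H @ a                    # predictions, shape (n,)

for step in range(2001):
    H, pred = forward(W, a, X)
    r = pred - y                       # residuals
    # gradients of the mean squared error 0.5/n * ||pred - y||^2
    grad_a = H.T @ r / n
    grad_W = ((r[:, None] * (H > 0)) * a[None, :]).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 500 == 0:
        dirs = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
        align = np.abs(dirs @ teacher)  # |cosine| between each neuron and the teacher direction
        print(f"step {step:5d}  loss {0.5 * np.mean(r**2):.2e}  mean |cos| {align.mean():.3f}")
```

With a small enough initialization, the neuron norms barely move at first while their directions rotate, so the mean |cosine| printed above tends to grow before the loss drops substantially; this is the kind of direction-first dynamics the early alignment phase refers to.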
Low Difficulty Summary (original content by GrooveSquid.com)
This paper explores why overparametrized neural networks work well for some tasks but not others. Typically, these networks do a good job of "filling in" the gaps in training data, but this doesn't always translate to real-world performance. The authors found that when there is a lot of training data, the network actually starts to simplify its solutions rather than trying to fit every detail. This makes it better at generalizing to new situations. The paper shows how this happens using a specific type of neural network and provides insights into why overparametrization can be helpful.

Keywords

  • Artificial intelligence
  • Alignment
  • Diffusion
  • Generalization
  • Machine learning
  • Neural network
  • Optimization
  • ReLU