Simplicity bias and optimization threshold in two-layer ReLU networks
by Etienne Boursier, Nicolas Flammarion
First submitted to arXiv on: 3 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | Understanding the generalization of overparametrized neural networks remains a crucial challenge in machine learning. Most studies focus on interpolation, assuming convergence to a global minimum of the training loss. As the number of training samples grows, however, this paradigm no longer holds for complex tasks such as in-context learning or diffusion. Empirically, trained models have been observed to transition from global minima to spurious local minima of the training loss that correspond to minimizers of the true population loss. This paper studies the phenomenon theoretically in two-layer ReLU networks, showing that overparametrized networks often converge toward simpler solutions rather than interpolating the training data. This simplicity bias yields a drastic improvement in test loss relative to interpolating solutions. The analysis relies on an early alignment phase, in which neurons align toward specific directions, producing an optimization threshold beyond which interpolation is not reached. |
| Low | GrooveSquid.com (original content) | This paper explores why overparametrized neural networks work well for some tasks but not others. These networks usually do a good job of "filling in" the gaps in training data, but that doesn't always translate to real-world performance. The authors found that once there are enough training samples, the network starts to simplify its solutions rather than trying to fit every detail, which makes it better at generalizing to new situations. The paper shows how this happens for a specific type of neural network and provides insight into why overparametrization can be helpful. |
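To make the setting concrete, here is a toy sketch (not the paper's actual experiments) of the object under study: a two-layer ReLU network f(x) = Σⱼ aⱼ·relu(⟨wⱼ, x⟩) trained by full-batch gradient descent from a small initialization, the regime in which the early alignment phase arises. The teacher direction `v_star`, the dataset, and the pairwise-cosine alignment metric are illustrative choices made for this sketch, not details from the paper.

```python
# Toy two-layer ReLU network trained from small initialization.
# Everything here (teacher direction, data, metric) is an illustrative
# assumption for the sketch, not the paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 16, 2, 20                       # samples, input dim, hidden neurons
X = rng.standard_normal((n, d))
v_star = np.array([1.0, 0.0])             # hypothetical "teacher" direction
y = np.maximum(X @ v_star, 0.0)           # labels from one teacher ReLU neuron

scale = 1e-2                              # small init: alignment-phase regime
W = scale * rng.standard_normal((m, d))   # hidden weights w_j
a = scale * rng.standard_normal(m)        # output weights a_j

def loss(W, a):
    pred = np.maximum(X @ W.T, 0.0) @ a
    return 0.5 * np.mean((pred - y) ** 2)

def avg_abs_cosine(W):
    """Mean |cos angle| over all pairs of neuron directions w_j."""
    U = W / np.linalg.norm(W, axis=1, keepdims=True)
    iu = np.triu_indices(len(W), k=1)
    return np.abs(U @ U.T)[iu].mean()

loss_init, align_init = loss(W, a), avg_abs_cosine(W)

lr, steps = 0.01, 3000
for _ in range(steps):
    pre = X @ W.T                         # (n, m) pre-activations
    act = np.maximum(pre, 0.0)            # (n, m) ReLU outputs
    r = act @ a - y                       # (n,) residuals
    grad_a = act.T @ r / n
    grad_W = ((r[:, None] * (pre > 0)) * a).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

loss_final, align_final = loss(W, a), avg_abs_cosine(W)
print(f"loss:      {loss_init:.4f} -> {loss_final:.4f}")
print(f"alignment: {align_init:.3f} -> {align_final:.3f}")
```

The loss decreases during training, and the average pairwise alignment of the neuron directions can be compared before and after the small-initialization phase; the paper's claim is that this alignment is what ultimately prevents interpolation past a sample-size threshold.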
Keywords
» Artificial intelligence » Alignment » Diffusion » Generalization » Machine learning » Neural network » Optimization » ReLU