Summary of "Early learning of the optimal constant solution in neural networks and humans," by Jirko Rubruck et al.
Early learning of the optimal constant solution in neural networks and humans
by Jirko Rubruck, Jan P. Bauer, Andrew Saxe, Christopher Summerfield
First submitted to arXiv on: 25 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Deep neural networks learn complex functions during training, but surprisingly, they first learn an "optimal constant solution" (OCS): early in training, the network's responses mirror the average of the target labels while ignoring the input entirely. The researchers used a hierarchical category learning task to analyze the learning dynamics of deep linear networks, and found that adding bias terms changes early learning. They identified hallmarks of the OCS both in deep linear networks and in convolutional neural networks solving MNIST and CIFAR10 tasks. The study proved that deep linear networks learn the OCS during early learning and, surprisingly, found that human learners exhibit similar behavior over three days of learning. Finally, the researchers showed that the OCS can emerge even without bias terms, driven instead by correlations in the input data. This work suggests the OCS as a universal principle of early learning in supervised networks. |
Low | GrooveSquid.com (original content) | This study found something unexpected about how neural networks learn: they first learn to predict a constant answer (like always saying "yes" or "no") before learning anything more complex. The researchers examined how this works and found that adding certain features (bias terms) changes how the network learns early on. They also tested human learners on the same task and found similar behavior. The study shows that this early phase of learning matters, and that it can happen even without those special features, as long as there are patterns in the input data. |
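The OCS phenomenon described above can be illustrated with a small experiment. The sketch below is a hypothetical toy setup (not the paper's code): a one-hidden-layer linear network with small initial weights and an output bias, trained by gradient descent on a one-hot classification task. Early in training, the bias term absorbs the mean of the targets much faster than the weights grow, so the network's output is nearly constant across inputs and close to the average target vector, i.e., the OCS.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_in, d_h, d_out = 64, 8, 16, 4

# Toy data: random inputs with one-hot targets (labels chosen at random).
X = rng.normal(size=(n, d_in))
labels = rng.integers(0, d_out, size=n)
Y = np.eye(d_out)[labels]

# Small initial weights plus a trainable output bias.
W1 = 0.001 * rng.normal(size=(d_in, d_h))
W2 = 0.001 * rng.normal(size=(d_h, d_out))
b = np.zeros(d_out)

lr = 0.05
for _ in range(200):                      # "early" training only
    H = X @ W1
    pred = H @ W2 + b
    err = pred - Y                        # gradient of 0.5 * MSE w.r.t. pred
    W1 -= lr * (X.T @ (err @ W2.T)) / n
    W2 -= lr * (H.T @ err) / n
    b -= lr * err.mean(axis=0)

# The OCS: the mean target vector, here roughly uniform over classes.
ocs = Y.mean(axis=0)
pred = (X @ W1) @ W2 + b
print(np.abs(pred - ocs).max())           # deviation from the OCS is tiny
```

With small weight initialization the input-dependent part of the output (`H @ W2`) stays negligible during this early phase, while the bias converges geometrically to the mean label, so every input receives nearly the same, OCS-like response.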
Keywords
- Artificial intelligence
- Supervised