


Early learning of the optimal constant solution in neural networks and humans

by Jirko Rubruck, Jan P. Bauer, Andrew Saxe, Christopher Summerfield

First submitted to arXiv on: 25 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)
Read the original abstract here

Medium Difficulty Summary (GrooveSquid.com, original content)
Deep neural networks learn complex functions during training, but surprisingly, they first acquire an “optimal constant solution” (OCS): their initial responses mirror the average of the target labels while ignoring the input entirely. Using a hierarchical category learning task, the researchers analyzed the learning dynamics of deep linear networks and proved that, when bias terms are added, these networks learn the OCS during early training. They identified hallmarks of the OCS both in deep linear networks and in convolutional neural networks trained on MNIST and CIFAR10 and, strikingly, observed similar behavior in human learners over three days on the same task. Finally, they showed that the OCS can emerge even without bias terms, driven instead by correlations in the input data. This work suggests the OCS as a universal principle of early supervised learning.

Low Difficulty Summary (GrooveSquid.com, original content)
This study found something unexpected about how neural networks learn: before getting more complex, they first learn to give the same answer no matter what input they see. The researchers traced how this happens and found that adding certain features (bias terms) shapes this early phase of learning. They also tested human learners on the same task and saw similar behavior. The study showed that this early constant stage matters and can appear even without those special features, as long as there are patterns in the input data.

Keywords

  • Artificial intelligence
  • Supervised