Summary of Bayes-optimal Learning Of An Extensive-width Neural Network From Quadratically Many Samples, by Antoine Maillard et al.


Bayes-optimal learning of an extensive-width neural network from quadratically many samples

by Antoine Maillard, Emanuele Troiani, Simon Martin, Florent Krzakala, Lenka Zdeborová

First submitted to arXiv on: 7 Aug 2024

Categories

  • Main: Machine Learning (stat.ML)
  • Secondary: Disordered Systems and Neural Networks (cond-mat.dis-nn); Information Theory (cs.IT); Machine Learning (cs.LG); Probability (math.PR)

Abstract of paper · PDF of paper


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper addresses the problem of learning target functions corresponding to single-hidden-layer neural networks with quadratic activation functions after the first layer. The authors consider the asymptotic limit where the input dimension and the network width are proportionally large. Recent work established that linear regression achieves the Bayes-optimal test error when the number of available samples is only linear in the dimension. This paper tackles the theoretically challenging regime where the number of samples is quadratic in the dimension, deriving a closed-form expression for the Bayes-optimal test error. The authors also propose an algorithm, GAMP-RIE, which combines approximate message passing with rotationally invariant matrix denoising and asymptotically achieves this optimal performance. The result relies on linking recent work on the optimal denoising of extensive-rank matrices with the ellipsoid fitting problem. Empirical results further suggest that randomly-initialized gradient descent samples the space of weights that achieve zero training loss.
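
To make the setting described above concrete, here is a minimal Python/NumPy sketch of the data model: a teacher network with one hidden layer and quadratic activations, input dimension d, hidden width k proportional to d ("extensive width"), and a number of samples n quadratic in d. The specific normalizations and the absence of second-layer weights are illustrative assumptions, not the paper's exact conventions.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 40            # input dimension
k = d // 2        # hidden width, proportional to d ("extensive width")
n = 2 * d ** 2    # number of samples, quadratic in d

# Teacher: single hidden layer with quadratic activation.
# f(x) = (1/k) * sum_i ( w_i . x / sqrt(d) )^2   (normalization chosen for illustration)
W_star = rng.standard_normal((k, d))

def quad_net(X, W):
    pre = X @ W.T / np.sqrt(d)     # pre-activations, shape (n, k)
    return (pre ** 2).mean(axis=1)

X = rng.standard_normal((n, d))    # i.i.d. Gaussian inputs
y = quad_net(X, W_star)            # noiseless labels produced by the teacher
```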

Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about learning a special kind of neural network whose hidden units use quadratic activation functions. The authors want to understand how well this type of function can be learned when the inputs are high-dimensional and the network has many hidden units. They build on previous work showing that, when relatively few training examples are available, simple linear regression is already the best possible method, and they ask how much better one can do when many more examples are provided. To answer this, they develop an algorithm called GAMP-RIE that achieves the best possible test error in this setting. They also show that a popular optimization method, gradient descent with random initialization, can be surprisingly effective in some cases.
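
As a rough illustration of the gradient-descent observation mentioned above, the sketch below trains a student network of the same architecture from a random initialization using plain full-batch gradient descent on the squared loss. It reuses the teacher model from the sketch above; the step size and iteration count are untuned illustrative choices, and this is not the paper's GAMP-RIE algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

d, k = 40, 20
n = 2 * d ** 2

def quad_net(X, W):
    pre = X @ W.T / np.sqrt(d)            # pre-activations, shape (n, k)
    return pre, (pre ** 2).mean(axis=1)

# Teacher data (same model as the sketch above).
W_star = rng.standard_normal((k, d))
X = rng.standard_normal((n, d))
_, y = quad_net(X, W_star)

# Student: same architecture, random initialization, full-batch gradient descent.
W = rng.standard_normal((k, d))
lr = 0.5
for step in range(3000):
    pre, pred = quad_net(X, W)
    resid = pred - y                      # shape (n,)
    loss = 0.5 * np.mean(resid ** 2)
    # Gradient of the squared loss w.r.t. W for the quadratic activation.
    grad = (2.0 / (k * n)) * (pre * resid[:, None]).T @ (X / np.sqrt(d))
    W -= lr * grad
    if step % 500 == 0:
        print(f"step {step:4d}  training loss {loss:.3e}")
```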

Keywords

» Artificial intelligence  » Gradient descent  » Linear regression  » Neural network  » Optimization