


Deep linear networks for regression are implicitly regularized towards flat minima

by Pierre Marion, Lénaïc Chizat

First submitted to arXiv on: 22 May 2024

Categories

  • Main: Machine Learning (stat.ML)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper but are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper studies the sharpness of deep linear networks in univariate regression tasks, shedding light on their optimization dynamics. The authors show that minimizers can have arbitrarily large sharpness, but not arbitrarily small sharpness: they prove a lower bound on the sharpness of minimizers that grows linearly with depth. They then study the properties of the minimizer found by gradient flow, showing an implicit regularization towards flat minima. The results are shown to be independent of network width and initialization method, with implications for gradient descent with non-vanishing learning rates.
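For readers unfamiliar with the term, "sharpness" here refers to the standard quantity in this literature: the largest eigenvalue of the Hessian of the training loss at a minimizer, so a flat minimum is one where this eigenvalue is small. In symbols (the notation S is ours, for illustration only):

    S(\theta^\star) \;=\; \lambda_{\max}\!\big(\nabla^2 L(\theta^\star)\big)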
Low Difficulty Summary (original content by GrooveSquid.com)
Deep neural networks are used to learn univariate regression models, but how do they optimize their performance? Researchers have found that the sharpness of these networks can be an important factor in their optimization dynamics. This study looks at the sharpness of deep linear networks for univariate regression and shows that minimizers can be arbitrarily sharp, but never arbitrarily flat: the lowest possible sharpness grows with the network's depth. Training by gradient flow nevertheless tends to find the flatter minimizers, which could affect how well the network learns.
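To make the notion concrete, below is a minimal, hypothetical Python sketch (not code from the paper): it measures the largest Hessian eigenvalue of the squared loss for a width-one deep linear network at a zero-loss "balanced" minimizer. The data, depths, and balanced weights are illustrative assumptions; under them, the printed sharpness grows roughly linearly with depth, in line with the lower bound described above.

    # Hypothetical illustration of "sharpness" for a deep linear network
    # f(x) = w_L * ... * w_1 * x on univariate data (not the authors' code).
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=32)   # univariate inputs (illustrative data)
    y = 2.0 * x               # targets generated by a true slope of 2

    def loss(w):
        """Mean squared error of the deep linear predictor prod(w) * x."""
        pred = np.prod(w) * x
        return 0.5 * np.mean((pred - y) ** 2)

    def sharpness(w, eps=1e-4):
        """Largest eigenvalue of a central-difference Hessian of the loss at w."""
        d = len(w)
        I = np.eye(d)
        H = np.zeros((d, d))
        for i in range(d):
            for j in range(d):
                H[i, j] = (
                    loss(w + eps * I[i] + eps * I[j])
                    - loss(w + eps * I[i] - eps * I[j])
                    - loss(w - eps * I[i] + eps * I[j])
                    + loss(w - eps * I[i] - eps * I[j])
                ) / (4 * eps ** 2)
        return float(np.linalg.eigvalsh(H)[-1])

    for depth in (2, 4, 8, 16):
        # A "balanced" zero-loss minimizer: each layer weight is 2**(1/depth),
        # so the layers multiply out to the true slope 2 exactly.
        w_star = np.full(depth, 2.0 ** (1.0 / depth))
        print(f"depth {depth:2d}: sharpness = {sharpness(w_star):.2f}")

The Hessian is formed by finite differences purely for simplicity; automatic differentiation would serve equally well.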

Keywords

» Artificial intelligence  » Gradient descent  » Optimization  » Regression  » Regularization