
Summary of Towards Understanding Epoch-wise Double Descent in Two-layer Linear Neural Networks, by Amanda Olmin et al.


Towards Understanding Epoch-wise Double Descent in Two-layer Linear Neural Networks

by Amanda Olmin, Fredrik Lindsten

First submitted to arXiv on: 13 Jul 2024

Categories

  • Main: Machine Learning (stat.ML)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

The high difficulty version is the paper’s original abstract; read it on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com; original content)

This paper investigates epoch-wise double descent, the phenomenon in which a model’s generalization error descends, rises, and then descends a second time over the course of training. The study aims to understand the mechanisms behind this behavior in simple models, such as standard linear regression, and to examine how they extend to more complex models, such as deep neural networks. To this end, the authors analyze two-layer linear neural networks and derive a gradient flow that bridges the learning dynamics of standard linear regression and of linear two-layer diagonal networks with quadratic weights. The analysis identifies additional factors contributing to epoch-wise double descent in the two-layer model, including the singular values of the input-output covariance matrix. The results have implications for conventional model selection methods, such as early stopping, that are used to mitigate overfitting. (An illustrative simulation sketch follows the summaries below.)

Low Difficulty Summary (written by GrooveSquid.com; original content)

Epoch-wise double descent is a surprising phenomenon in which a machine learning model’s test performance first improves, then gets worse as the model starts to overfit, and then improves again if training simply continues. Researchers want to understand why this happens, especially in complex models such as deep neural networks. In this paper, the authors study a much simpler model, a linear neural network with two layers, hoping that what they learn there carries over to more complicated models. They find that adding just one extra layer to plain linear regression introduces new factors that can cause double descent, which opens up new questions about what else might be going on in deeper networks.

Keywords

» Artificial intelligence  » Early stopping  » Generalization  » Linear regression  » Machine learning  » Overfitting