


Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks

by Ke Chen, Chugang Yi, Haizhao Yang

First submitted to arXiv on: 3 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This research paper investigates the implicit bias towards low-rank weight matrices when training neural networks (NNs) with Weight Decay (WD). The authors prove that a ReLU NN sufficiently trained with Stochastic Gradient Descent (SGD) and WD has a weight matrix that approximates a rank-two matrix. Empirical results show that WD is essential for inducing this bias across both regression and classification tasks. The findings differ from previous studies in that they do not rely on common assumptions about the training data distribution or on specific training procedures. Furthermore, by leveraging this low-rank bias, the authors derive improved generalization error bounds and provide numerical evidence that better generalization performance can be achieved with SGD and WD. A minimal code sketch of this setup appears after the summaries below.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper looks at how neural networks behave when trained with a technique called Weight Decay (WD). The authors found that, with enough training, the networks become simpler in a specific way: their weight matrices end up close to low-rank (roughly rank two). This matters because it helps the networks generalize well, meaning they perform well on new, unseen data. Unlike earlier work, the result does not depend on common assumptions about the data distribution or on a particular training procedure, and the authors show that this simplicity leads to better performance.
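
The sketch below is not from the paper; it only illustrates the setup described in the medium difficulty summary: train a small ReLU network with SGD plus weight decay, then inspect the singular values of a hidden weight matrix to gauge its effective rank. The architecture, synthetic data, and hyperparameters are illustrative assumptions, not the authors' experimental configuration.

```python
# Illustrative sketch (assumed setup, not the authors' code): SGD + weight decay
# on a small ReLU network, followed by a singular-value check of a hidden layer.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (assumed for illustration).
X = torch.randn(1024, 20)
y = torch.sin(X[:, :1]) + 0.1 * torch.randn(1024, 1)

model = nn.Sequential(
    nn.Linear(20, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
)

# Weight decay is the key ingredient; setting it to 0 removes the low-rank bias.
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-3)
loss_fn = nn.MSELoss()

for step in range(5000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# Effective rank of the middle weight matrix: count singular values that carry
# most of the spectrum. With weight decay this count tends to be small.
W = model[2].weight.detach()          # 256 x 256 hidden-layer weight matrix
s = torch.linalg.svdvals(W)           # singular values, descending order
effective_rank = int((s > 0.01 * s[0]).sum())
print("top singular values:", s[:5].tolist())
print("effective rank (1% threshold):", effective_rank)
```

Re-running with `weight_decay=0.0` gives a point of comparison: without WD the spectrum typically decays much more slowly, which is the contrast the paper's empirical results highlight.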

Keywords

  • Artificial intelligence  » Classification  » Generalization  » Regression  » ReLU  » Stochastic gradient descent