


Towards Better Generalization: Weight Decay Induces Low-rank Bias for Neural Networks

by Ke Chen, Chugang Yi, Haizhao Yang

First submitted to arXiv on: 3 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This research paper investigates the implicit bias towards low-rank weight matrices when training neural networks (NNs) with Weight Decay (WD). The authors prove that a ReLU NN sufficiently trained with Stochastic Gradient Descent (SGD) and WD has a weight matrix that approximates a rank-two matrix. Empirical results show that WD is essential for inducing this bias across both regression and classification tasks. The findings differ from previous studies in that they do not rely on common assumptions about the training data distribution or on specific training procedures. Furthermore, by leveraging this low-rank bias, the authors derive improved generalization error bounds and provide numerical evidence that better generalization performance can be achieved with SGD and WD. A minimal code sketch of this setup appears after the summaries below.
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper looks at how neural networks behave when trained with a technique called Weight Decay (WD). The authors found that, with enough training, the networks become simpler in a specific way: their weight matrices end up close to low-rank (roughly rank two). This matters because it helps the networks generalize well, meaning they perform well on new, unseen data. Unlike earlier work, the result does not depend on common assumptions about the data distribution or on a particular training procedure, and the authors show that this simplicity leads to better performance.
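
The sketch below is not from the paper; it only illustrates the setup described in the medium difficulty summary: train a small ReLU network with SGD plus weight decay, then inspect the singular values of a hidden weight matrix to gauge its effective rank. The architecture, synthetic data, and hyperparameters are illustrative assumptions, not the authors' experimental configuration.

```python
# Illustrative sketch (assumed setup, not the authors' code): SGD + weight decay
# on a small ReLU network, followed by a singular-value check of a hidden layer.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data (assumed for illustration).
X = torch.randn(1024, 20)
y = torch.sin(X[:, :1]) + 0.1 * torch.randn(1024, 1)

model = nn.Sequential(
    nn.Linear(20, 256),
    nn.ReLU(),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
)

# Weight decay is the key ingredient; setting it to 0 removes the low-rank bias.
opt = torch.optim.SGD(model.parameters(), lr=1e-2, weight_decay=1e-3)
loss_fn = nn.MSELoss()

for step in range(5000):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# Effective rank of the middle weight matrix: count singular values that carry
# most of the spectrum. With weight decay this count tends to be small.
W = model[2].weight.detach()          # 256 x 256 hidden-layer weight matrix
s = torch.linalg.svdvals(W)           # singular values, descending order
effective_rank = int((s > 0.01 * s[0]).sum())
print("top singular values:", s[:5].tolist())
print("effective rank (1% threshold):", effective_rank)
```

Re-running with `weight_decay=0.0` gives a point of comparison: without WD the spectrum typically decays much more slowly, which is the contrast the paper's empirical results highlight.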

Keywords

  • Artificial intelligence  » Classification  » Generalization  » Regression  » ReLU  » Stochastic gradient descent