
Summary of Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks, by Binghui Li et al.


Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks

by Binghui Li, Zhixuan Pan, Kaifeng Lyu, Jian Li

First submitted to arXiv on: 14 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper investigates “Feature Averaging”, an implicit bias of gradient descent that the authors identify as a major contributor to the non-robustness of deep neural networks. Even when the data contains multiple distinguishing features, trained networks tend to rely on an average (or a combination) of these features for classification rather than distinguishing each feature individually. Through a theoretical analysis of two-layer ReLU networks on binary classification tasks, the authors show that gradient descent biases the network towards feature averaging, leaving it vulnerable to input perturbations aligned with the negative direction of the averaged features. To mitigate this vulnerability, they propose more granular supervision and prove that a two-layer ReLU network can achieve optimal robustness when trained to classify individual features rather than only the binary classes. Experiments on synthetic datasets, MNIST, and CIFAR-10 support the theoretical findings. A toy illustration of this setup is sketched in the code after these summaries.

Low Difficulty Summary (original content by GrooveSquid.com)
The paper explores how something called “Feature Averaging” affects deep neural networks. Feature Averaging means that even when the data contains many important features, the network ends up using only an average of those features to make predictions. This is a problem because it makes the network easier to fool with slightly modified inputs. The researchers show how this happens and why it is harmful, and they suggest fixing the problem by giving the network more detailed guidance about which individual feature each example contains.
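
For readers who want to see the effect concretely, here is a minimal Python sketch, not the paper’s construction: it assumes a toy data model with four orthogonal cluster features (two per class) plus Gaussian noise, an arbitrary network width and learning rate, and an illustrative attack direction that moves away from the averaged features of the point’s own class and towards those of the opposite class. The only point it makes is qualitative: a two-layer ReLU network trained by plain gradient descent fits the clean data, yet a moderate perturbation along this averaged-feature direction flips its prediction.

import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed setup, not the paper's): 4 orthogonal cluster features in
# d dimensions; clusters 0 and 1 form the positive class, clusters 2 and 3 the
# negative class, with small Gaussian noise around each cluster feature.
d, k, n, width = 32, 4, 400, 64
mus = 3.0 * np.eye(d)[:k]                      # cluster features mu_0..mu_3
clusters = rng.integers(0, k, size=n)
X = mus[clusters] + 0.1 * rng.standard_normal((n, d))
y = np.where(clusters < 2, 1.0, -1.0)

# Two-layer ReLU network f(x) = a . relu(W x), trained by full-batch gradient
# descent on the logistic loss log(1 + exp(-y * f(x))).
W = 0.1 * rng.standard_normal((width, d))
a = 0.1 * rng.standard_normal(width)
lr = 0.1
for _ in range(2000):
    pre = X @ W.T                              # (n, width) pre-activations
    h = np.maximum(pre, 0.0)                   # ReLU activations
    f = h @ a                                  # network outputs, shape (n,)
    yf = np.clip(y * f, -30.0, 30.0)           # clip margins to avoid exp overflow
    g = -y / (1.0 + np.exp(yf))                # dloss/df for the logistic loss
    grad_a = h.T @ g / n
    grad_W = ((g[:, None] * (pre > 0)) * a).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

def output(x):
    return np.maximum(x @ W.T, 0.0) @ a

print("clean training accuracy:", np.mean(np.sign(output(X)) == y))

# Perturb a clean point from cluster 0 along an illustrative "averaged feature"
# direction: away from the positive class's averaged features and towards the
# negative class's averaged features.
x0 = mus[0]
avg_pos = (mus[0] + mus[1]) / 2.0
avg_neg = (mus[2] + mus[3]) / 2.0
delta = avg_neg - avg_pos
delta /= np.linalg.norm(delta)
for eps in [0.0, 1.0, 2.0, 3.0]:
    val = output(x0 + eps * delta)
    print(f"eps={eps:.1f}  f(x)={val:+.3f}  prediction={np.sign(val):+.0f}")

Under the paper’s proposed remedy, the same network would instead receive finer-grained supervision, i.e. it would be trained to predict which of the four clusters an input comes from rather than only its binary class; in this sketch that would amount to replacing the scalar output with a four-way classification head.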

Keywords

» Artificial intelligence  » Classification  » Gradient descent  » ReLU