
Summary of Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks, by Binghui Li et al.


Feature Averaging: An Implicit Bias of Gradient Descent Leading to Non-Robustness in Neural Networks

by Binghui Li, Zhixuan Pan, Kaifeng Lyu, Jian Li

First submitted to arXiv on: 14 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper investigates “Feature Averaging”, an implicit bias of gradient descent that the authors identify as a major contributor to the non-robustness of deep neural networks. Even when the data contains multiple distinguishing features, trained networks tend to rely on an average (or a combination) of these features for classification rather than distinguishing each feature individually. Through a theoretical analysis of two-layer ReLU networks on binary classification tasks, the authors show that gradient descent biases the network towards feature averaging, leaving it vulnerable to input perturbations aligned with the negative direction of the averaged features. To mitigate this vulnerability, they propose more granular supervision and prove that a two-layer ReLU network can achieve optimal robustness when trained to classify individual features rather than only the binary classes. Experiments on synthetic datasets, MNIST, and CIFAR-10 support the theoretical findings. A toy illustration of this setup is sketched in the code after these summaries.

Low Difficulty Summary (original content by GrooveSquid.com)
The paper explores how something called “Feature Averaging” affects deep neural networks. Feature Averaging means that even when the data contains many important features, the network ends up using only an average of those features to make predictions. This is a problem because it makes the network easier to fool with slightly modified inputs. The researchers show how this happens and why it is harmful, and they suggest fixing the problem by giving the network more detailed guidance about which individual feature each example contains.
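
For readers who want to see the effect concretely, here is a minimal Python sketch, not the paper’s construction: it assumes a toy data model with four orthogonal cluster features (two per class) plus Gaussian noise, an arbitrary network width and learning rate, and an illustrative attack direction that moves away from the averaged features of the point’s own class and towards those of the opposite class. The only point it makes is qualitative: a two-layer ReLU network trained by plain gradient descent fits the clean data, yet a moderate perturbation along this averaged-feature direction flips its prediction.

import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed setup, not the paper's): 4 orthogonal cluster features in
# d dimensions; clusters 0 and 1 form the positive class, clusters 2 and 3 the
# negative class, with small Gaussian noise around each cluster feature.
d, k, n, width = 32, 4, 400, 64
mus = 3.0 * np.eye(d)[:k]                      # cluster features mu_0..mu_3
clusters = rng.integers(0, k, size=n)
X = mus[clusters] + 0.1 * rng.standard_normal((n, d))
y = np.where(clusters < 2, 1.0, -1.0)

# Two-layer ReLU network f(x) = a . relu(W x), trained by full-batch gradient
# descent on the logistic loss log(1 + exp(-y * f(x))).
W = 0.1 * rng.standard_normal((width, d))
a = 0.1 * rng.standard_normal(width)
lr = 0.1
for _ in range(2000):
    pre = X @ W.T                              # (n, width) pre-activations
    h = np.maximum(pre, 0.0)                   # ReLU activations
    f = h @ a                                  # network outputs, shape (n,)
    yf = np.clip(y * f, -30.0, 30.0)           # clip margins to avoid exp overflow
    g = -y / (1.0 + np.exp(yf))                # dloss/df for the logistic loss
    grad_a = h.T @ g / n
    grad_W = ((g[:, None] * (pre > 0)) * a).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

def output(x):
    return np.maximum(x @ W.T, 0.0) @ a

print("clean training accuracy:", np.mean(np.sign(output(X)) == y))

# Perturb a clean point from cluster 0 along an illustrative "averaged feature"
# direction: away from the positive class's averaged features and towards the
# negative class's averaged features.
x0 = mus[0]
avg_pos = (mus[0] + mus[1]) / 2.0
avg_neg = (mus[2] + mus[3]) / 2.0
delta = avg_neg - avg_pos
delta /= np.linalg.norm(delta)
for eps in [0.0, 1.0, 2.0, 3.0]:
    val = output(x0 + eps * delta)
    print(f"eps={eps:.1f}  f(x)={val:+.3f}  prediction={np.sign(val):+.0f}")

Under the paper’s proposed remedy, the same network would instead receive finer-grained supervision, i.e. it would be trained to predict which of the four clusters an input comes from rather than only its binary class; in this sketch that would amount to replacing the scalar output with a four-way classification head.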

Keywords

» Artificial intelligence  » Classification  » Gradient descent  » ReLU