Summary of Benign Overfitting in Leaky ReLU Networks with Moderate Input Dimension, by Kedar Karhadkar et al.
Benign overfitting in leaky ReLU networks with moderate input dimension
by Kedar Karhadkar, Erin George, Michael Murray, Guido Montúfar, Deanna Needell
First submitted to arXiv on: 11 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper studies benign overfitting in two-layer leaky ReLU networks trained with the hinge loss on a binary classification task. The authors characterize conditions on the signal-to-noise ratio (SNR) of the model parameters under which overfitting is benign versus non-benign (harmful): high SNR leads to benign overfitting, while low SNR leads to harmful overfitting. Both regimes are attributed to an approximate margin-maximization property, which leaky ReLU networks trained with gradient descent are shown to satisfy. Unlike prior work, the analysis does not require the training data to be nearly orthogonal: for input dimension d and training sample size n, benign overfitting occurs as soon as d = Ω(n), whereas prior results required the much stronger condition d = Ω(n² log n). A minimal training sketch of this setup follows the table. |
Low | GrooveSquid.com (original content) | The paper looks at something called “benign overfitting” in machine learning. It’s like tracing a picture perfectly instead of learning how to draw: the copy is flawless, but only because you reproduced every stroke of the original. The researchers want to understand why models that fit their training data perfectly can still make good predictions on new data, and when they cannot. They study a special kind of computer model called a “leaky ReLU network” and train it with a scoring rule called the “hinge loss”. The results show that when the useful signal in the data is strong compared to the noise, the overfitting is harmless (benign); when the noise drowns out the signal, the overfitting is harmful. |
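To make the medium-difficulty summary concrete, here is a minimal sketch of the setup it describes: a two-layer leaky ReLU network trained by subgradient descent on the hinge loss for binary classification. This is not the authors’ code; the signal-plus-noise data model, the fixed ±1/m second-layer weights, and every hyperparameter below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data model (assumption): x_i = y_i * mu + noise, where mu is a
# fixed signal direction and the noise has roughly unit norm.
n, d = 50, 200                     # sample size n, input dimension d on the order of n
snr = 2.0                          # signal strength relative to noise (illustrative)
mu = rng.standard_normal(d)
mu *= snr / np.linalg.norm(mu)
y = rng.choice([-1.0, 1.0], size=n)
X = y[:, None] * mu[None, :] + rng.standard_normal((n, d)) / np.sqrt(d)

# Two-layer leaky ReLU network f(x) = sum_j a_j * leaky_relu(w_j . x).
# Second-layer weights a_j are frozen at +-1/m, a common simplification in
# two-layer analyses (an assumption here, not taken from the paper).
m, alpha = 64, 0.1                 # hidden width and leaky ReLU slope
W = 0.01 * rng.standard_normal((m, d))
a = np.where(np.arange(m) % 2 == 0, 1.0, -1.0) / m

def leaky_relu(z):
    return np.where(z > 0, z, alpha * z)

def forward(X):
    return leaky_relu(X @ W.T) @ a

# Subgradient descent on the average hinge loss max(0, 1 - y * f(x)).
lr = 0.5
for _ in range(2000):
    out = forward(X)
    active = (1.0 - y * out) > 0             # points whose hinge loss is nonzero
    pre = X[active] @ W.T                    # their preactivations, shape (k, m)
    slope = np.where(pre > 0, 1.0, alpha)    # leaky ReLU subgradient
    # d(loss)/dW = -(1/n) * sum over active i of y_i * a_j * slope_ij * x_i
    grad_W = -(slope * (y[active][:, None] * a[None, :])).T @ X[active] / n
    W -= lr * grad_W

print(f"train accuracy: {np.mean(np.sign(forward(X)) == y):.2f}")
```

With a large `snr` the trained network typically interpolates the training set while still tracking the signal direction mu, matching the benign regime described above; shrinking `snr` toward zero pushes the same sketch toward the harmful-overfitting regime.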
Keywords
* Artificial intelligence * Classification * Gradient descent * Hinge loss * Overfitting * ReLU