
Summary of Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training, by Tom Sander et al.


Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training

by Tom Sander, Maxime Sylvestre, Alain Durmus

First submitted to arXiv on: 13 Feb 2024

Categories

  • Main: Machine Learning (stat.ML)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The abstract discusses how training deep neural networks (DNNs) with small batches using stochastic gradient descent (SGD) results in better test performance than training with larger batches. The authors attribute this implicit bias to the specific noise structure inherent to SGD. They also explore the effect of differential privacy (DP) on DNN training, specifically DP-SGD, which adds Gaussian noise to clipped gradients. Surprisingly, large-batch training still leads to a significant drop in performance, which poses a challenge for strong DP guarantees that require massive batches. The authors analyze Noisy-SGD (DP-SGD without clipping) and find that the stochasticity, rather than the clipping, is responsible for the implicit bias. They also theoretically analyze continuous versions of Noisy-SGD for linear least squares and diagonal linear networks, revealing that the implicit bias is amplified by the additional noise. This research offers hope for improving large-batch training strategies. (A minimal code sketch contrasting the DP-SGD and Noisy-SGD updates appears after the summaries.)

Low Difficulty Summary (written by GrooveSquid.com, original content)
Training deep neural networks (DNNs) with small batches helps them perform better on new, unseen data than training with larger batches does. This is because of the particular kind of noise SGD adds during training. The authors also talk about DP-SGD, which adds noise to DNN training to keep it private. But they found that even with this added noise, large batches still don’t work well. This is a problem because strong privacy guarantees need very large batches. The good news is that the randomness itself, not the gradient clipping, is what gives small batches their edge, so there is hope for making large-batch private training work better.
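
To make the two procedures in the summaries concrete, here is a minimal sketch (not the authors’ code) contrasting a DP-SGD-style update (per-sample gradient clipping plus Gaussian noise) with Noisy-SGD (the same noise injection without clipping) on a toy linear least-squares problem, the simplest setting the paper analyzes. All names and hyperparameters here (step, per_sample_grads, lr, sigma, clip_norm, batch_size) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: DP-SGD-style update (per-sample clipping + Gaussian noise)
# vs. Noisy-SGD (same Gaussian noise, no clipping) on toy linear least squares.
# Hyperparameters are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def per_sample_grads(w, xb, yb):
    # Gradient of 0.5 * (x.w - y)^2 for each sample in the mini-batch.
    residual = xb @ w - yb               # shape (B,)
    return residual[:, None] * xb        # shape (B, d)

def step(w, xb, yb, lr=0.1, sigma=0.5, clip_norm=None):
    g = per_sample_grads(w, xb, yb)
    if clip_norm is not None:
        # DP-SGD style: rescale each per-sample gradient to norm <= clip_norm.
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Noise scale: sigma * clip_norm for DP-SGD; for Noisy-SGD (no clipping)
    # we use an arbitrary illustrative scale of sigma * 1.0.
    noise_scale = sigma * (clip_norm if clip_norm is not None else 1.0)
    noise = noise_scale * rng.normal(size=w.shape)
    g_bar = (g.sum(axis=0) + noise) / len(xb)   # noisy averaged gradient
    return w - lr * g_bar

w_dp, w_noisy = np.zeros(d), np.zeros(d)
batch_size = 16
for _ in range(500):
    idx = rng.choice(n, size=batch_size, replace=False)
    w_dp = step(w_dp, X[idx], y[idx], clip_norm=1.0)       # DP-SGD: clip + noise
    w_noisy = step(w_noisy, X[idx], y[idx], clip_norm=None) # Noisy-SGD: noise only

print("DP-SGD    loss:", 0.5 * np.mean((X @ w_dp - y) ** 2))
print("Noisy-SGD loss:", 0.5 * np.mean((X @ w_noisy - y) ** 2))
```

The only difference between the two branches is the clipping step; the mini-batch stochasticity and the Gaussian noise are shared, which is where the paper locates the implicit bias.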

Keywords

* Artificial intelligence
* Stochastic gradient descent