
Summary of Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training, by Tom Sander et al.


Implicit Bias in Noisy-SGD: With Applications to Differentially Private Training

by Tom Sander, Maxime Sylvestre, Alain Durmus

First submitted to arXiv on: 13 Feb 2024

Categories

  • Main: Machine Learning (stat.ML)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The abstract discusses how training deep neural networks (DNNs) with small batches using stochastic gradient descent (SGD) results in better test performance than training with larger batches. The authors attribute this implicit bias to the specific noise structure inherent to SGD. They also explore the effect of differential privacy (DP) on DNN training, specifically DP-SGD, which adds Gaussian noise to clipped gradients. Surprisingly, large-batch training still leads to a significant drop in performance, which poses a challenge for strong DP guarantees that require massive batches. The authors analyze Noisy-SGD (DP-SGD without clipping) and find that the stochasticity, rather than the clipping, is responsible for the implicit bias. They also theoretically analyze continuous versions of Noisy-SGD for linear least squares and diagonal linear networks, revealing that the implicit bias is amplified by the additional noise. This research offers hope for improving large-batch training strategies. (A minimal code sketch contrasting the DP-SGD and Noisy-SGD updates appears after the summaries.)

Low Difficulty Summary (written by GrooveSquid.com, original content)
Training deep neural networks (DNNs) with small batches helps them perform better on new, unseen data than training with larger batches does. This is because of the particular kind of noise SGD adds during training. The authors also talk about DP-SGD, which adds noise to DNN training to keep it private. But they found that even with this added noise, large batches still don’t work well. This is a problem because strong privacy guarantees need very large batches. The good news is that the randomness itself, not the gradient clipping, is what gives small batches their edge, so there is hope for making large-batch private training work better.
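
To make the two procedures in the summaries concrete, here is a minimal sketch (not the authors’ code) contrasting a DP-SGD-style update (per-sample gradient clipping plus Gaussian noise) with Noisy-SGD (the same noise injection without clipping) on a toy linear least-squares problem, the simplest setting the paper analyzes. All names and hyperparameters here (step, per_sample_grads, lr, sigma, clip_norm, batch_size) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch: DP-SGD-style update (per-sample clipping + Gaussian noise)
# vs. Noisy-SGD (same Gaussian noise, no clipping) on toy linear least squares.
# Hyperparameters are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(0)
n, d = 256, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def per_sample_grads(w, xb, yb):
    # Gradient of 0.5 * (x.w - y)^2 for each sample in the mini-batch.
    residual = xb @ w - yb               # shape (B,)
    return residual[:, None] * xb        # shape (B, d)

def step(w, xb, yb, lr=0.1, sigma=0.5, clip_norm=None):
    g = per_sample_grads(w, xb, yb)
    if clip_norm is not None:
        # DP-SGD style: rescale each per-sample gradient to norm <= clip_norm.
        norms = np.linalg.norm(g, axis=1, keepdims=True)
        g = g * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Noise scale: sigma * clip_norm for DP-SGD; for Noisy-SGD (no clipping)
    # we use an arbitrary illustrative scale of sigma * 1.0.
    noise_scale = sigma * (clip_norm if clip_norm is not None else 1.0)
    noise = noise_scale * rng.normal(size=w.shape)
    g_bar = (g.sum(axis=0) + noise) / len(xb)   # noisy averaged gradient
    return w - lr * g_bar

w_dp, w_noisy = np.zeros(d), np.zeros(d)
batch_size = 16
for _ in range(500):
    idx = rng.choice(n, size=batch_size, replace=False)
    w_dp = step(w_dp, X[idx], y[idx], clip_norm=1.0)       # DP-SGD: clip + noise
    w_noisy = step(w_noisy, X[idx], y[idx], clip_norm=None) # Noisy-SGD: noise only

print("DP-SGD    loss:", 0.5 * np.mean((X @ w_dp - y) ** 2))
print("Noisy-SGD loss:", 0.5 * np.mean((X @ w_noisy - y) ** 2))
```

The only difference between the two branches is the clipping step; the mini-batch stochasticity and the Gaussian noise are shared, which is where the paper locates the implicit bias.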

Keywords

* Artificial intelligence
* Stochastic gradient descent