


Weights Shuffling for Improving DPSGD in Transformer-based Models

by Jungang Yang, Zhe Ji, Liyao Xiang

First submitted to arXiv on: 22 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Cryptography and Security (cs.CR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces a shuffling mechanism into Differentially-Private Stochastic Gradient Descent (DPSGD) that improves the utility of large models while offering the same privacy guarantee as the unshuffled case. By exploiting the permutation invariance of network weights, random shuffling injects additional randomness into the gradient descent trajectory without affecting model accuracy. Theoretical analysis shows that permutation improves the privacy guarantee, but it makes the privacy loss harder to estimate; to overcome this, the authors derive an approximation of a sum of lognormal distributions to meet the DP guarantee. Experimental results verify the theoretical derivation and demonstrate improved accuracy over state-of-the-art baselines across a variety of models and tasks.
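To make the mechanism concrete, below is a minimal sketch of one DPSGD update followed by a weight shuffle on a two-layer network. The setup, function names, and hyperparameters are illustrative assumptions, not the authors' implementation; the point is that permuting hidden units (rows of the first weight matrix and the matching columns of the second) leaves the network's function unchanged while randomizing the weight trajectory an observer sees.

```python
# Sketch of DPSGD + weight shuffling, assuming a 2-layer MLP.
# All names and hyperparameters are illustrative, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def dpsgd_update(per_example_grads, w, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    # Clip each per-example gradient to L2 norm <= clip_norm.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    # Sum, add Gaussian noise scaled to the clipping bound, then average.
    noisy = (np.sum(clipped, axis=0)
             + noise_mult * clip_norm * rng.standard_normal(w.shape))
    return w - lr * noisy / len(per_example_grads)

def shuffle_hidden_units(w1, w2):
    # Permute hidden units: rows of w1 and the matching columns of w2.
    # The permuted network computes the same function (permutation
    # invariance), so accuracy is unaffected.
    perm = rng.permutation(w1.shape[0])
    return w1[perm, :], w2[:, perm]

# Tiny demo: shuffling preserves the forward pass of the 2-layer MLP.
x = rng.standard_normal(4)
w1 = rng.standard_normal((8, 4))   # hidden x input
w2 = rng.standard_normal((2, 8))   # output x hidden
relu = lambda z: np.maximum(z, 0.0)
y_before = w2 @ relu(w1 @ x)
w1s, w2s = shuffle_hidden_units(w1, w2)
y_after = w2s @ relu(w1s @ x)
assert np.allclose(y_before, y_after)
```

Because the permuted network computes the same function, shuffling after each noisy update costs no accuracy, while the sequence of released weights follows a randomly permuted trajectory.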
Low Difficulty Summary (written by GrooveSquid.com, original content)
Differential Privacy (DP) helps keep personal data private. The problem is that protecting the data can also make it less useful for learning. This paper tackles that trade-off by adding a new step to Differentially-Private Stochastic Gradient Descent (DPSGD), the standard method for training models privately: shuffling the model's weights. The shuffle adds more randomness to the training process without harming accuracy. Theoretical analysis shows that this approach improves privacy protection, but tracking how much privacy is lost becomes tricky, so the authors develop an approximation method to estimate the privacy loss. Experimental results confirm their theory and show that their approach outperforms existing methods.
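The summaries note that estimating the privacy loss requires approximating a sum of lognormal distributions. The paper's exact derivation is not reproduced here; as a hedged illustration, the sketch below uses the classical Fenton-Wilkinson moment-matching approximation, which replaces a sum of independent lognormals with a single lognormal having the same first two moments. All names are illustrative.

```python
# Fenton-Wilkinson approximation of a sum of independent lognormals.
# A standard stand-in for illustration; not the paper's exact method.
import numpy as np

def fenton_wilkinson(mus, sigmas):
    """Match the first two moments of sum_i LN(mu_i, sigma_i^2)
    with a single lognormal LN(mu, sigma^2); return (mu, sigma)."""
    mus, sigmas = np.asarray(mus, float), np.asarray(sigmas, float)
    mean = np.sum(np.exp(mus + 0.5 * sigmas**2))                       # E[sum]
    var = np.sum((np.exp(sigmas**2) - 1) * np.exp(2*mus + sigmas**2))  # Var[sum]
    sigma2 = np.log(1.0 + var / mean**2)
    return np.log(mean) - 0.5 * sigma2, np.sqrt(sigma2)

# Quick Monte Carlo sanity check of the matched mean.
rng = np.random.default_rng(0)
mus, sigmas = [0.0, 0.5, -0.3], [0.4, 0.3, 0.5]
samples = sum(rng.lognormal(m, s, 100_000) for m, s in zip(mus, sigmas))
mu, sigma = fenton_wilkinson(mus, sigmas)
print(samples.mean(), np.exp(mu + 0.5 * sigma**2))  # approximately equal
```

Moment matching like this gives a closed-form single-lognormal surrogate, which is what makes a tractable DP accounting bound possible when the exact distribution of the summed privacy loss has no closed form.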

Keywords

  • Artificial intelligence
  • Gradient descent
  • Stochastic gradient descent
  • Tracking