


Weights Shuffling for Improving DPSGD in Transformer-based Models

by Jungang Yang, Zhe Ji, Liyao Xiang

First submitted to arXiv on: 22 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Cryptography and Security (cs.CR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces a shuffling mechanism into Differentially-Private Stochastic Gradient Descent (DPSGD) that improves the utility of large models while offering the same privacy guarantee as the unshuffled case. By exploiting the permutation invariance of network weights, random shuffling injects additional randomness into the gradient descent trajectory without affecting model accuracy. Theoretical analysis shows that permutation improves the privacy guarantee, but it makes the privacy loss harder to estimate; to overcome this, the authors derive an approximation of a sum of lognormal distributions to meet the DP guarantee. Experimental results verify the theoretical derivation and demonstrate improved accuracy over state-of-the-art baselines across a variety of models and tasks.
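To make the mechanism concrete, below is a minimal sketch of one DPSGD update followed by a weight shuffle on a two-layer network. The setup, function names, and hyperparameters are illustrative assumptions, not the authors' implementation; the point is that permuting hidden units (rows of the first weight matrix and the matching columns of the second) leaves the network's function unchanged while randomizing the weight trajectory an observer sees.

```python
# Sketch of DPSGD + weight shuffling, assuming a 2-layer MLP.
# All names and hyperparameters are illustrative, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)

def dpsgd_update(per_example_grads, w, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    # Clip each per-example gradient to L2 norm <= clip_norm.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    # Sum, add Gaussian noise scaled to the clipping bound, then average.
    noisy = (np.sum(clipped, axis=0)
             + noise_mult * clip_norm * rng.standard_normal(w.shape))
    return w - lr * noisy / len(per_example_grads)

def shuffle_hidden_units(w1, w2):
    # Permute hidden units: rows of w1 and the matching columns of w2.
    # The permuted network computes the same function (permutation
    # invariance), so accuracy is unaffected.
    perm = rng.permutation(w1.shape[0])
    return w1[perm, :], w2[:, perm]

# Tiny demo: shuffling preserves the forward pass of the 2-layer MLP.
x = rng.standard_normal(4)
w1 = rng.standard_normal((8, 4))   # hidden x input
w2 = rng.standard_normal((2, 8))   # output x hidden
relu = lambda z: np.maximum(z, 0.0)
y_before = w2 @ relu(w1 @ x)
w1s, w2s = shuffle_hidden_units(w1, w2)
y_after = w2s @ relu(w1s @ x)
assert np.allclose(y_before, y_after)
```

Because the permuted network computes the same function, shuffling after each noisy update costs no accuracy, while the sequence of released weights follows a randomly permuted trajectory.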
Low Difficulty Summary (written by GrooveSquid.com, original content)
Differential Privacy (DP) helps keep personal data private. The problem is that protecting the data can also make it less useful for learning. This paper tackles that trade-off by adding a new step to Differentially-Private Stochastic Gradient Descent (DPSGD), the standard method for training models privately: shuffling the model's weights. The shuffle adds more randomness to the training process without harming accuracy. Theoretical analysis shows that this approach improves privacy protection, but tracking how much privacy is lost becomes tricky, so the authors develop an approximation method to estimate the privacy loss. Experimental results confirm their theory and show that their approach outperforms existing methods.
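The summaries note that estimating the privacy loss requires approximating a sum of lognormal distributions. The paper's exact derivation is not reproduced here; as a hedged illustration, the sketch below uses the classical Fenton-Wilkinson moment-matching approximation, which replaces a sum of independent lognormals with a single lognormal having the same first two moments. All names are illustrative.

```python
# Fenton-Wilkinson approximation of a sum of independent lognormals.
# A standard stand-in for illustration; not the paper's exact method.
import numpy as np

def fenton_wilkinson(mus, sigmas):
    """Match the first two moments of sum_i LN(mu_i, sigma_i^2)
    with a single lognormal LN(mu, sigma^2); return (mu, sigma)."""
    mus, sigmas = np.asarray(mus, float), np.asarray(sigmas, float)
    mean = np.sum(np.exp(mus + 0.5 * sigmas**2))                       # E[sum]
    var = np.sum((np.exp(sigmas**2) - 1) * np.exp(2*mus + sigmas**2))  # Var[sum]
    sigma2 = np.log(1.0 + var / mean**2)
    return np.log(mean) - 0.5 * sigma2, np.sqrt(sigma2)

# Quick Monte Carlo sanity check of the matched mean.
rng = np.random.default_rng(0)
mus, sigmas = [0.0, 0.5, -0.3], [0.4, 0.3, 0.5]
samples = sum(rng.lognormal(m, s, 100_000) for m, s in zip(mus, sigmas))
mu, sigma = fenton_wilkinson(mus, sigmas)
print(samples.mean(), np.exp(mu + 0.5 * sigma**2))  # approximately equal
```

Moment matching like this gives a closed-form single-lognormal surrogate, which is what makes a tractable DP accounting bound possible when the exact distribution of the summed privacy loss has no closed form.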

Keywords

  • Artificial intelligence
  • Gradient descent
  • Stochastic gradient descent
  • Tracking