Summary of Differentially Private Knowledge Distillation via Synthetic Text Generation, by James Flemings and Murali Annavaram
Differentially Private Knowledge Distillation via Synthetic Text Generation
by James Flemings, Murali Annavaram
First submitted to arXiv on: 1 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL); Cryptography and Security (cs.CR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract, available on arXiv. |
Medium | GrooveSquid.com (original content) | The proposed DistilDP algorithm combines differential privacy and model compression to train large language models (LLMs) while preserving both data privacy and utility. The approach leverages synthetic text generated by a differentially private teacher LLM to transfer knowledge to a smaller student model, using both hard labels from the synthetic data and soft labels based on the teacher’s output distribution. Aligning hidden representations between the teacher and student models can further improve the distillation. Experimentally, DistilDP improves student perplexity (PPL) by at least 9.0 over existing baselines on the Big Patent dataset under strong privacy parameters (ε = 2). A minimal sketch of this training objective appears after the table. |
Low | GrooveSquid.com (original content) | Large language models are getting better at many tasks, but we need to protect people’s private data. We also want these models to be smaller and faster so they can run on devices or in places with limited computing power. Two tools help here: differential privacy protects the training data, and model compression makes models smaller and faster. Used together naively, though, they can make the model much worse. So the authors came up with a new method called DistilDP. It uses synthetic text produced by a private teacher model to teach a smaller student model, keeping it useful while protecting privacy. |
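As a rough illustration of the training objective the medium summary describes, here is a minimal PyTorch sketch that combines the three ingredients: hard-label cross-entropy on the DP synthetic text, a soft-label KL term against the differentially private teacher’s output distribution, and an alignment term between teacher and student hidden representations. This is not the authors’ code: the model checkpoints (`gpt2-large`, `distilgpt2`), the projection layer, the temperature, and the loss weights `alpha` and `beta` are illustrative assumptions.

```python
# Minimal sketch of a DistilDP-style distillation loss (assumptions noted below).
# The student is trained only on synthetic text generated by a DP-fine-tuned
# teacher, so by the post-processing property of differential privacy it
# inherits the teacher's privacy guarantee.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "gpt2-large"  # assumption: stands in for the DP-fine-tuned teacher
student_name = "distilgpt2"  # assumption: stands in for the compressed student

tokenizer = AutoTokenizer.from_pretrained(student_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name).eval()
student = AutoModelForCausalLM.from_pretrained(student_name)

# Project student hidden states into the teacher's hidden size for alignment;
# this projection would be trained jointly with the student.
proj = torch.nn.Linear(student.config.n_embd, teacher.config.n_embd)


def distill_loss(batch, temperature=2.0, alpha=0.5, beta=0.1):
    """Hard-label CE + soft-label KL + hidden-state alignment (weights are assumptions)."""
    with torch.no_grad():
        t_out = teacher(**batch, output_hidden_states=True)
    s_out = student(**batch, output_hidden_states=True)

    # (1) Hard labels: next-token cross-entropy on the synthetic text itself.
    labels = batch["input_ids"][:, 1:]
    s_logits = s_out.logits[:, :-1]
    ce = F.cross_entropy(s_logits.reshape(-1, s_logits.size(-1)), labels.reshape(-1))

    # (2) Soft labels: KL divergence to the teacher's temperature-scaled distribution.
    t_logits = t_out.logits[:, :-1]
    kl = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2

    # (3) Alignment of final-layer hidden representations (MSE after projection).
    align = F.mse_loss(proj(s_out.hidden_states[-1]), t_out.hidden_states[-1])

    return alpha * ce + (1 - alpha) * kl + beta * align


# Example usage on one (hypothetical) synthetic-text batch:
batch = tokenizer(["A synthetic patent abstract generated by the DP teacher."],
                  return_tensors="pt")
loss = distill_loss(batch)
loss.backward()
```

Under this setup the student never touches the private training data directly; it only sees the DP teacher’s synthetic text and output distributions, so its training adds no privacy cost beyond the teacher’s budget (ε = 2 in the reported experiments).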
Keywords
* Artificial intelligence * Knowledge distillation * Model compression * Student model * Synthetic data * Teacher model