Summary of Differentially Private Knowledge Distillation via Synthetic Text Generation, by James Flemings and Murali Annavaram
Differentially Private Knowledge Distillation via Synthetic Text Generation
by James Flemings, Murali Annavaram
First submitted to arXiv on: 1 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computation and Language (cs.CL); Cryptography and Security (cs.CR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract, available on arXiv. |
Medium | GrooveSquid.com (original content) | The proposed DistilDP algorithm combines differential privacy and model compression to train large language models (LLMs) while preserving both data privacy and utility. The approach leverages synthetic text generated by a differentially private teacher LLM to transfer knowledge to a smaller student model, using both hard labels from the synthetic data and soft labels based on the teacher’s output distribution. Aligning hidden representations between the teacher and student models can further improve the distillation. Experimentally, DistilDP improves student perplexity (PPL) by at least 9.0 over existing baselines on the Big Patent dataset under strong privacy parameters (ε = 2). A minimal sketch of this training objective appears after the table. |
Low | GrooveSquid.com (original content) | Large language models are getting better at many tasks, but we need to protect people’s private data. We also want these models to be smaller and faster so they can run on devices or in places with limited computing power. Two tools help here: differential privacy protects the training data, and model compression makes models smaller and faster. Used together naively, though, they can make the model much worse. So the authors came up with a new method called DistilDP. It uses synthetic text produced by a private teacher model to teach a smaller student model, keeping it useful while protecting privacy. |
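As a rough illustration of the training objective the medium summary describes, here is a minimal PyTorch sketch that combines the three ingredients: hard-label cross-entropy on the DP synthetic text, a soft-label KL term against the differentially private teacher’s output distribution, and an alignment term between teacher and student hidden representations. This is not the authors’ code: the model checkpoints (`gpt2-large`, `distilgpt2`), the projection layer, the temperature, and the loss weights `alpha` and `beta` are illustrative assumptions.

```python
# Minimal sketch of a DistilDP-style distillation loss (assumptions noted below).
# The student is trained only on synthetic text generated by a DP-fine-tuned
# teacher, so by the post-processing property of differential privacy it
# inherits the teacher's privacy guarantee.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

teacher_name = "gpt2-large"  # assumption: stands in for the DP-fine-tuned teacher
student_name = "distilgpt2"  # assumption: stands in for the compressed student

tokenizer = AutoTokenizer.from_pretrained(student_name)
teacher = AutoModelForCausalLM.from_pretrained(teacher_name).eval()
student = AutoModelForCausalLM.from_pretrained(student_name)

# Project student hidden states into the teacher's hidden size for alignment;
# this projection would be trained jointly with the student.
proj = torch.nn.Linear(student.config.n_embd, teacher.config.n_embd)


def distill_loss(batch, temperature=2.0, alpha=0.5, beta=0.1):
    """Hard-label CE + soft-label KL + hidden-state alignment (weights are assumptions)."""
    with torch.no_grad():
        t_out = teacher(**batch, output_hidden_states=True)
    s_out = student(**batch, output_hidden_states=True)

    # (1) Hard labels: next-token cross-entropy on the synthetic text itself.
    labels = batch["input_ids"][:, 1:]
    s_logits = s_out.logits[:, :-1]
    ce = F.cross_entropy(s_logits.reshape(-1, s_logits.size(-1)), labels.reshape(-1))

    # (2) Soft labels: KL divergence to the teacher's temperature-scaled distribution.
    t_logits = t_out.logits[:, :-1]
    kl = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2

    # (3) Alignment of final-layer hidden representations (MSE after projection).
    align = F.mse_loss(proj(s_out.hidden_states[-1]), t_out.hidden_states[-1])

    return alpha * ce + (1 - alpha) * kl + beta * align


# Example usage on one (hypothetical) synthetic-text batch:
batch = tokenizer(["A synthetic patent abstract generated by the DP teacher."],
                  return_tensors="pt")
loss = distill_loss(batch)
loss.backward()
```

Under this setup the student never touches the private training data directly; it only sees the DP teacher’s synthetic text and output distributions, so its training adds no privacy cost beyond the teacher’s budget (ε = 2 in the reported experiments).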
Keywords
* Artificial intelligence * Knowledge distillation * Model compression * Student model * Synthetic data * Teacher model