Summary of Enhancing Learning with Label Differential Privacy by Vector Approximation, by Puning Zhao et al.
Enhancing Learning with Label Differential Privacy by Vector Approximation
by Puning Zhao, Rongfei Fan, Huiwen Wu, Qingming Li, Jiafei Wu, Zhe Liu
First submitted to arXiv on: 24 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract, available on arXiv. |
Medium | GrooveSquid.com (original content) | This paper proposes a novel approach to label differential privacy (DP), which protects the privacy of labels in a training dataset while the feature vectors remain public. Existing methods randomly flip labels and train a model to fit the privatized labels, but their performance degrades as the number of classes increases. The proposed vector approximation method instead converts each label into a random vector that carries the class-conditional probabilities, retaining more information than a privatized scalar label (see the sketch after this table). A brief theoretical analysis shows that performance decays only slightly as the number of classes grows. |
Low | GrooveSquid.com (original content) | This paper helps keep our data private while still letting us learn from it. Right now, there are ways to protect label privacy by randomly changing labels, but these methods don't work well when there are many different labels. The new approach proposed in this paper, called vector approximation, is easy to use and doesn't slow computers down much. It works by turning each label into a random vector that carries information about the class-conditional probabilities, so more information is kept than from the changed label alone. Some math shows that this approach only gets slightly worse as the number of labels grows, and results on real-world data show it works well. |
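To make the idea concrete, here is a minimal Python sketch of one way such a label-to-vector mechanism could look. It assumes the mechanism behaves like per-coordinate randomized response on the one-hot label; the function `privatize_label` and the epsilon/2 budget split are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

def privatize_label(label, num_classes, epsilon, rng=None):
    """Privatize a class label as a random vector.

    Hypothetical sketch: per-coordinate randomized response on the
    one-hot encoding, not the paper's exact mechanism. Changing the
    label alters exactly two coordinates of the one-hot vector, so
    spending epsilon/2 per coordinate gives epsilon label-DP overall.
    """
    rng = rng or np.random.default_rng()
    # Probability of keeping each coordinate unchanged.
    p = np.exp(epsilon / 2) / (1.0 + np.exp(epsilon / 2))
    one_hot = np.zeros(num_classes)
    one_hot[label] = 1.0
    keep = rng.random(num_classes) < p
    noisy = np.where(keep, one_hot, 1.0 - one_hot)
    # Debias: E[noisy_k] = (1 - p) + (2p - 1) * one_hot_k, so the
    # vector below is an unbiased estimate of the one-hot label.
    return (noisy - (1.0 - p)) / (2.0 * p - 1.0)

# Averaging privatized vectors over many samples of the same class
# approximately recovers the one-hot label, which is why the vectors
# retain class-probability information for training.
vectors = [privatize_label(2, num_classes=10, epsilon=1.0) for _ in range(5000)]
print(np.mean(vectors, axis=0).round(2))  # peaks near index 2
```

A model trained on such vectors sees soft targets rather than a single flipped label, which is the intuition behind why the vector form holds up better as the number of classes grows.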