Summary of WaKA: Data Attribution Using K-Nearest Neighbors and Membership Privacy Principles, by Patrick Mesana et al.
WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles
by Patrick Mesana, Clément Bénesse, Hadrien Lautraite, Gilles Caporossi, Sébastien Gambs
First submitted to arXiv on: 2 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Cryptography and Security (cs.CR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper introduces WaKA, a novel attribution method that combines principles from the LiRA framework with k-nearest neighbors classifiers. WaKA measures the contribution of individual data points to a model's loss distribution by analyzing every possible k-NN classifier constructed from the training set. The approach is versatile and can be used both for membership inference attacks (MIA) and for measuring privacy influence. The paper demonstrates that WaKA provides a unified framework for distinguishing between a data point's value and its privacy risk, showing strong correlations with attack success rates. Additionally, WaKA outperforms Shapley values on imbalanced datasets. |
| Low | GrooveSquid.com (original content) | WaKA is a new way to understand how individual pieces of data affect a model's performance. It uses ideas from two other techniques: LiRA and k-nearest neighbors. WaKA looks at every possible combination of training data points to figure out how each one contributes to the model's loss. This helps us understand both what makes a piece of data valuable and what makes it risky for privacy. The researchers tested WaKA on many different datasets and found that it works well, even when dealing with tricky imbalanced datasets. |
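The core idea above (how much each training point contributes to a k-NN model's performance) can be illustrated with a simplified leave-one-out sketch. Note this is not the WaKA estimator itself, which reasons over the distribution of all possible k-NN models and ties into the LiRA membership-inference framework; the function names and toy validation setup here are purely illustrative:

```python
import numpy as np

def knn_predict(train_X, train_y, x, k):
    # Majority vote among the k nearest training points (Euclidean distance).
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.bincount(train_y[nearest]).argmax()

def loo_influence(train_X, train_y, val_X, val_y, k):
    # Influence of each training point, approximated as the drop in
    # validation accuracy when that single point is removed.
    def accuracy(X, y):
        preds = [knn_predict(X, y, x, k) for x in val_X]
        return np.mean(np.array(preds) == val_y)

    base = accuracy(train_X, train_y)
    scores = np.empty(len(train_X))
    for i in range(len(train_X)):
        mask = np.ones(len(train_X), dtype=bool)
        mask[i] = False
        scores[i] = base - accuracy(train_X[mask], train_y[mask])
    return scores
```

A point whose removal hurts accuracy gets a positive score (valuable), while a point whose removal changes nothing scores zero; WaKA refines this kind of per-point signal to also capture privacy risk, i.e. how identifiable a point's presence is to a membership inference attacker.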
Keywords
* Artificial intelligence
* Inference