Summary of Adversarial Attacks on Data Attribution, by Xinhe Wang et al.
Adversarial Attacks on Data Attribution
by Xinhe Wang, Pingbang Hu, Junwei Deng, Jiaqi W. Ma
First submitted to arXiv on: 9 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper addresses the adversarial robustness of data attribution methods, which quantify how much an AI model relies on individual training data points. Because data attribution increasingly informs financial decisions and compensation mechanisms, it is crucial to evaluate its resilience against malicious attacks. The authors propose two principled attack methods: the Shadow Attack and the Outlier Attack. The former leverages knowledge about the data distribution of the target application and derives adversarial perturbations through “shadow training”, while the latter relies solely on black-box queries and exploits the tendency of data attribution methods to assign higher influence to outlier data points. Both attacks aim to inflate data-attribution-based compensation, and empirical results show they can do so substantially, with the Shadow Attack achieving at least 200% inflation in image classification and text generation tasks. A conceptual sketch of an outlier-style perturbation search is given below the table. |
Low | GrooveSquid.com (original content) | Imagine you’re trying to figure out how much someone should be paid for helping an AI learn. The problem is that some people might try to cheat by making their data look more important than it really is. This paper shows two ways such cheating could actually be carried out, to test how easily the payment system can be fooled. One method, called the Shadow Attack, uses knowledge about the kind of data the AI is trained on to make small, sneaky changes to the cheater’s data. The other, the Outlier Attack, only watches how the system responds to different data points and tweaks the cheater’s data until it looks unusual, which makes it seem more valuable. When tested, both attacks increased the payment a lot, by at least 200% in some cases. |
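To make the black-box idea above concrete, here is a minimal, hypothetical sketch of an outlier-style perturbation search. It is not the authors’ implementation: the black-box attribution query is replaced by a simple distance-from-the-mean proxy, and all function names, budgets, and step counts are illustrative assumptions.

```python
# Conceptual sketch (not the paper's implementation) of the intuition behind an
# outlier-style attack: data attribution methods tend to assign higher influence
# to outlying training points, so an adversary can nudge a contributed point away
# from the bulk of the data, within a small perturbation budget, using only
# black-box attribution queries. All names and parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def query_attribution(point, reference_data):
    # Hypothetical stand-in for a black-box attribution query. Here we use the
    # point's distance from the data mean as a proxy for the "outlierness"
    # that many attribution scores reward.
    centered = point - reference_data.mean(axis=0)
    return float(np.linalg.norm(centered))

def outlier_style_attack(point, reference_data, epsilon=0.1, steps=50, trials=20):
    """Greedy black-box search for a perturbation (||delta||_inf <= epsilon)
    that increases the attribution score of `point`."""
    best_point, best_score = point.copy(), query_attribution(point, reference_data)
    for _ in range(steps):
        # Propose small random perturbations around the current best candidate.
        candidates = best_point + rng.uniform(-epsilon / steps, epsilon / steps,
                                              size=(trials, point.shape[0]))
        # Enforce the overall perturbation budget relative to the original point.
        candidates = np.clip(candidates, point - epsilon, point + epsilon)
        scores = [query_attribution(c, reference_data) for c in candidates]
        if max(scores) > best_score:
            best_score = max(scores)
            best_point = candidates[int(np.argmax(scores))]
    return best_point, best_score

# Toy usage: a contributed data point is nudged toward the outskirts of the data.
data = rng.normal(size=(200, 8))
contributed = data[0]
perturbed, score = outlier_style_attack(contributed, data)
print(f"attribution proxy before: {query_attribution(contributed, data):.3f}, "
      f"after: {score:.3f}")
```

The greedy random search stands in for whatever query strategy an attacker can afford; the constraint it illustrates is that the perturbation stays within a small budget around the original point while the attribution score is pushed upward.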
Keywords
» Artificial intelligence » Image classification » Text generation