Summary of Adversarial Attacks on Data Attribution, by Xinhe Wang et al.
Adversarial Attacks on Data Attribution
by Xinhe Wang, Pingbang Hu, Junwei Deng, Jiaqi W. Ma
First submitted to arXiv on: 9 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper addresses the adversarial robustness of data attribution methods, which quantify how much an AI model relies on individual training data points. Because data attribution increasingly informs financial decisions and compensation mechanisms, it is crucial to evaluate its resilience against malicious attacks. The authors propose two principled attack methods: the Shadow Attack and the Outlier Attack. The former leverages knowledge about the data distribution of the target application and derives adversarial perturbations through “shadow training”, while the latter relies solely on black-box queries and exploits the tendency of data attribution methods to assign higher influence to outlier data points. Both attacks aim to inflate data-attribution-based compensation, and empirical results show they can do so substantially, with the Shadow Attack achieving at least 200% inflation in image classification and text generation tasks. A conceptual sketch of an outlier-style perturbation search is given below the table. |
Low | GrooveSquid.com (original content) | Imagine you’re trying to figure out how much someone should be paid for helping an AI learn. The problem is that some people might try to cheat by making their data look more important than it really is. This paper shows two ways such cheating could actually be carried out, to test how easily the payment system can be fooled. One method, called the Shadow Attack, uses knowledge about the kind of data the AI is trained on to make small, sneaky changes to the cheater’s data. The other, the Outlier Attack, only watches how the system responds to different data points and tweaks the cheater’s data until it looks unusual, which makes it seem more valuable. When tested, both attacks increased the payment a lot, by at least 200% in some cases. |
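To make the black-box idea above concrete, here is a minimal, hypothetical sketch of an outlier-style perturbation search. It is not the authors’ implementation: the black-box attribution query is replaced by a simple distance-from-the-mean proxy, and all function names, budgets, and step counts are illustrative assumptions.

```python
# Conceptual sketch (not the paper's implementation) of the intuition behind an
# outlier-style attack: data attribution methods tend to assign higher influence
# to outlying training points, so an adversary can nudge a contributed point away
# from the bulk of the data, within a small perturbation budget, using only
# black-box attribution queries. All names and parameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def query_attribution(point, reference_data):
    # Hypothetical stand-in for a black-box attribution query. Here we use the
    # point's distance from the data mean as a proxy for the "outlierness"
    # that many attribution scores reward.
    centered = point - reference_data.mean(axis=0)
    return float(np.linalg.norm(centered))

def outlier_style_attack(point, reference_data, epsilon=0.1, steps=50, trials=20):
    """Greedy black-box search for a perturbation (||delta||_inf <= epsilon)
    that increases the attribution score of `point`."""
    best_point, best_score = point.copy(), query_attribution(point, reference_data)
    for _ in range(steps):
        # Propose small random perturbations around the current best candidate.
        candidates = best_point + rng.uniform(-epsilon / steps, epsilon / steps,
                                              size=(trials, point.shape[0]))
        # Enforce the overall perturbation budget relative to the original point.
        candidates = np.clip(candidates, point - epsilon, point + epsilon)
        scores = [query_attribution(c, reference_data) for c in candidates]
        if max(scores) > best_score:
            best_score = max(scores)
            best_point = candidates[int(np.argmax(scores))]
    return best_point, best_score

# Toy usage: a contributed data point is nudged toward the outskirts of the data.
data = rng.normal(size=(200, 8))
contributed = data[0]
perturbed, score = outlier_style_attack(contributed, data)
print(f"attribution proxy before: {query_attribution(contributed, data):.3f}, "
      f"after: {score:.3f}")
```

The greedy random search stands in for whatever query strategy an attacker can afford; the constraint it illustrates is that the perturbation stays within a small budget around the original point while the attribution score is pushed upward.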
Keywords
» Artificial intelligence » Image classification » Text generation