Adaptive Discounting of Training Time Attacks

by Ridhima Bector, Abhay Aradhya, Chai Quek, Zinovi Rabinovich

First submitted to arXiv on: 5 Jan 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)

Reinforcement Learning (RL) solutions are vulnerable to training-time attacks (TTAs), which create backdoors and loopholes in the learned behavior. Constructive TTAs (C-TTAs) go further: they force a specific target behavior onto the RL agent. However, existing approaches only consider target behaviors that the victim could adopt naturally were it not for the environment’s dynamics. This work demonstrates a C-TTA whose target behavior is unadoptable for two reasons: the environmental dynamics, and the behavior’s non-optimality with respect to the victim’s objectives. To learn this stronger form of C-TTA, the authors develop gammaDDPG, a variant of the DDPG algorithm that adjusts the attack policy’s planning horizon based on the victim’s current behavior. This improves how effort is distributed across the attack timeline and reduces the impact of the attacker’s uncertainty about the victim. Experiments are conducted in a 3D grid domain borrowed from a state-of-the-art C-TTA. Code is available at http://bit.ly/github-rb-gDDPG.
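
To make the adaptive-discounting idea concrete, here is a minimal sketch. It is not the authors’ implementation: the class AdaptiveGamma, its parameters, and the drift measure are our own illustrative assumptions. It shows one plausible way an attacker’s discount factor gamma, which sets its planning horizon, could be tied to how quickly the victim’s behavior is changing; in a DDPG-style attacker, the resulting gamma would enter the critic’s TD target.

```python
# Sketch of adaptive discounting for a training-time attacker.
# Assumption (ours, not the paper's): the attacker shortens its planning
# horizon (lower gamma) while the victim's behavior is changing quickly,
# and lengthens it (higher gamma) once the victim stabilizes.

import numpy as np


class AdaptiveGamma:
    """Maps a measure of victim-policy drift to a discount factor."""

    def __init__(self, gamma_min=0.90, gamma_max=0.99, sensitivity=5.0):
        self.gamma_min = gamma_min
        self.gamma_max = gamma_max
        self.sensitivity = sensitivity
        self.prev_actions = None  # victim's last observed responses

    def update(self, victim_actions):
        """victim_actions: the victim's actions on a fixed probe set of states."""
        victim_actions = np.asarray(victim_actions, dtype=float)
        if self.prev_actions is None:
            self.prev_actions = victim_actions
            return self.gamma_max  # no history yet: assume a long horizon
        # Drift: mean absolute change in the victim's responses between checks.
        drift = float(np.mean(np.abs(victim_actions - self.prev_actions)))
        self.prev_actions = victim_actions
        # High drift -> weight near 0 -> gamma near gamma_min (short horizon);
        # low drift  -> weight near 1 -> gamma near gamma_max (long horizon).
        weight = np.exp(-self.sensitivity * drift)
        return self.gamma_min + (self.gamma_max - self.gamma_min) * weight


# The attacker's critic would then use the current gamma in its TD target:
#   y = r + gamma_t * Q_target(s', mu_target(s'))
scheduler = AdaptiveGamma()
gamma_t = scheduler.update([0.20, -0.10, 0.40])  # first call: no history yet
gamma_t = scheduler.update([0.25, -0.05, 0.38])  # small drift -> long horizon
print(f"current attacker discount: {gamma_t:.3f}")
```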
Low Difficulty Summary (original content by GrooveSquid.com)

Reinforcement Learning (RL) is a way to train machines to make decisions. But there’s a problem: someone can secretly tamper with the training process to make the machine behave in a certain way, even if that behavior doesn’t make sense for the machine. This kind of attack is called a constructive training-time attack (C-TTA). The authors of this paper have developed a new method for creating C-TTAs that works even when the target behavior doesn’t match what the machine would naturally do. They use an algorithm called gammaDDPG to create these attacks and test it in a 3D grid environment. This research can help make RL systems more secure.

Keywords

  • Artificial intelligence
  • Reinforcement learning