Adaptive Discounting of Training Time Attacks

by Ridhima Bector, Abhay Aradhya, Chai Quek, Zinovi Rabinovich

First submitted to arXiv on: 5 Jan 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same paper at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)

Reinforcement Learning (RL) solutions are vulnerable to training-time attacks (TTAs), which create backdoors and loopholes in the learned behavior. Constructive TTAs (C-TTAs) go further: they force a specific target behavior onto the RL agent. However, existing approaches only consider target behaviors that the victim could adopt naturally were it not for the environment’s dynamics. This work demonstrates a C-TTA whose target behavior is unadoptable for two reasons: the environmental dynamics, and the behavior’s non-optimality with respect to the victim’s objectives. To learn this stronger form of C-TTA, the authors develop gammaDDPG, a variant of the DDPG algorithm that adjusts the attack policy’s planning horizon based on the victim’s current behavior. This improves how effort is distributed across the attack timeline and reduces the impact of the attacker’s uncertainty about the victim. Experiments are conducted in a 3D grid domain borrowed from a state-of-the-art C-TTA. Code is available at http://bit.ly/github-rb-gDDPG.
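
To make the adaptive-discounting idea concrete, here is a minimal sketch. It is not the authors’ implementation: the class AdaptiveGamma, its parameters, and the drift measure are our own illustrative assumptions. It shows one plausible way an attacker’s discount factor gamma, which sets its planning horizon, could be tied to how quickly the victim’s behavior is changing; in a DDPG-style attacker, the resulting gamma would enter the critic’s TD target.

```python
# Sketch of adaptive discounting for a training-time attacker.
# Assumption (ours, not the paper's): the attacker shortens its planning
# horizon (lower gamma) while the victim's behavior is changing quickly,
# and lengthens it (higher gamma) once the victim stabilizes.

import numpy as np


class AdaptiveGamma:
    """Maps a measure of victim-policy drift to a discount factor."""

    def __init__(self, gamma_min=0.90, gamma_max=0.99, sensitivity=5.0):
        self.gamma_min = gamma_min
        self.gamma_max = gamma_max
        self.sensitivity = sensitivity
        self.prev_actions = None  # victim's last observed responses

    def update(self, victim_actions):
        """victim_actions: the victim's actions on a fixed probe set of states."""
        victim_actions = np.asarray(victim_actions, dtype=float)
        if self.prev_actions is None:
            self.prev_actions = victim_actions
            return self.gamma_max  # no history yet: assume a long horizon
        # Drift: mean absolute change in the victim's responses between checks.
        drift = float(np.mean(np.abs(victim_actions - self.prev_actions)))
        self.prev_actions = victim_actions
        # High drift -> weight near 0 -> gamma near gamma_min (short horizon);
        # low drift  -> weight near 1 -> gamma near gamma_max (long horizon).
        weight = np.exp(-self.sensitivity * drift)
        return self.gamma_min + (self.gamma_max - self.gamma_min) * weight


# The attacker's critic would then use the current gamma in its TD target:
#   y = r + gamma_t * Q_target(s', mu_target(s'))
scheduler = AdaptiveGamma()
gamma_t = scheduler.update([0.20, -0.10, 0.40])  # first call: no history yet
gamma_t = scheduler.update([0.25, -0.05, 0.38])  # small drift -> long horizon
print(f"current attacker discount: {gamma_t:.3f}")
```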
Low Difficulty Summary (original content by GrooveSquid.com)

Reinforcement Learning (RL) is a way to train machines to make decisions. But there’s a problem: someone can secretly tamper with the training process to make the machine behave in a certain way, even if that behavior doesn’t make sense for the machine. This kind of attack is called a constructive training-time attack (C-TTA). The authors of this paper have developed a new method for creating C-TTAs that works even when the target behavior doesn’t match what the machine would naturally do. They use an algorithm called gammaDDPG to create these attacks and test it in a 3D grid environment. This research can help make RL systems more secure.

Keywords

  • Artificial intelligence
  • Reinforcement learning