


Adversarial Robustness of In-Context Learning in Transformers for Linear Regression

by Usman Anwar, Johannes von Oswald, Louis Kirsch, David Krueger, Spencer Frei

First submitted to arXiv on: 7 Nov 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Cryptography and Security (cs.CR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content written by GrooveSquid.com)
Transformers have shown impressive in-context learning capabilities across various domains, including statistical learning tasks. However, the adversarial robustness of these learned algorithms remains unexplored. This paper investigates the vulnerability of in-context learning in transformers to hijacking attacks, in which an adversary perturbs the in-context prompt to force a chosen prediction, focusing on linear regression tasks. The authors first prove that single-layer linear transformers can be manipulated into outputting arbitrary predictions by perturbing a single example in the in-context training set. While this attack succeeds on linear transformers, it does not transfer to more complex transformers with GPT-2 architectures; those transformers can, however, be hijacked using gradient-based adversarial attacks (sketched in code after the summaries below). The authors demonstrate that adversarial training enhances transformers’ robustness against hijacking attacks, even when applied only during finetuning. They also find that, in some settings, adversarial training against a weaker attack model can confer robustness to a stronger attack model. Finally, the paper investigates the transferability of hijacking attacks across transformers of varying scales and initialization seeds, as well as between transformers and ordinary least squares (OLS). The results show that while attacks transfer effectively between small-scale transformers, they transfer poorly in other scenarios.
Low Difficulty Summary (original content written by GrooveSquid.com)
This paper is about making sure that AI models called transformers are not tricked into giving the wrong answers. These models can learn from examples and make predictions. But what if someone tries to cheat by manipulating the examples? The authors tested this idea and found that simple transformers can be tricked easily, but more complex ones are harder to fool. They also showed that if you train these models in a way that makes them stronger against cheating attempts, they become much less vulnerable. This is important because it helps keep our AI systems honest and reliable.
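
To make the attack setting concrete, here is a minimal, hypothetical sketch of a gradient-based hijacking attack of the kind described in the medium-difficulty summary: a single in-context (x, y) example is perturbed by gradient descent so that the model's prediction for a query point moves toward an attacker-chosen target. The toy single-layer linear-attention model, the function names, and the hyperparameters below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a gradient-based hijacking attack on in-context linear
# regression. The model is a toy, untrained single-layer linear-attention network;
# all names (LinearAttentionICL, hijack_example, eps, steps) are illustrative.
import torch


class LinearAttentionICL(torch.nn.Module):
    """Single-layer linear (softmax-free) self-attention over [x; y] tokens."""

    def __init__(self, dim: int):
        super().__init__()
        d = dim + 1  # each token concatenates the input x with its label y
        self.WK = torch.nn.Linear(d, d, bias=False)
        self.WQ = torch.nn.Linear(d, d, bias=False)
        self.WV = torch.nn.Linear(d, d, bias=False)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (n + 1, d); the last token is the query [x_query; 0]
        K, Q, V = self.WK(tokens), self.WQ(tokens), self.WV(tokens)
        out = (Q @ K.T) @ V / tokens.shape[0]
        return out[-1, -1]  # read the prediction off the query token's y slot


def hijack_example(model, tokens, target, idx=0, eps=2.0, steps=300, lr=0.05):
    """Perturb only the in-context example at position `idx` so the model's
    prediction for the query moves toward an attacker-chosen `target`."""
    for p in model.parameters():
        p.requires_grad_(False)  # only the perturbation is optimized
    delta = torch.zeros_like(tokens[idx], requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        perturbed = tokens.clone()
        perturbed[idx] = tokens[idx] + delta      # attack a single (x, y) pair
        loss = (model(perturbed) - target) ** 2   # push prediction toward target
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)               # keep the perturbation bounded
    return delta.detach()


if __name__ == "__main__":
    torch.manual_seed(0)
    dim, n = 4, 16
    w_star = torch.randn(dim)                     # ground-truth regression weights
    X = torch.randn(n, dim)
    y = X @ w_star
    x_query = torch.randn(dim)
    context = torch.cat([X, y[:, None]], dim=1)   # n in-context tokens [x; y]
    query = torch.cat([x_query, torch.zeros(1)])[None, :]
    tokens = torch.cat([context, query], dim=0)

    model = LinearAttentionICL(dim)
    print("clean prediction:   ", model(tokens).item())
    delta = hijack_example(model, tokens, target=torch.tensor(10.0))
    hijacked = tokens.clone()
    hijacked[0] += delta
    print("hijacked prediction:", model(hijacked).item())
```

Adversarial finetuning of the kind the paper studies would, conceptually, reuse such a perturbation routine inside the training loop, minimizing the model's loss on the perturbed prompts rather than on clean ones.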

Keywords

  » Artificial intelligence  » GPT  » Linear regression  » Transferability