Summary of Adversarial Robustness of In-Context Learning in Transformers for Linear Regression, by Usman Anwar et al.
Adversarial Robustness of In-Context Learning in Transformers for Linear Regression
by Usman Anwar, Johannes von Oswald, Louis Kirsch, David Krueger, Spencer Frei
First submitted to arXiv on: 7 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Cryptography and Security (cs.CR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper but are written at different levels of difficulty: the medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv |
Medium | GrooveSquid.com (original content) | Transformers have shown impressive in-context learning capabilities across a range of domains, including statistical learning tasks. However, the adversarial robustness of the algorithms they learn in context has remained unexplored. This paper investigates the vulnerability of in-context learning in transformers to hijacking attacks, focusing on linear regression tasks. The authors first prove that single-layer linear transformers can be manipulated into outputting arbitrary predictions by perturbing a single example in the in-context training set. While this attack succeeds against linear transformers, it does not transfer to more complex transformers with GPT-2 architectures; those models can nevertheless be hijacked with gradient-based adversarial attacks (a minimal sketch of such an attack appears after this table). The authors demonstrate that adversarial training enhances transformers’ robustness to hijacking attacks, even when applied only during finetuning. They also find that, in some settings, adversarial training against a weaker attack model can confer robustness to a stronger one. Finally, the paper studies the transferability of hijacking attacks across transformers of varying scales and initialization seeds, as well as between transformers and ordinary least squares (OLS). The results show that attacks transfer effectively between small-scale transformers but transfer poorly in the other scenarios. |
Low | GrooveSquid.com (original content) | This paper is about making sure that AI models called transformers are not tricked into giving wrong answers. These models can learn from examples and make predictions, but what if someone cheats by manipulating those examples? The authors tested this idea and found that simple transformers can be tricked easily, while more complex ones are harder to fool. They also showed that training these models to withstand such cheating attempts makes them much less vulnerable. This matters because it helps keep AI systems honest and reliable. |
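The gradient-based hijacking attack discussed in the medium difficulty summary can be pictured with a short sketch. The snippet below is a minimal illustration under assumptions, not the authors' code: the `model(xs, ys, x_query)` interface, the hyperparameters, and the perturbation budget `eps` are all hypothetical. The idea is to perturb a single in-context example by gradient descent so that the transformer's prediction on the query point is pulled toward an attacker-chosen target value.

```python
# Hypothetical sketch of a gradient-based hijacking attack on in-context
# linear regression. `model(xs, ys, x_query)` is an assumed interface that
# returns the transformer's in-context prediction for the query point.
import torch

def hijack_single_example(model, xs, ys, x_query, y_target,
                          idx=0, steps=200, lr=1e-2, eps=1.0):
    """Perturb the in-context example (xs[idx], ys[idx]) so the model's
    prediction on x_query is driven toward the attacker's target y_target."""
    delta_x = torch.zeros_like(xs[idx], requires_grad=True)
    delta_y = torch.zeros_like(ys[idx], requires_grad=True)
    opt = torch.optim.Adam([delta_x, delta_y], lr=lr)

    for _ in range(steps):
        xs_adv, ys_adv = xs.clone(), ys.clone()
        xs_adv[idx] = xs[idx] + delta_x          # perturb one example's input
        ys_adv[idx] = ys[idx] + delta_y          # ... and its label
        pred = model(xs_adv, ys_adv, x_query)    # in-context prediction
        loss = ((pred - y_target) ** 2).sum()    # pull prediction to target
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                    # keep the perturbation inside
            delta_x.clamp_(-eps, eps)            # an L-infinity ball of radius eps
            delta_y.clamp_(-eps, eps)

    with torch.no_grad():                        # assemble the hijacked prompt
        xs_adv, ys_adv = xs.clone(), ys.clone()
        xs_adv[idx] += delta_x
        ys_adv[idx] += delta_y
    return xs_adv, ys_adv
```

Adversarial training, as evaluated in the paper, would generate perturbed prompts of this kind during training and update the model on them (either throughout pretraining or only at finetuning time), which is what improves robustness to hijacking.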
Keywords
» Artificial intelligence » GPT » Linear regression » Transferability