Revisiting Character-level Adversarial Attacks for Language Models
by Elias Abad Rocamora, Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher
First submitted to arXiv on: 7 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | A novel adversarial attack, Charmer, is introduced to exploit vulnerabilities in Natural Language Processing (NLP) models. Token-level attacks can alter sentence semantics and produce invalid adversarial examples, while character-level attacks preserve semantics but have been thought easy to defend against. Challenging this belief, Charmer is a query-based attack that searches over character-level perturbations and successfully targets both small (BERT) and large (Llama 2) models, achieving a high attack success rate (ASR) while generating highly similar adversarial examples. On the SST-2 dataset, Charmer improves the ASR by 4.84 percentage points and the USE similarity by 8 percentage points over previous methods. A minimal illustrative sketch of the query-based idea appears after this table. |
Low | GrooveSquid.com (original content) | Adversarial attacks are sneaky ways to trick language models like BERT or Llama 2 into making mistakes. Researchers have been experimenting with different types of attacks, but some have been easier to defend against than others. A new attack called Charmer is designed to be particularly good at getting around these defenses and fooling the models. It does this by making small changes to individual characters in a sentence, rather than changing entire words or sentences. This approach seems to work well for both smaller and larger language models. |
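
To make the query-based idea in the summaries above concrete, here is a minimal Python sketch of a greedy character-level attack loop. This is not the authors' Charmer implementation: the `classify` function, the edit budget, and the 0.5 decision threshold are all illustrative assumptions, and the real method additionally constrains the search so the adversarial sentence stays highly similar to the original.

```python
# Hypothetical sketch of a query-based character-level attack, in the spirit of
# the paper but not its actual algorithm. `classify` stands in for any
# black-box victim model; all names and thresholds below are assumptions.
import string

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + " "


def classify(sentence: str) -> float:
    """Placeholder: return the victim model's confidence in the true label."""
    raise NotImplementedError


def char_attack(sentence: str, max_edits: int = 5) -> str:
    """Greedily apply single-character substitutions, keeping the edit that
    most lowers the model's confidence, until the prediction flips (assumed
    binary, threshold 0.5) or the edit budget is exhausted."""
    current = sentence
    for _ in range(max_edits):
        best_score = classify(current)
        best_candidate = None
        # Query the model with every single-character substitution.
        for i in range(len(current)):
            for c in ALPHABET:
                if c == current[i]:
                    continue
                candidate = current[:i] + c + current[i + 1:]
                score = classify(candidate)
                if score < best_score:
                    best_score, best_candidate = score, candidate
        if best_candidate is None:   # no single edit lowers the confidence
            break
        current = best_candidate
        if best_score < 0.5:         # prediction flipped; stop editing
            break
    return current
```

Because each step only changes one character, the adversarial sentence remains visually close to the original, which is the intuition behind the high USE similarity reported in the paper.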
Keywords
» Artificial intelligence » Bert » Llama » Natural language processing » Nlp » Semantics » Token