Revisiting Character-level Adversarial Attacks for Language Models

by Elias Abad Rocamora, Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher

First submitted to arXiv on: 7 May 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper introduces Charmer, a novel adversarial attack that exploits vulnerabilities in Natural Language Processing (NLP) models through character-level perturbations. Token-level attacks can alter sentence semantics and produce invalid adversarial examples, while character-level attacks preserve semantics but have been considered easy to defend against. Challenging that belief, Charmer is an efficient query-based attack that achieves a high attack success rate (ASR) while generating adversarial examples highly similar to the original input, and it successfully targets both small (BERT) and large (Llama 2) models. On BERT with the SST-2 dataset, Charmer improves the ASR by 4.84 percentage points and the USE similarity by 8 percentage points over the previous state of the art.

Low Difficulty Summary (original content by GrooveSquid.com)
Adversarial attacks are sneaky ways to trick language models like BERT or Llama 2 into making mistakes. Researchers have been experimenting with different types of attacks, but some have been easier to defend against than others. A new attack called Charmer is designed to be particularly good at getting around these defenses and fooling the models. It does this by making small changes to individual characters in a sentence, rather than changing entire words or sentences. This approach seems to work well for both smaller and larger language models.
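
To make the query-based idea concrete, below is a minimal, self-contained sketch of a greedy character-level attack. Everything here is illustrative rather than the paper's method: `victim_score`, `candidate_edits`, and `greedy_char_attack` are hypothetical names, the scoring function is a toy stand-in for a real classifier's confidence (a real attack would query a model such as BERT), and the greedy substitution-only search is far simpler than Charmer's actual procedure.

```python
# Minimal sketch of a query-based, character-level adversarial attack.
# Not Charmer itself: a toy greedy search under simplified assumptions.
import string

def victim_score(sentence: str) -> float:
    """Toy stand-in for P(correct label | sentence). A real attack
    would query the victim model here (one query per candidate)."""
    positive_words = {"great", "good", "wonderful", "enjoyable"}
    words = sentence.lower().split()
    hits = sum(w.strip(".,!?") in positive_words for w in words)
    return hits / max(len(words), 1)

def candidate_edits(sentence: str):
    """Yield every single-character substitution of the sentence."""
    for i in range(len(sentence)):
        for c in string.ascii_lowercase:
            if c != sentence[i]:
                yield sentence[:i] + c + sentence[i + 1:]

def greedy_char_attack(sentence: str, max_edits: int = 3) -> str:
    """Greedily apply the single-character edit that most lowers the
    victim's confidence in the correct label, up to `max_edits` times."""
    current = sentence
    for _ in range(max_edits):
        best = min(candidate_edits(current), key=victim_score)
        if victim_score(best) >= victim_score(current):
            break  # no remaining edit lowers the score
        current = best
    return current

if __name__ == "__main__":
    clean = "a wonderful and genuinely enjoyable film"
    adv = greedy_char_attack(clean)
    print(f"clean: {clean!r}  score={victim_score(clean):.2f}")
    print(f"adv:   {adv!r}  score={victim_score(adv):.2f}")
```

Note that each candidate edit costs one model query, so the per-step cost grows with sentence length times alphabet size; making this kind of search efficient is a key part of the paper's contribution.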

Keywords

» Artificial intelligence  » BERT  » Llama  » Natural language processing  » NLP  » Semantics  » Token