Revisiting Character-level Adversarial Attacks for Language Models
by Elias Abad Rocamora, Yongtao Wu, Fanghui Liu, Grigorios G. Chrysos, Volkan Cevher
First submitted to arXiv on: 7 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | A novel adversarial attack, Charmer, is introduced to exploit vulnerabilities in Natural Language Processing (NLP) models. Token-level attacks can alter sentence semantics and produce invalid adversarial examples, while character-level attacks preserve semantics but have been thought easy to defend against. Challenging this belief, Charmer is a query-based attack that searches over character-level perturbations and successfully targets both small (BERT) and large (Llama 2) models, achieving a high attack success rate (ASR) while generating highly similar adversarial examples. On the SST-2 dataset, Charmer improves the ASR by 4.84 percentage points and the USE similarity by 8 percentage points over previous methods. A minimal illustrative sketch of the query-based idea appears after this table. |
Low | GrooveSquid.com (original content) | Adversarial attacks are sneaky ways to trick language models like BERT or Llama 2 into making mistakes. Researchers have been experimenting with different types of attacks, but some have been easier to defend against than others. A new attack called Charmer is designed to be particularly good at getting around these defenses and fooling the models. It does this by making small changes to individual characters in a sentence, rather than changing entire words or sentences. This approach seems to work well for both smaller and larger language models. |
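
To make the query-based idea in the summaries above concrete, here is a minimal Python sketch of a greedy character-level attack loop. This is not the authors' Charmer implementation: the `classify` function, the edit budget, and the 0.5 decision threshold are all illustrative assumptions, and the real method additionally constrains the search so the adversarial sentence stays highly similar to the original.

```python
# Hypothetical sketch of a query-based character-level attack, in the spirit of
# the paper but not its actual algorithm. `classify` stands in for any
# black-box victim model; all names and thresholds below are assumptions.
import string

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + " "


def classify(sentence: str) -> float:
    """Placeholder: return the victim model's confidence in the true label."""
    raise NotImplementedError


def char_attack(sentence: str, max_edits: int = 5) -> str:
    """Greedily apply single-character substitutions, keeping the edit that
    most lowers the model's confidence, until the prediction flips (assumed
    binary, threshold 0.5) or the edit budget is exhausted."""
    current = sentence
    for _ in range(max_edits):
        best_score = classify(current)
        best_candidate = None
        # Query the model with every single-character substitution.
        for i in range(len(current)):
            for c in ALPHABET:
                if c == current[i]:
                    continue
                candidate = current[:i] + c + current[i + 1:]
                score = classify(candidate)
                if score < best_score:
                    best_score, best_candidate = score, candidate
        if best_candidate is None:   # no single edit lowers the confidence
            break
        current = best_candidate
        if best_score < 0.5:         # prediction flipped; stop editing
            break
    return current
```

Because each step only changes one character, the adversarial sentence remains visually close to the original, which is the intuition behind the high USE similarity reported in the paper.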
Keywords
» Artificial intelligence » Bert » Llama » Natural language processing » Nlp » Semantics » Token