
Summary of CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models, by Yuetai Li et al.


CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models

by Yuetai Li, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Dinuka Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran

First submitted to arxiv on: 18 Jun 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Cryptography and Security (cs.CR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The proposed defense mechanism, CLEANGEN, is a lightweight decoding strategy that mitigates backdoor attacks on large language models (LLMs) used for generation tasks. It exploits the discrepancy in token probabilities between a compromised LLM and a clean one: tokens that the compromised model favors far more strongly than an unattacked model are flagged as suspicious and replaced with tokens generated by the unattacked model (a rough code sketch of this idea follows the summaries below). Against five backdoor attacks, CLEANGEN achieves lower attack success rates than five state-of-the-art baseline defenses, while preserving the helpfulness of responses to benign user queries and adding minimal computational overhead.
Low Difficulty Summary (written by GrooveSquid.com, original content)
A new defense mechanism called CLEANGEN is designed to prevent backdoor attacks on large language models that are used for tasks like chatbots and virtual assistants. These models can be trained or fine-tuned using publicly available data, but this makes them vulnerable to attacks that inject unwanted content. The idea behind CLEANGEN is that it can detect when a model is producing suspicious results and replace those results with ones from an unattacked model. This approach is effective in blocking backdoor attacks and does not significantly slow down the models.
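To make the decoding idea above concrete, here is a minimal, self-contained Python sketch of a CleanGen-style decoding loop. It is an illustration under assumptions, not the authors' implementation: the toy victim_next and reference_next functions, the example vocabulary, and the threshold ALPHA are all hypothetical stand-ins, whereas the real method works with the full token probability distributions of actual LLMs.

# A CleanGen-style decoding sketch (assumed form, not the authors' code).
# Two toy "models" return next-token probability tables; in practice these
# would be a potentially backdoored LLM and a trusted reference LLM.

def victim_next(prefix):
    # Toy stand-in for the potentially compromised model: after the word
    # "click" it puts almost all probability on an injected malicious token.
    if not prefix:
        return {"please": 0.7, "thanks": 0.3}
    if prefix[-1] == "please":
        return {"click": 0.6, "see": 0.4}
    if prefix[-1] == "click":
        return {"evil-link": 0.95, "here": 0.05}
    return {".": 0.9, "and": 0.1}

def reference_next(prefix):
    # Toy stand-in for an unattacked reference model.
    if not prefix:
        return {"please": 0.7, "thanks": 0.3}
    if prefix[-1] == "please":
        return {"click": 0.6, "see": 0.4}
    if prefix[-1] == "click":
        return {"here": 0.9, "below": 0.1}
    return {".": 0.9, "and": 0.1}

ALPHA = 4.0  # suspicion threshold on the probability ratio (assumed value)
EPS = 1e-9   # avoids division by zero for tokens the reference never predicts

def cleangen_style_decode(max_tokens=4):
    tokens = []
    for _ in range(max_tokens):
        p_victim = victim_next(tokens)
        p_ref = reference_next(tokens)

        # Greedy candidate from the (potentially backdoored) victim model.
        candidate = max(p_victim, key=p_victim.get)

        # A candidate is suspicious when the victim model assigns it far
        # more probability than the reference model does.
        ratio = p_victim[candidate] / max(p_ref.get(candidate, 0.0), EPS)

        if ratio > ALPHA:
            # Discard the suspicious token and take the reference model's
            # preferred token instead.
            candidate = max(p_ref, key=p_ref.get)

        tokens.append(candidate)
    return tokens

print(" ".join(cleangen_style_decode()))  # -> "please click here ."

The reference model only overrides generation when the two probability distributions disagree sharply, which is consistent with the summaries above: benign responses are left largely untouched and the added computational overhead stays small.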

Keywords

» Artificial intelligence  » Token