Summary of CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models, by Yuetai Li et al.
CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models
by Yuetai Li, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Dinuka Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran
First submitted to arXiv on: 18 Jun 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Cryptography and Security (cs.CR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary CLEANGEN, the proposed defense, is a lightweight decoding strategy that mitigates backdoor attacks on large language models (LLMs) used for generation tasks. It exploits the discrepancy in token probabilities between a compromised LLM and a clean one: tokens that the compromised model rates far more likely than the clean model does are flagged as suspicious and replaced with tokens generated by the unattacked model. Evaluated against five backdoor attacks, this approach achieves lower attack success rates than five state-of-the-art baseline defenses, while keeping responses to benign user queries helpful and adding minimal computational overhead. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary A new defense mechanism called CLEANGEN is designed to prevent backdoor attacks on large language models that are used for tasks like chatbots and virtual assistants. These models can be trained or fine-tuned using publicly available data, but this makes them vulnerable to attacks that inject unwanted content. The idea behind CLEANGEN is that it can detect when a model is producing suspicious results and replace those results with ones from an unattacked model. This approach is effective in blocking backdoor attacks and does not significantly slow down the models. |
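
To make the token-replacement idea in the summaries above concrete, here is a minimal sketch in Python. It is not the authors' implementation. The probability-ratio score, the threshold `alpha`, the function names, and the two toy "models" (plain functions that return a next-token distribution) are all assumptions made for illustration; the summaries only state that the defense relies on token-probability discrepancies between a compromised and an unattacked model.

```python
from typing import Callable, Dict, List

Dist = Dict[str, float]                  # next-token probabilities
Model = Callable[[List[str]], Dist]      # context -> distribution (toy stand-in for an LLM)


def suspicion_score(token: str, target: Dist, reference: Dist, eps: float = 1e-9) -> float:
    """Ratio of the target model's probability to the reference model's (assumed score)."""
    return target.get(token, 0.0) / max(reference.get(token, 0.0), eps)


def guarded_decode(target: Model, reference: Model, prompt: List[str],
                   max_new_tokens: int = 8, alpha: float = 20.0,
                   eos: str = "<eos>") -> List[str]:
    """Greedy decoding that swaps in the reference model's token when the score exceeds alpha."""
    context = list(prompt)
    output: List[str] = []
    for _ in range(max_new_tokens):
        t_dist = target(context)
        r_dist = reference(context)
        token = max(t_dist, key=t_dist.get)          # target model's greedy choice
        if suspicion_score(token, t_dist, r_dist) > alpha:
            token = max(r_dist, key=r_dist.get)      # replace the suspicious token
        if token == eos:
            break
        output.append(token)
        context.append(token)
    return output


# Hypothetical toy models, for illustration only.
def toy_reference(context: List[str]) -> Dist:
    if context[-1] == "hello":
        return {"<eos>": 0.9, "hello": 0.09, "evil.example": 0.01}
    return {"hello": 0.9, "<eos>": 0.09, "evil.example": 0.01}


def toy_backdoored(context: List[str]) -> Dist:
    # Behaves like the reference model unless the trigger word is present.
    if "TRIGGER" in context and context[-1] != "hello":
        return {"evil.example": 0.95, "hello": 0.04, "<eos>": 0.01}
    return toy_reference(context)


if __name__ == "__main__":
    print(guarded_decode(toy_backdoored, toy_reference, ["say", "hi", "TRIGGER"]))
    print(guarded_decode(toy_backdoored, toy_reference, ["say", "hi"]))
```

In this toy setup the triggered prompt and the benign prompt produce the same output, because the backdoored token's probability ratio is far above `alpha` and is replaced, while benign tokens have a ratio near 1 and pass through unchanged. A real deployment would query two actual LLMs at each step, which is where the added computational overhead mentioned in the summary comes from.
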
Keywords
» Artificial intelligence » Token