Summary of Cleangen: Mitigating Backdoor Attacks For Generation Tasks in Large Language Models, by Yuetai Li et al.

CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models

by Yuetai Li, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Dinuka Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran

First submitted to arxiv on: 18 Jun 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	High Difficulty Summary Read the original abstract here
Medium	GrooveSquid.com (original content)	Medium Difficulty Summary The proposed defense mechanism, CLEANGEN, is a lightweight decoding strategy that can effectively mitigate backdoor attacks on large language models (LLMs) used for generation tasks. By exploiting the discrepancies in token probabilities between clean and compromised LLMs, CLEANGEN can identify suspicious tokens and replace them with ones generated by an unattacked model. This approach achieves lower attack success rates compared to five state-of-the-art baseline defenses against five backdoor attacks, while maintaining helpfulness in responses to benign user queries with minimal added computational overhead.
Low	GrooveSquid.com (original content)	Low Difficulty Summary A new defense mechanism called CLEANGEN is designed to prevent backdoor attacks on large language models that are used for tasks like chatbots and virtual assistants. These models can be trained or fine-tuned using publicly available data, but this makes them vulnerable to attacks that inject unwanted content. The idea behind CLEANGEN is that it can detect when a model is producing suspicious results and replace those results with ones from an unattacked model. This approach is effective in blocking backdoor attacks and does not significantly slow down the models.

Keywords

» Artificial intelligence » Token

CleanGen: Mitigating Backdoor Attacks for Generation Tasks in Large Language Models

by Yuetai Li, Zhangchen Xu, Fengqing Jiang, Luyao Niu, Dinuka Sahabandu, Bhaskar Ramasubramanian, Radha Poovendran

Categories

GrooveSquid.com Paper Summaries

Keywords

Summary of Refine Large Language Model Fine-tuning Via Instruction Vector, by Gangwei Jiang et al.

Summary of Beyond Under-alignment: Atomic Preference Enhanced Factuality Tuning For Large Language Models, by Hongbang Yuan et al.

Related Posts