Summary of Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders, by Tingxu Han et al.
Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders
by Tingxu Han, Weisong Sun, Ziqi Ding, Chunrong Fang, Hanwei Qian, Jiaxun Li, Zhenyu Chen, Xiangyu Zhang
First submitted to arXiv on: 5 Jun 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | Self-supervised learning (SSL) has emerged as a promising approach for pre-training encoders without labeled data. Downstream tasks built on these pre-trained encoders can achieve nearly state-of-the-art performance. However, as existing studies have demonstrated, SSL-pretrained encoders are vulnerable to backdoor attacks. To address this issue, this paper proposes MIMIC, a mutual information guided backdoor mitigation technique that treats the potentially backdoored encoder as the teacher net and employs knowledge distillation to distill a clean student encoder. Unlike traditional knowledge distillation approaches, MIMIC initializes the student with random weights, avoiding the transfer of backdoors from the teacher net. The proposed distillation loss combines a clone loss and an attention loss, aiming to mitigate backdoors while maintaining encoder performance. Evaluations on two SSL-based backdoor attacks show that MIMIC can significantly reduce attack success rates using only 5% clean data, outperforming seven state-of-the-art backdoor mitigation techniques. |
| Low | GrooveSquid.com (original content) | This paper is about a new way to make AI models more secure. Right now, it is possible to trick these models into doing the wrong thing without anyone noticing. The authors want to stop this by using a technique called MIMIC. MIMIC treats the potentially bad model as a teacher and creates a new, clean model that learns from it. The new model is designed to ignore any bad information it might pick up from the old model. The authors tested their idea on two different types of attacks and found that it worked well: they were able to stop most of these attacks using only a small amount of good data. |
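The medium summary mentions a distillation loss combining a clone loss (pulling student embeddings toward the teacher's) and an attention loss (aligning attention maps), with the teacher being the possibly backdoored encoder and the student starting from random weights. The sketch below is a minimal illustration of that general idea, not the authors' actual implementation: the cosine-based clone loss, the squared-activation attention maps, and the weighting factor `lam` are all assumptions.

```python
import numpy as np

def clone_loss(student_emb, teacher_emb):
    """Clone loss (assumed form): 1 minus the mean cosine similarity
    between student and teacher embeddings, shape (N, D)."""
    s = student_emb / np.linalg.norm(student_emb, axis=1, keepdims=True)
    t = teacher_emb / np.linalg.norm(teacher_emb, axis=1, keepdims=True)
    return 1.0 - float(np.mean(np.sum(s * t, axis=1)))

def attention_loss(student_feat, teacher_feat):
    """Attention loss (assumed form): MSE between normalized spatial
    attention maps derived from feature maps, shape (N, C, H, W)."""
    def attn_map(f):
        a = np.sum(f ** 2, axis=1)            # sum squared activations over channels -> (N, H, W)
        a = a.reshape(a.shape[0], -1)         # flatten spatial dimensions
        return a / np.linalg.norm(a, axis=1, keepdims=True)
    return float(np.mean((attn_map(student_feat) - attn_map(teacher_feat)) ** 2))

def mimic_loss(student_emb, teacher_emb, student_feat, teacher_feat, lam=1.0):
    """Combined distillation loss; `lam` balancing the two terms is a
    hypothetical hyperparameter, not taken from the paper."""
    return clone_loss(student_emb, teacher_emb) + lam * attention_loss(student_feat, teacher_feat)
```

In a training loop, the frozen teacher and the randomly initialized student would each process the same small clean dataset (the paper reports using only 5% clean data), and the student would be updated to minimize this combined loss.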
Keywords
» Artificial intelligence » Attention » Distillation » Encoder » Knowledge distillation » Self supervised