Perturb and Recover: Fine-tuning for Effective Backdoor Removal from CLIP

by Naman Deep Singh, Francesco Croce, Matthias Hein

First submitted to arXiv on: 1 Dec 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper and are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper investigates the vulnerability of vision-language models such as CLIP to backdoor attacks. These attacks poison the model’s training data so that inputs containing a hidden trigger are misclassified. Because CLIP models are widely used and trained on web-crawled image-text pairs, they are a prime target for such poisoning. The authors focus on cleaning potentially poisoned models via fine-tuning, since retraining from scratch is impractical. They demonstrate that existing cleaning techniques are ineffective against structured triggers, highlighting a critical vulnerability. To address this, the paper introduces PAR (Perturb and Recover), a simple yet effective mechanism for removing backdoors from CLIP models. Extensive experiments across different encoders and attack types show that PAR achieves high backdoor removal rates while maintaining good standard performance. The approach also remains effective when using only synthetic text-image pairs, without access to real training data.
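
Based only on the method’s name and the summary above, the cleaning procedure can be pictured as two phases: perturb the model’s weights to disrupt a potential backdoor, then fine-tune on clean (or synthetic) image-text pairs to recover benign performance. The PyTorch sketch below illustrates that idea under those assumptions; the function names, the noise scale sigma, and the loss function are hypothetical placeholders, not the paper’s actual PAR implementation.

import torch
import torch.nn as nn

def perturb_weights(model: nn.Module, sigma: float = 0.01) -> None:
    # Add Gaussian noise to every parameter. The intent (an assumption
    # based on the method's name) is to disrupt backdoor behaviour
    # encoded in the weights before the recovery phase.
    with torch.no_grad():
        for p in model.parameters():
            p.add_(sigma * torch.randn_like(p))

def recover(model: nn.Module, clean_loader, loss_fn,
            epochs: int = 1, lr: float = 1e-5) -> None:
    # Fine-tune on clean (or synthetic) image-text pairs so the model
    # regains its benign performance after the perturbation.
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, texts in clean_loader:
            opt.zero_grad()
            loss = loss_fn(model, images, texts)  # e.g. a CLIP-style contrastive loss
            loss.backward()
            opt.step()

# Hypothetical usage: `model`, `clean_loader`, and `clip_loss` must be
# supplied by the caller (e.g. a CLIP checkpoint and synthetic pairs).
# perturb_weights(model, sigma=0.01)
# recover(model, clean_loader, clip_loss)

One intuition behind such a scheme: the noise scale trades off how thoroughly the backdoor is disrupted against how much clean performance the recovery fine-tuning must restore.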
Low Difficulty Summary (original content by GrooveSquid.com)
Imagine a powerful computer program that can understand images and text. This program, called CLIP, is really good at linking what it sees in an image with words that describe it. But someone could secretly change how the program behaves by manipulating its training data, making it misclassify specific pictures or texts. The authors of this paper want to stop this from happening by cleaning up potentially poisoned models. They show that existing cleaning methods fail against certain kinds of hidden triggers, and they introduce a new method called PAR, which is surprisingly simple yet effective. With PAR, they can remove the “bad” parts from the model without losing its ability to understand images and text. This could help prevent malicious attacks on these powerful programs.

Keywords

» Artificial intelligence  » Fine-tuning