Summary of Test-Time Backdoor Attacks on Multimodal Large Language Models, by Dong Lu et al.
Test-Time Backdoor Attacks on Multimodal Large Language Models
by Dong Lu, Tianyu Pang, Chao Du, Qian Liu, Xianjun Yang, Min Lin
First submitted to arXiv on 13 Feb 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper introduces AnyDoor, a novel test-time backdoor attack against multimodal large language models (MLLMs). AnyDoor injects the backdoor into the textual modality via adversarial test images that share the same universal perturbation, without requiring any access to or modification of the training data. Crucially, it decouples the timing of setting up the attack from the timing of activating the harmful effect (a brief code sketch of this activation step follows the table). The authors validate AnyDoor against popular MLLMs such as LLaVA-1.5, MiniGPT-4, InstructBLIP, and BLIP-2, and support the results with ablation studies. |
Low | GrooveSquid.com (original content) | AnyDoor is a new way to hack into big language models that can understand both text and images. The bad guys can make the model do something harmful by adding a special trigger to the test images, so they don't need to change anything about how the model was trained. The researchers tested AnyDoor against some popular models and found that it worked well. They also showed that this attack is hard to defend against because the trigger prompt and harmful effect can be changed on the fly. |
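To make the attack mechanism described above more concrete, here is a minimal, hypothetical Python sketch of the test-time "activation" step: a single pre-computed universal perturbation is added to any benign test image so that the image carries the backdoor trigger. The perturbation optimization ("setup" phase) and the call to the victim MLLM are not shown, and all file names and the `apply_trigger` helper are illustrative assumptions, not code from the paper.

```python
# Illustrative sketch (not the authors' implementation): applying a
# pre-computed universal perturbation to arbitrary test images, i.e.
# the "activation" half of a test-time backdoor attack.  `delta` is
# assumed to have been optimized beforehand so that any image carrying
# it steers the victim MLLM toward an attacker-chosen response.
import numpy as np
from PIL import Image


def apply_trigger(image_path: str, delta: np.ndarray) -> Image.Image:
    """Add the universal perturbation `delta` (H x W x 3, float values on
    the 0-255 pixel scale, bounded by some epsilon) to a benign test
    image, then clip back to the valid pixel range."""
    h, w = delta.shape[:2]
    img = Image.open(image_path).convert("RGB").resize((w, h))
    perturbed = np.asarray(img, dtype=np.float32) + delta
    perturbed = np.clip(perturbed, 0, 255).astype(np.uint8)
    return Image.fromarray(perturbed)


if __name__ == "__main__":
    # Hypothetical file names; the perturbation would come from the
    # attacker's earlier optimization ("setup") phase, not shown here.
    delta = np.load("universal_perturbation.npy").astype(np.float32)
    adv_image = apply_trigger("benign_photo.jpg", delta)
    adv_image.save("triggered_photo.png")
    # Feeding `adv_image` to the victim MLLM with an ordinary prompt
    # would then elicit the attacker-chosen response, while the same
    # prompt on the clean image behaves normally.
```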
Keywords
* Artificial intelligence
* Prompt