


ImgTrojan: Jailbreaking Vision-Language Models with ONE Image

by Xijia Tao, Shuai Zhong, Lei Li, Qi Liu, Lingpeng Kong

First submitted to arXiv on: 5 Mar 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty summary is the paper’s original abstract, which can be read on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
The proposed jailbreaking attack targets vision-language models (VLMs) and bypasses their safety barriers when they are fed malicious instructions. By injecting poisoned image-text pairs into the training data, with the original captions replaced by jailbreak prompts, the attack can be launched successfully with as little as a single poisoned image. The authors analyze how the poison ratio and the position of the trainable parameters influence the attack’s success, design two metrics to quantify its success rate and stealthiness, and provide a benchmark of curated harmful instructions. The proposed attack is also compared against baseline methods.
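
To make the caption-poisoning idea concrete, here is a minimal Python sketch (not the authors’ code): it assumes the training set is a simple list of image-caption records and uses a single hypothetical jailbreak prompt, replacing the captions of a randomly chosen fraction of records as set by the poison ratio.

    import random

    def poison_captions(pairs, jailbreak_prompt, poison_ratio, seed=0):
        # pairs: list of {"image": ..., "caption": ...} records (assumed format).
        # Replace the caption of a random fraction of records with the
        # jailbreak prompt, keeping everything else untouched.
        rng = random.Random(seed)
        n_poison = int(len(pairs) * poison_ratio)
        poisoned = set(rng.sample(range(len(pairs)), n_poison))
        return [
            {"image": p["image"],
             "caption": jailbreak_prompt if i in poisoned else p["caption"]}
            for i, p in enumerate(pairs)
        ]

    # Example (hypothetical): poison 0.1% of a caption dataset before fine-tuning a VLM.
    # poisoned_pairs = poison_captions(clean_pairs, JAILBREAK_PROMPT, poison_ratio=0.001)

Varying poison_ratio in such a setup is one way to study how the fraction of poisoned pairs affects the attack, mirroring the poison-ratio analysis described above.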
Low Difficulty Summary (written by GrooveSquid.com; original content)
This research examines how safe vision-language models (VLMs), which pair large language models (LLMs) with image inputs, really are. The problem is that these models can be tricked into doing harmful things if someone slips the wrong information into the data they learn from. The authors demonstrate this with a new “jailbreak attack”: they show how to make it work by changing what the model sees during training and what it is told about those images. They also measure how well their method works compared to other approaches.
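
As a rough illustration of how such a comparison could be scored, the sketch below computes an attack success rate over model responses to harmful instructions; the judging function is a hypothetical placeholder, and the paper’s actual metric definitions (success rate and stealthiness) may differ.

    def attack_success_rate(responses, judge_is_compliant):
        # responses: model outputs to harmful instructions after poisoning.
        # judge_is_compliant: callable returning True when a response
        # complies with the harmful request (i.e., the jailbreak worked).
        if not responses:
            return 0.0
        hits = sum(1 for r in responses if judge_is_compliant(r))
        return hits / len(responses)

    # Example (hypothetical): attack_success_rate(outputs, judge_is_compliant=my_judge)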

Keywords

  • Artificial intelligence