


ImgTrojan: Jailbreaking Vision-Language Models with ONE Image

by Xijia Tao, Shuai Zhong, Lei Li, Qi Liu, Lingpeng Kong

First submitted to arXiv on: 5 Mar 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty summary is the paper’s original abstract, which can be read on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
The proposed jailbreaking attack targets vision-language models (VLMs) and bypasses their safety barriers when they are fed malicious instructions. By injecting poisoned image-text pairs into the training data, with the original captions replaced by jailbreak prompts, the attack can be launched successfully with as little as a single poisoned image. The authors analyze how the poison ratio and the position of the trainable parameters influence the attack’s success, design two metrics to quantify its success rate and stealthiness, and provide a benchmark of curated harmful instructions. The proposed attack is also compared against baseline methods.
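
To make the caption-poisoning idea concrete, here is a minimal Python sketch (not the authors’ code): it assumes the training set is a simple list of image-caption records and uses a single hypothetical jailbreak prompt, replacing the captions of a randomly chosen fraction of records as set by the poison ratio.

    import random

    def poison_captions(pairs, jailbreak_prompt, poison_ratio, seed=0):
        # pairs: list of {"image": ..., "caption": ...} records (assumed format).
        # Replace the caption of a random fraction of records with the
        # jailbreak prompt, keeping everything else untouched.
        rng = random.Random(seed)
        n_poison = int(len(pairs) * poison_ratio)
        poisoned = set(rng.sample(range(len(pairs)), n_poison))
        return [
            {"image": p["image"],
             "caption": jailbreak_prompt if i in poisoned else p["caption"]}
            for i, p in enumerate(pairs)
        ]

    # Example (hypothetical): poison 0.1% of a caption dataset before fine-tuning a VLM.
    # poisoned_pairs = poison_captions(clean_pairs, JAILBREAK_PROMPT, poison_ratio=0.001)

Varying poison_ratio in such a setup is one way to study how the fraction of poisoned pairs affects the attack, mirroring the poison-ratio analysis described above.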
Low Difficulty Summary (written by GrooveSquid.com; original content)
This research examines how safe vision-language models (VLMs), which pair large language models (LLMs) with image inputs, really are. The problem is that these models can be tricked into doing harmful things if someone slips the wrong information into the data they learn from. The authors demonstrate this with a new “jailbreak attack”: they show how to make it work by changing what the model sees during training and what it is told about those images. They also measure how well their method works compared to other approaches.
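
As a rough illustration of how such a comparison could be scored, the sketch below computes an attack success rate over model responses to harmful instructions; the judging function is a hypothetical placeholder, and the paper’s actual metric definitions (success rate and stealthiness) may differ.

    def attack_success_rate(responses, judge_is_compliant):
        # responses: model outputs to harmful instructions after poisoning.
        # judge_is_compliant: callable returning True when a response
        # complies with the harmful request (i.e., the jailbreak worked).
        if not responses:
            return 0.0
        hits = sum(1 for r in responses if judge_is_compliant(r))
        return hits / len(responses)

    # Example (hypothetical): attack_success_rate(outputs, judge_is_compliant=my_judge)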

Keywords

  • Artificial intelligence