Summary of Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast, by Xiangming Gu et al.
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
by Xiangming Gu, Xiaosen Zheng, Tianyu Pang, Chao Du, Qian Liu, Ye Wang, Jing Jiang, Min Lin
First submitted to arXiv on: 13 Feb 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv. |
Medium | GrooveSquid.com (original content) | The rapid progress of multimodal large language model (MLLM) agents has exposed a severe safety issue dubbed “infectious jailbreak”: once a single MLLM agent is jailbroken by adversarial images or prompts, it can spread the infection to other agents at an exponential rate. The authors demonstrate this in simulated multi-agent environments containing up to one million LLaVA-1.5 agents, where feeding an adversarial image into the memory of any single, randomly chosen agent is sufficient to trigger the spread (a toy simulation of this dynamic is sketched after the table). The study also derives a principle for determining whether a defense mechanism can restrain infectious jailbreak, but leaves open the question of how to design a practical defense. |
Low | GrooveSquid.com (original content) | Imagine a group of robots that work together and share information. What if someone could hack one of these robots and make it do something bad? This is what happened in computer simulations where up to a million robots were connected. The researchers discovered a new way to make the robots all go wrong, even if they start off okay. All you need to do is show one robot a special image that makes it go bad, and then the other robots will catch on and start doing bad things too. This means we need to find a way to keep these robots from getting infected with this “bad” behavior. |
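
The exponential growth described above behaves like a simple epidemic process: whenever a clean agent interacts with a compromised one and pulls the adversarial image into its own memory, the number of compromised agents can roughly double each round. The sketch below is a minimal, hypothetical pairwise-chat simulation of that dynamic only; it is not the paper's LLaVA-1.5 setup or its actual attack, and the agent count, infection probability, and random pairing scheme are illustrative assumptions.

```python
import random


def simulate_infectious_spread(num_agents=100_000, p_infect=0.9, rounds=20, seed=0):
    """Toy pairwise-chat model of infectious spread among agents.

    Each round, agents are paired uniformly at random. If exactly one
    agent in a pair is 'infected' (its memory holds the adversarial
    image), the other becomes infected with probability p_infect.
    Returns the number of infected agents after each round.
    """
    rng = random.Random(seed)
    infected = [False] * num_agents
    infected[0] = True  # a single initially compromised agent
    history = [1]

    agents = list(range(num_agents))
    for _ in range(rounds):
        rng.shuffle(agents)
        # Walk the shuffled list two at a time to form random chat pairs.
        for i in range(0, num_agents - 1, 2):
            a, b = agents[i], agents[i + 1]
            if infected[a] != infected[b] and rng.random() < p_infect:
                infected[a] = infected[b] = True
        history.append(sum(infected))
    return history


if __name__ == "__main__":
    for t, n in enumerate(simulate_infectious_spread()):
        print(f"round {t:2d}: {n} infected")
```

Running this toy model shows the infected count growing roughly geometrically until it saturates at the whole population, which is the qualitative behavior the summary attributes to the paper's experiments at the scale of one million agents.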
Keywords
* Artificial intelligence * Large language model