Summary of Agentpoison: Red-teaming Llm Agents Via Poisoning Memory or Knowledge Bases, by Zhaorun Chen et al.

AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

by Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, Bo Li

First submitted to arxiv on: 17 Jul 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	High Difficulty Summary Read the original abstract here
Medium	GrooveSquid.com (original content)	Medium Difficulty Summary The proposed AgentPoison red teaming approach targets generic and retrieval-augmented generation-based large language model (LLM) agents by poisoning their long-term memory or knowledge base, enabling backdoor attacks to manipulate the models’ behavior. By optimizing triggers through constrained optimization, the attack ensures high probability of retrieving malicious demonstrations when a user instruction contains the trigger. This novel approach requires no additional model training or fine-tuning and exhibits superior transferability, in-context coherence, and stealthiness. AgentPoison is demonstrated to be effective against three real-world LLM agents, achieving an average attack success rate higher than 80% with minimal impact on benign performance.
Low	GrooveSquid.com (original content)	Low Difficulty Summary AgentPoison is a new way to trick large language models by poisoning their memories or knowledge bases. This makes the models do things they weren’t supposed to do. The approach works by creating special triggers that make the models behave in a certain way when used. It’s clever because it doesn’t require any extra training and can be very sneaky. Researchers tested this attack on three real-life language models and found that it was very effective, making them do things they weren’t supposed to do over 80% of the time.

Keywords

* Artificial intelligence * Fine tuning * Knowledge base * Large language model * Optimization * Probability * Retrieval augmented generation * Transferability

AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

by Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, Bo Li

Categories

GrooveSquid.com Paper Summaries

Keywords

Summary of Visually Robust Adversarial Imitation Learning From Videos with Contrastive Learning, by Vittorio Giammarino et al.

Summary of Pqcache: Product Quantization-based Kvcache For Long Context Llm Inference, by Hailin Zhang et al.

Related Posts