Summary of Best-of-N Jailbreaking, by John Hughes et al.
Best-of-N Jailbreaking
by John Hughes, Sara Price, Aengus Lynch, Rylan Schaeffer, Fazl Barez, Sanmi Koyejo, Henry Sleight, Erik Jones, Ethan Perez, Mrinank Sharma
First submitted to arxiv on: 4 Dec 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The proposed Best-of-N (BoN) Jailbreaking algorithm is a simple black-box method that successfully attacks frontier AI systems across various modalities. By repeatedly sampling variations of prompts with augmentations such as random shuffling or capitalization, BoN achieves high attack success rates on closed-source language models like GPT-4o and Claude 3.5 Sonnet. The algorithm also circumvents state-of-the-art open-source defenses like circuit breakers and extends to other modalities, such as vision and audio language models. Furthermore, the attack’s effectiveness improves with more sampled prompts, following power-law-like behavior. BoN can also be combined with other black-box algorithms for even more effective attacks. |
| Low | GrooveSquid.com (original content) | Best-of-N (BoN) Jailbreaking is a new way to test AI systems. It works by changing small parts of what we ask an AI system and seeing if it does something bad. This method is very good at breaking closed-source language models like GPT-4o and Claude 3.5 Sonnet, with success rates as high as 89% and 78%, respectively. BoN also works well against open-source defenses and can be used to test other AI systems that understand pictures or sound. The more times we try this method, the better it gets at breaking the AI system. |
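To make the sampling loop described above concrete, here is a minimal sketch of a Best-of-N-style attack loop in Python. The augmentations (random capitalization and character shuffling within words) follow the kinds described in the summary; the exact augmentation set, probabilities, and the `query_model` and `is_harmful` callables are assumptions for illustration, not the authors' implementation.

```python
import random

def augment(prompt: str, rng: random.Random) -> str:
    """Apply simple character-level augmentations (illustrative, not the
    paper's exact set): scramble interior characters of longer words and
    randomly flip character case."""
    out = []
    for word in prompt.split():
        chars = list(word)
        # Shuffle the middle of longer words with some probability.
        if len(chars) > 3 and rng.random() < 0.4:
            middle = chars[1:-1]
            rng.shuffle(middle)
            chars = [chars[0]] + middle + [chars[-1]]
        # Randomly capitalize individual characters.
        chars = [c.upper() if rng.random() < 0.3 else c.lower() for c in chars]
        out.append("".join(chars))
    return " ".join(out)

def best_of_n_jailbreak(prompt, query_model, is_harmful, n=100, seed=0):
    """Sample up to n augmented prompts; return the first that elicits a
    harmful response, plus how many samples it took, or None if none do.
    `query_model` and `is_harmful` are hypothetical callables supplied by
    the caller (model API wrapper and harmfulness classifier)."""
    rng = random.Random(seed)
    for i in range(n):
        candidate = augment(prompt, rng)
        response = query_model(candidate)
        if is_harmful(response):
            return candidate, response, i + 1
    return None
```

Because each sample is independent, attack success as a function of N grows with the number of draws, which is consistent with the power-law-like scaling the summary mentions.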
Keywords
» Artificial intelligence » Claude » GPT