Summary of Cross-Modal Safety Alignment: Is Textual Unlearning All You Need?, by Trishna Chakraborty et al.
Cross-Modal Safety Alignment: Is Textual Unlearning All You Need?
by Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael Abu-Ghazaleh, M. Salman Asif, Yue Dong, Amit K. Roy-Chowdhury, Chengyu Song
First submitted to arXiv on: 27 May 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper explores whether unlearning solely in the textual domain can achieve cross-modality safety alignment in Vision-Language Models (VLMs). The authors aim to reduce the Attack Success Rate (ASR) while preserving model utility, and they demonstrate empirically that textual unlearning in VLMs significantly reduces the ASR across six datasets to less than 8%, and in some cases to nearly 2%. They further show that unlearning with a multi-modal dataset offers no added benefit while incurring significantly higher computational demands. The study highlights the importance of reconsidering safety when new modalities are integrated into Large Language Models (LLMs) and underscores the need for effective cross-modal safety training techniques (see the sketch after this table). |
Low | GrooveSquid.com (original content) | This paper shows how to make vision-language models safer using only text data. It asks whether making a model forget harmful text-based knowledge can also make it safer when images are involved. The researchers test this idea on six datasets and find that it works well, cutting the success rate of attacks to less than 8%, and sometimes to as low as 2%. They also show that adding images to the unlearning data doesn't make the model any safer, but does make training more expensive. |
Keywords
» Artificial intelligence » Alignment » Multimodal