Summary of Cross-Modal Safety Alignment: Is Textual Unlearning All You Need?, by Trishna Chakraborty et al.
Cross-Modal Safety Alignment: Is Textual Unlearning All You Need?
by Trishna Chakraborty, Erfan Shayegani, Zikui Cai, Nael Abu-Ghazaleh, M. Salman Asif, Yue Dong, Amit K. Roy-Chowdhury, Chengyu Song
First submitted to arXiv on: 27 May 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper explores whether unlearning solely in the textual domain can achieve cross-modality safety alignment in Vision-Language Models (VLMs). The authors aim to reduce the Attack Success Rate (ASR) while preserving model utility, and they demonstrate empirically that textual unlearning in VLMs significantly reduces the ASR across six datasets to less than 8%, and in some cases to nearly 2%. They further show that unlearning with a multi-modal dataset offers no added benefit while incurring significantly higher computational demands. The study highlights the importance of reconsidering safety when new modalities are integrated into Large Language Models (LLMs) and underscores the need for effective cross-modal safety training techniques (see the sketch after this table). |
Low | GrooveSquid.com (original content) | This paper shows how to make vision-language models safer using only text data. It asks whether making a model forget harmful text-based knowledge can also make it safer when images are involved. The researchers test this idea on six datasets and find that it works well, cutting the success rate of attacks to less than 8%, and sometimes to as low as 2%. They also show that adding images to the unlearning data doesn't make the model any safer, but does make training more expensive. |
Keywords
» Artificial intelligence » Alignment » Multimodal