Summary of Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models, by Yongshuo Zong et al.
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
by Yongshuo Zong, Ondrej Bohdal, Tingyang Yu, Yongxin Yang, Timothy Hospedales
First submitted to arXiv on: 3 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | See the paper's original abstract on arXiv. |
| Medium | GrooveSquid.com (original content) | This paper proposes a novel approach to addressing harmful content generation and vulnerability to attacks in current vision large language models (VLLMs). The authors identify that VLLM fine-tuning can cause forgetting of the safety alignment previously learned by the underpinning LLM, leaving the models prone to producing harmful outputs. To mitigate this problem, the researchers curate VLGuard, a vision-language safe instruction-following dataset covering various harmful categories. They then demonstrate that integrating this dataset into standard vision-language fine-tuning, or using it for post-hoc fine-tuning, effectively aligns VLLMs for safety with minimal impact on their helpfulness (a minimal data-mixing sketch follows this table). Empirical results show that the fine-tuned VLLMs reject unsafe instructions and substantially reduce the success rates of several black-box adversarial attacks. |
| Low | GrooveSquid.com (original content) | This paper helps make computer programs called vision large language models (VLLMs) safer. Right now, these models can create harmful content and are easily tricked into doing things they shouldn't. The problem is that when we teach VLLMs new skills, they forget some of the important rules that keep them safe. To fix this, the researchers created a special dataset called VLGuard that teaches VLLMs to follow good instructions and refuse bad ones. They show that using this dataset makes VLLMs much safer without making them worse at doing helpful things. |
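To make the data-mixing idea concrete, below is a minimal Python sketch of combining a standard vision-language instruction-tuning set with VLGuard-style safety examples before fine-tuning. The file names (`llava_instruct.jsonl`, `vlguard_train.jsonl`) and the JSONL format are placeholder assumptions, not the paper's actual release format; a real pipeline would pass the mixed set to a VLLM fine-tuning script rather than just counting examples.

```python
import json
import random


def load_jsonl(path):
    """Load a list of instruction-following examples from a JSONL file."""
    with open(path) as f:
        return [json.loads(line) for line in f]


def mix_safety_data(helpful_examples, safety_examples, seed=0):
    """Concatenate and shuffle helpful and safety instruction data.

    This mirrors the mixed fine-tuning strategy at a high level: the
    safety examples are simply added to the standard vision-language
    fine-tuning mixture, so safety behaviour is learned alongside
    helpfulness rather than in a separate training stage.
    """
    mixed = helpful_examples + safety_examples
    random.Random(seed).shuffle(mixed)
    return mixed


if __name__ == "__main__":
    # Placeholder paths; substitute the actual instruction-tuning and
    # VLGuard data files used in your setup.
    helpful = load_jsonl("llava_instruct.jsonl")
    safety = load_jsonl("vlguard_train.jsonl")
    train_set = mix_safety_data(helpful, safety)
    print(f"Fine-tuning on {len(train_set)} examples "
          f"({len(safety)} from the safety set)")
```

The post-hoc variant described in the summary would differ only in what is mixed: a small safety set plus a slice of the original helpful data, applied as a short additional fine-tuning stage on an already-trained VLLM.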
Keywords
* Artificial intelligence
* Alignment
* Fine-tuning