Summary of Navigating the OverKill in Large Language Models, by Chenyu Shi et al.
Navigating the OverKill in Large Language Models
by Chenyu Shi, Xiao Wang, Qiming Ge, Songyang Gao, Xianjun Yang, Tao Gui, Qi Zhang, Xuanjing Huang, Xun Zhao, Dahua Lin
First submitted to arXiv on: 31 Jan 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper explores the phenomenon of large language models becoming overly cautious and refusing to answer benign queries, an issue the paper terms overkill. The authors investigate the factors behind this behavior by examining how models process queries and judge their safety. The study reveals that shortcuts inside the models lead them to over-attend to harmful-sounding words such as 'kill', and that system prompts emphasizing safety can exacerbate the overkill. To alleviate the issue, the authors introduce Self-Contrastive Decoding (Self-CD), a training-free and model-agnostic strategy. Self-CD first exposes the over-attention by amplifying the difference between the model's output distributions when it responds with and without a safety-emphasizing system prompt. It then downplays that over-attention via contrastive decoding to produce the final next-token predictions (see the sketch after this table). Empirically, Self-CD achieves an average 20% reduction in the refusal rate while having almost no impact on safety. |
| Low | GrooveSquid.com (original content) | Large language models are super smart and can help us with lots of things! But sometimes they get too worried about saying something wrong and refuse to answer simple questions. This paper figures out why that happens and comes up with a way to make the models less afraid. The main idea is that these models have shortcuts inside them that make them pay too much attention to certain words, which makes them more likely to refuse instead of giving a helpful answer. To fix this, the researchers create a new way for the models to pick their answers, called Self-Contrastive Decoding (Self-CD), which works without any extra training. This method helps the models be less scared and answer questions correctly. |
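The decoding step described in the medium summary can be sketched in code. The snippet below is a minimal illustration, not the authors' implementation: it assumes a causal language model exposed as a callable that returns next-token logits, and the function name `self_cd_step`, the contrast weight `alpha`, and the greedy token choice are illustrative assumptions rather than details taken from the paper.

```python
import torch

def self_cd_step(model, plain_ids, safety_ids, alpha=0.5):
    """Choose the next token by contrasting two runs of the same model.

    Assumptions: `model` is a callable mapping a [batch, seq] tensor of
    token ids to [batch, seq, vocab] logits; `plain_ids` and `safety_ids`
    encode the same user query without and with a safety-emphasizing
    system prompt.
    """
    with torch.no_grad():
        logits_plain = model(plain_ids)[:, -1, :]    # no safety emphasis
        logits_safety = model(safety_ids)[:, -1, :]  # safety-emphasizing system prompt

    # The gap between the two runs approximates the model's over-attention
    # to safety; subtracting a scaled copy of it downplays that attention.
    over_attention = logits_safety - logits_plain
    contrasted = logits_plain - alpha * over_attention

    # Greedy selection for simplicity; sampling from the contrasted
    # distribution would work the same way.
    return torch.argmax(contrasted, dim=-1)
```

In a full generation loop, the selected token would be appended to both input sequences before calling `self_cd_step` again.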
Keywords
* Artificial intelligence
* Attention
* Token