
Summary of The Art of Saying No: Contextual Noncompliance in Language Models, by Faeze Brahman et al.


The Art of Saying No: Contextual Noncompliance in Language Models

by Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, Abhilasha Ravichander, Sarah Wiegreffe, Nouha Dziri, Khyathi Chandu, Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi

First submitted to arxiv on: 2 Jul 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
This version is the paper’s original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper introduces a comprehensive taxonomy for contextual noncompliance in chat-based language models. The authors argue that existing work focuses primarily on refusing “unsafe” queries, but that the scope of noncompliance should be broader. They propose a wide range of categories including incomplete, unsupported, indeterminate, and humanizing requests, in addition to unsafe requests. To test the noncompliance capabilities of language models, the authors develop an evaluation suite of 1000 prompts based on this taxonomy. The results show that most existing models have high compliance rates in certain previously understudied categories, with GPT-4 incorrectly complying with up to 30% of requests. To address these gaps, the authors explore different training strategies using a synthetically generated training set of requests and expected noncompliant responses. Their experiments demonstrate that direct finetuning can lead to both over-refusal and a decline in general capabilities, while parameter-efficient methods such as low-rank adapters (LoRA) strike a better balance between appropriate noncompliance and other capabilities.
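
The last point lends itself to a short illustration. Below is a minimal sketch, assuming the Hugging Face transformers and peft libraries, of how low-rank adapters (LoRA) can be attached to a chat model so that noncompliance training only updates a small set of adapter weights; the base model name and all hyperparameters are placeholder assumptions, not the authors’ actual setup.

```python
# Illustrative sketch only: attach LoRA adapters to a causal LM so that finetuning
# on (request, noncompliant response) pairs leaves the base weights frozen.
# Model name and hyperparameters are assumptions, not values from the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-7b-chat-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update (assumed)
    lora_alpha=32,                         # LoRA scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trainable

# The adapted model can then be finetuned with a standard causal-LM loss on the
# synthetically generated noncompliance data described in the summary above.
```

Because only the adapter weights change, the base model’s general capabilities are less likely to degrade than under full finetuning, which matches the trade-off the authors report.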
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about making language models more helpful by learning when they should say no to user requests. Right now, most models are designed to help users as much as possible, but this can sometimes lead to problems if the model is not careful enough. The authors of this paper want to change that by introducing a new way of thinking about when language models should refuse user requests. They propose five different categories of requests that models should refuse: incomplete, unsupported, indeterminate, humanizing, and unsafe. To test their idea, they create a set of 1000 prompts for language models to practice saying no. The results show that most existing models are too eager to help and need to learn when it’s okay to say no. The authors also explore different ways to train language models to be more careful about when they should refuse requests.
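
To make the evaluation concrete, here is a toy sketch of how per-category compliance rates could be tallied, assuming each prompt is labeled with one of the taxonomy categories and each model response has already been judged as complying or refusing; the records below are invented for illustration and are not the paper’s data.

```python
# Toy scoring sketch: count how often a model complied with requests it should
# have declined, broken down by taxonomy category. All records are invented.
from collections import defaultdict

# (category, model_complied) pairs produced by some upstream judging step.
results = [
    ("incomplete", True),
    ("unsupported", False),
    ("indeterminate", True),
    ("humanizing", False),
    ("unsafe", False),
]

totals = defaultdict(int)
complied = defaultdict(int)
for category, did_comply in results:
    totals[category] += 1
    complied[category] += int(did_comply)

for category in sorted(totals):
    rate = complied[category] / totals[category]
    print(f"{category}: compliance rate {rate:.0%}")  # lower is better here
```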

Keywords

  • Artificial intelligence
  • GPT
  • Parameter-efficient