Summary of Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models, by Erik Arakelyan et al.
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models
by Erik Arakelyan, Zhaoqi Liu, Isabelle Augenstein
First submitted to arXiv on: 25 Jan 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper investigates whether transformer-based Natural Language Understanding (NLU) models truly grasp lexical and compositional semantics. While previous studies have claimed that these models possess such an understanding, the authors argue that this claim should be taken with caution. They find that state-of-the-art NLI models are sensitive to minor semantics-preserving surface-form variations, which lead to inconsistent model decisions during inference. This sensitivity is distinct from valid comprehension of compositional semantics and does not emerge when evaluating model accuracy on standard benchmarks or when probing for syntactic, monotonic, and logically robust reasoning. The authors propose a novel framework that measures the extent of semantic sensitivity by evaluating NLI models on adversarially generated examples containing minor semantics-preserving surface-form input noise (the general idea is illustrated in the sketch after this table). Their experiments show that semantic sensitivity causes average performance degradations of 12.92% in in-domain and 23.71% in out-of-domain settings. The authors also perform ablation studies to analyze this phenomenon across models, datasets, and variations in inference. |
| Low | GrooveSquid.com (original content) | This paper shows that some AI language models are not as good at understanding natural language as we thought. These models do well at tasks like answering questions, but they can stumble on the actual meaning of sentences. The authors ran careful experiments showing that these models get confused when a sentence is changed slightly, even if the change does not alter its meaning. They also found that the problem gets worse when the model is used on new kinds of data rather than the kind it was trained on. |
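The medium-difficulty summary describes probing NLI models with minor semantics-preserving surface-form variations and checking whether the predicted label stays the same. Below is a minimal Python sketch of that general consistency check using an off-the-shelf NLI model from the transformers library; the model choice, example sentences, and paraphrase are illustrative assumptions, not the paper's actual framework, models, or data.

```python
# Minimal sketch of the consistency check described above: predict an NLI label
# for a premise-hypothesis pair, then for a minor semantics-preserving rewording
# of the hypothesis, and flag the pair as semantically sensitive if the label flips.
# Model name, example sentences, and paraphrase are illustrative assumptions only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # any off-the-shelf NLI checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def predict_label(premise: str, hypothesis: str) -> str:
    """Return the model's NLI label (contradiction / neutral / entailment)."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(dim=-1))]

premise = "A man is playing a guitar on stage."
hypothesis = "Someone is performing music."
paraphrase = "A person is performing music."  # minor surface-form change, same meaning

original = predict_label(premise, hypothesis)
varied = predict_label(premise, paraphrase)
# A semantically robust model should assign the same label to both variants.
print(f"original: {original} | paraphrase: {varied} | consistent: {original == varied}")
```

Run over a whole dataset of such generated variations, the fraction of label flips gives a simple measure of semantic sensitivity in the spirit of the paper.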
Keywords
- Artificial intelligence
- Inference
- Language understanding
- Semantics
- Transformer