Summary of Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models, by Erik Arakelyan et al.
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models
by Erik Arakelyan, Zhaoqi Liu, Isabelle Augenstein
First submitted to arXiv on: 25 Jan 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper investigates whether transformer-based Natural Language Understanding (NLU) models truly grasp lexical and compositional semantics. While previous studies have claimed that these models possess such an understanding, the authors argue that this claim should be taken with caution. They find that state-of-the-art NLI models are sensitive to minor semantics-preserving surface-form variations, which lead to inconsistent model decisions during inference. This sensitivity is distinct from valid comprehension of compositional semantics and does not emerge when evaluating model accuracy on standard benchmarks or when probing for syntactic, monotonic, and logically robust reasoning. The authors propose a novel framework that measures the extent of semantic sensitivity by evaluating NLI models on adversarially generated examples containing minor semantics-preserving surface-form input noise (the general idea is illustrated in the sketch after this table). Their experiments show that semantic sensitivity causes average performance degradations of 12.92% in in-domain and 23.71% in out-of-domain settings. The authors also perform ablation studies to analyze this phenomenon across models, datasets, and variations in inference. |
| Low | GrooveSquid.com (original content) | This paper shows that some AI language models are not as good at understanding natural language as we thought. These models do well at tasks like answering questions, but they can stumble on the actual meaning of sentences. The authors ran careful experiments showing that these models get confused when a sentence is changed slightly, even if the change does not alter its meaning. They also found that the problem gets worse when the model is used on new kinds of data rather than the kind it was trained on. |
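The medium-difficulty summary describes probing NLI models with minor semantics-preserving surface-form variations and checking whether the predicted label stays the same. Below is a minimal Python sketch of that general consistency check using an off-the-shelf NLI model from the transformers library; the model choice, example sentences, and paraphrase are illustrative assumptions, not the paper's actual framework, models, or data.

```python
# Minimal sketch of the consistency check described above: predict an NLI label
# for a premise-hypothesis pair, then for a minor semantics-preserving rewording
# of the hypothesis, and flag the pair as semantically sensitive if the label flips.
# Model name, example sentences, and paraphrase are illustrative assumptions only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # any off-the-shelf NLI checkpoint would do
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def predict_label(premise: str, hypothesis: str) -> str:
    """Return the model's NLI label (contradiction / neutral / entailment)."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return model.config.id2label[int(logits.argmax(dim=-1))]

premise = "A man is playing a guitar on stage."
hypothesis = "Someone is performing music."
paraphrase = "A person is performing music."  # minor surface-form change, same meaning

original = predict_label(premise, hypothesis)
varied = predict_label(premise, paraphrase)
# A semantically robust model should assign the same label to both variants.
print(f"original: {original} | paraphrase: {varied} | consistent: {original == varied}")
```

Run over a whole dataset of such generated variations, the fraction of label flips gives a simple measure of semantic sensitivity in the spirit of the paper.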
Keywords
- Artificial intelligence
- Inference
- Language understanding
- Semantics
- Transformer