
Summary of MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty, by Yongjin Yang et al.


MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty

by Yongjin Yang, Haneul Yoo, Hwaran Lee

First submitted to arXiv on: 13 Aug 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com; original content)
The paper investigates uncertainty quantification methods for large language models (LLMs) and evaluates how they perform under data uncertainty, the irreducible randomness inherent in the data itself. The authors propose a new dataset, MAQA, to assess uncertainty quantification with respect to data uncertainty, and evaluate five uncertainty quantification methods across diverse white-box and black-box LLMs. The findings show that entropy-based and consistency-based methods estimate model uncertainty well even in the presence of data uncertainty, whereas the other methods struggle depending on the task, with white-box LLMs becoming overconfident on reasoning tasks.
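
To give a concrete sense of the two method families highlighted above, here is a minimal Python sketch of an entropy-based score (white-box, using the model's own sequence log-probabilities) and a consistency-based score (black-box, using agreement among sampled answers). This is an illustration under simplified assumptions, not the paper's MAQA evaluation code; the sample answers and log-probabilities are invented for the example.

```python
# Minimal, illustrative sketch (not the paper's implementation) of two
# uncertainty quantification families mentioned in the summary:
#   - entropy-based (white-box): uses the model's sequence log-probabilities
#   - consistency-based (black-box): uses agreement among sampled answers
import math
from collections import Counter
from typing import List, Tuple

def predictive_entropy(sampled: List[Tuple[str, float]]) -> float:
    """Entropy-based score over sampled (answer, sequence log-prob) pairs.
    Higher entropy of the normalized answer distribution = more uncertain."""
    max_lp = max(lp for _, lp in sampled)
    weights = [math.exp(lp - max_lp) for _, lp in sampled]  # stable exponentiation
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log(p) for p in probs if p > 0)

def consistency_score(answers: List[str]) -> float:
    """Consistency-based score: fraction of samples agreeing with the majority
    answer. Lower agreement = higher uncertainty; needs no internal access."""
    counts = Counter(a.strip().lower() for a in answers)
    majority_count = counts.most_common(1)[0][1]
    return majority_count / len(answers)

if __name__ == "__main__":
    # Invented samples for a single question; in practice these would come
    # from repeatedly sampling an LLM on the same prompt.
    samples = [("Paris", -0.2), ("Paris", -0.3), ("Lyon", -1.5), ("Paris", -0.4)]
    print("predictive entropy:", predictive_entropy(samples))
    print("answer agreement  :", consistency_score([a for a, _ in samples]))
```

In practice, a consistency-based method would typically compare answers with semantic matching (e.g. entailment or embedding similarity) rather than exact string equality, but the exact-match version above conveys the idea.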

Low Difficulty Summary (written by GrooveSquid.com; original content)
The paper looks at how good large language models are at giving correct answers. Right now, these models can give answers that sound right but aren't actually true. To address this, researchers try to judge whether an answer is correct by measuring how sure the model is about its response. However, most of these methods only check whether the model knows the answer; they do not account for cases where the data itself is uncertain, so the answer can vary no matter how much the model knows. This paper examines previous approaches and proposes a new way to test them using a special dataset of questions that require reasoning or knowledge. The results show that some methods work better than others, and that they behave differently depending on the kind of question asked.

Keywords

» Artificial intelligence