Summary of AttributionBench: How Hard is Automatic Attribution Evaluation?, by Yifei Li et al.
AttributionBench: How Hard is Automatic Attribution Evaluation?
by Yifei Li, Xiang Yue, Zeyi Liao, Huan Sun
First submitted to arXiv on: 23 Feb 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper at different levels of difficulty: the medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read whichever version suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This research paper presents AttributionBench, a comprehensive benchmark for automatic attribution evaluation methods in large language models (LLMs). The authors identify an open problem in verifying the attribution of responses generated by LLMs, which currently depends on costly human evaluation. They demonstrate that even state-of-the-art LLMs struggle with automatic attribution evaluation: under a binary classification formulation (sketched below), a fine-tuned GPT-3.5 achieves only around 80% macro-F1. Analyzing over 300 error cases, the authors find that most failures stem from the model’s inability to process nuanced information. |
Low | GrooveSquid.com (original content) | This paper is about checking whether large language models really back up their answers with the evidence they cite. Right now, that verification takes a lot of human effort. The researchers built a large test dataset called AttributionBench and used it to see how well different models can evaluate attribution. They found that even the best models aren’t very good at it, scoring only around 80% on this task. By examining the cases the models got wrong, they learned that the models often struggle with nuanced, complex information. |
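The medium-difficulty summary notes that the benchmark casts attribution evaluation as a binary classification task scored with macro-F1. The sketch below is only an illustration of that setup, not the paper’s code: the prompt template, label names, and example data are assumptions, and the macro-F1 helper is the standard textbook computation.

```python
# Illustrative sketch: attribution evaluation as binary classification
# (is a generated claim supported by its cited evidence?), scored with
# macro-F1. The prompt format and labels here are hypothetical, not the
# paper's actual pipeline.

def format_example(question: str, claim: str, evidence: str) -> str:
    """Build a judge prompt asking whether the evidence supports the claim."""
    return (
        f"Question: {question}\n"
        f"Claim: {claim}\n"
        f"Evidence: {evidence}\n"
        "Does the evidence support the claim? "
        "Answer 'attributable' or 'not attributable'."
    )

def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Macro-F1 for the two labels: the unweighted mean of per-class F1."""
    labels = ["attributable", "not attributable"]
    f1s = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Toy usage with made-up gold labels and model predictions, just to show
# how the reported metric is computed.
gold = ["attributable", "not attributable", "attributable", "not attributable"]
pred = ["attributable", "not attributable", "not attributable", "not attributable"]
print(f"macro-F1: {macro_f1(gold, pred):.2f}")
```

Macro-F1 is used rather than plain accuracy because it weights the "attributable" and "not attributable" classes equally, so a judge cannot score well simply by predicting the majority label.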
Keywords
* Artificial intelligence
* Classification
* GPT