Summary of What’s Under the Hood: Investigating Automatic Metrics on Meeting Summarization, by Frederic Kirstein et al.
What’s under the hood: Investigating Automatic Metrics on Meeting Summarization
by Frederic Kirstein, Jan Philip Wahle, Terry Ruas, Bela Gipp
First submitted to arXiv on: 17 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper investigates the evaluation of meeting summarization, a task that has grown in importance with the rise of online interactions. It examines how automatic metrics correlate with human evaluations across a broad error taxonomy (a sketch of such a correlation analysis appears after the table) and finds that the metrics currently used by default struggle to capture observable errors, showing only weak to moderate correlations. The study uses annotated transcripts and summaries produced by Transformer-based sequence-to-sequence and autoregressive models on the QMSum meeting summarization dataset. It finds that different model architectures respond differently to the challenges posed by meeting transcripts, leading to differently pronounced links between challenges and errors. The results show that only a subset of metrics reacts accurately to specific errors, while most metrics are either unresponsive or fail to reflect an error’s impact on summary quality. |
Low | GrooveSquid.com (original content) | Meeting summarization is an important task because it helps people quickly understand what happened during a meeting. Right now, though, the automatic metrics used to test meeting summarizers don’t do their job very well. This paper studies how those metrics compare to human evaluations of summaries. It finds that most of them match human judgments only weakly and can even hide errors in the summaries they’re scoring. The study uses real transcripts and summaries from a dataset called QMSum to see how different model architectures handle the challenges of meeting conversations. |
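For readers curious about what "correlating automatic metrics with human evaluations" looks like in practice, the snippet below is a minimal, hypothetical sketch of a rank-correlation check between a metric's scores and human error annotations. The metric values, error counts, and the choice of Spearman correlation are illustrative assumptions for this summary, not details taken from the paper.

```python
# Illustrative sketch (not from the paper): does an automatic summary metric
# track human error judgments? All numbers below are hypothetical placeholders.
from scipy.stats import spearmanr

# Hypothetical automatic metric scores for five generated meeting summaries
# (e.g., a ROUGE-style score); higher = better according to the metric.
metric_scores = [0.42, 0.35, 0.51, 0.28, 0.47]

# Hypothetical human annotations: number of observed errors per summary
# under some error taxonomy; higher = worse summary.
human_error_counts = [3, 5, 1, 6, 2]

# If the metric captured these errors, its scores should correlate
# negatively with the error counts (more errors -> lower score).
rho, p_value = spearmanr(metric_scores, human_error_counts)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

A strongly negative rho would suggest the metric is sensitive to the annotated errors; a value near zero would indicate the kind of unresponsiveness the paper reports for many default metrics.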
Keywords
» Artificial intelligence » Autoregressive » Summarization » Transformer