What’s under the hood: Investigating Automatic Metrics on Meeting Summarization

by Frederic Kirstein, Jan Philip Wahle, Terry Ruas, Bela Gipp

First submitted to arXiv on: 17 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper investigates the evaluation of meeting summarization techniques, which have become increasingly important with the rise of online interactions. It examines how automatic metrics correlate with human evaluations across a broad error taxonomy and finds that the metrics currently used by default struggle to capture observable errors, showing only weak to moderate correlations. The study uses annotated transcripts and summaries from Transformer-based sequence-to-sequence and autoregressive models on the QMSum dataset and finds that different model architectures respond differently to challenges in meeting transcripts, resulting in differently pronounced links between challenges and errors. The results show that only a subset of metrics reacts accurately to specific errors, while most are either unresponsive or fail to reflect an error’s impact on summary quality. (A minimal sketch of such a metric-versus-human correlation check appears after the summaries below.)

Low Difficulty Summary (original content by GrooveSquid.com)
Meeting summarization is an important task because it helps people quickly understand what happened during a meeting. But right now, there are no reliable ways to check how well meeting summarizers do their job. This paper tries to fix that by studying how automatic metrics (which are used to test meeting summarizers) compare to human evaluations of summaries. It finds that most of these metrics don’t work very well and can even hide errors in the summaries they’re testing. The study uses real transcripts and summaries from a dataset called QMSum to see how different model architectures handle the challenges found in different types of meetings.

Keywords

» Artificial intelligence  » Autoregressive  » Summarization  » Transformer