
Summary of Is My Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator, by Frederic Kirstein et al.


Is my Meeting Summary Good? Estimating Quality with a Multi-LLM Evaluator

by Frederic Kirstein, Terry Ruas, Bela Gipp

First submitted to arXiv on: 27 Nov 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper but are written at different levels of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.
Medium Difficulty Summary (written by GrooveSquid.com, original content)
This research paper proposes MESA, an LLM-based framework for evaluating the quality of meeting summaries. The study highlights the limitations of established metrics such as ROUGE and BERTScore, which correlate poorly with human judgments and fail to capture nuanced errors. MESA works in three steps: it assesses each error type individually, refines its decisions through multi-agent discussion, and uses feedback-based self-training to sharpen its understanding of the error definitions and its alignment with human judgment. The framework achieves mid to high Point-Biserial correlation with human judgment in error detection and mid Spearman and Kendall correlation in reflecting the impact of errors on summary quality. Because MESA adapts to custom error guidelines, it is suitable for a range of tasks with limited human-labeled data.
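To make the three-step procedure more concrete, here is a minimal Python sketch of a MESA-style evaluation loop. It is not the authors' implementation: the call_llm stub, the error taxonomy, the prompt wording, and the helper names are all hypothetical placeholders standing in for real model calls and the paper's actual error definitions.

```python
import json

# Hypothetical error taxonomy; the paper defines its own set of error types.
ERROR_TYPES = ["omission", "hallucination", "repetition", "incoherence"]


def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (an assumption, not part of the paper)."""
    return json.dumps({"present": False, "rationale": "stub response"})


def assess_error(summary: str, transcript: str, error_type: str, guideline: str) -> dict:
    """Step 1: judge a single error type in isolation."""
    prompt = (
        f"Error type: {error_type}\nGuideline: {guideline}\n"
        f"Transcript:\n{transcript}\n\nSummary:\n{summary}\n\n"
        "Is this error present? Answer as JSON with keys 'present' and 'rationale'."
    )
    return json.loads(call_llm(prompt))


def discuss(initial: dict, error_type: str, n_agents: int = 3) -> dict:
    """Step 2: let several agents critique the initial verdict and refine it."""
    decision = initial
    for agent_id in range(n_agents):
        prompt = (
            f"You are reviewer {agent_id}. Current verdict for '{error_type}': "
            f"{json.dumps(decision)}. Agree or revise, returning the same JSON format."
        )
        decision = json.loads(call_llm(prompt))
    return decision


def refine_guideline(guideline: str, human_feedback: str) -> str:
    """Step 3 (feedback-based self-training): fold human feedback back into the
    error-type guideline so later judgments align better with human ratings."""
    prompt = (
        f"Current guideline: {guideline}\nHuman feedback: {human_feedback}\n"
        "Rewrite the guideline so future judgments match the feedback."
    )
    return call_llm(prompt)


def evaluate(summary: str, transcript: str, guidelines: dict) -> dict:
    """Run steps 1 and 2 for every error type and collect the verdicts."""
    results = {}
    for error_type in ERROR_TYPES:
        first_pass = assess_error(summary, transcript, error_type, guidelines[error_type])
        results[error_type] = discuss(first_pass, error_type)
    return results


if __name__ == "__main__":
    guidelines = {e: f"Flag the summary if it shows {e}." for e in ERROR_TYPES}
    print(evaluate("A short meeting summary.", "A meeting transcript.", guidelines))
```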
Low Difficulty Summary (written by GrooveSquid.com, original content)
This research paper is about building a better way to judge how well computers can summarize meetings. Right now, the ways we measure this, like ROUGE and BERTScore, don’t always agree with what humans think is good or bad. The researchers created a new system called MESA that uses big language models to understand what makes a meeting summary good or bad. MESA has three parts: it checks for individual errors, lets several agents discuss whether each error really matters, and trains itself using feedback from humans. This new way of evaluating meeting summaries is better than previous methods because its judgments come closer to what humans think is good.
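The agreement statistics reported in the medium summary (Point-Biserial, Spearman, Kendall) can in principle be computed with standard statistics libraries. The snippet below is a toy illustration assuming SciPy is available; every number in it is invented, not data from the paper. It only shows which statistic pairs with which kind of label: binary error flags versus ordinal quality ratings.

```python
from scipy.stats import pointbiserialr, spearmanr, kendalltau

# Binary human labels: did annotators flag an error in each summary? (toy values)
human_error_flags = [1, 0, 1, 1, 0, 0, 1, 0]
# Evaluator's confidence that the same error is present (toy values).
evaluator_scores = [0.9, 0.2, 0.7, 0.8, 0.4, 0.1, 0.6, 0.3]

# Point-Biserial: binary human verdicts vs. continuous evaluator scores.
r_pb, _ = pointbiserialr(human_error_flags, evaluator_scores)

# Ordinal human quality ratings vs. evaluator quality scores (toy values).
human_quality = [4, 2, 3, 5, 1, 2, 4, 3]
evaluator_quality = [3.8, 2.5, 3.1, 4.6, 1.4, 2.2, 3.9, 2.8]

rho, _ = spearmanr(human_quality, evaluator_quality)
tau, _ = kendalltau(human_quality, evaluator_quality)

print(f"Point-Biserial r = {r_pb:.2f}, Spearman rho = {rho:.2f}, Kendall tau = {tau:.2f}")
```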

Keywords

» Artificial intelligence  » Alignment  » ROUGE  » Self-training