

AXCEL: Automated eXplainable Consistency Evaluation using LLMs

by P Aditya Sreekar, Sahil Verma, Suransh Chopra, Sarik Ghazarian, Abhishek Persad, Narayanan Sadagopan

First submitted to arXiv on: 25 Sep 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper and are written at different levels of difficulty: the medium and low difficulty versions are original summaries by GrooveSquid.com, while the high difficulty version is the paper’s own abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (GrooveSquid.com original content)
This paper addresses the challenge of evaluating the consistency of text generated by Large Language Models (LLMs). Standard metrics like ROUGE and BLEU correlate weakly with human judgment, while more sophisticated approaches based on Natural Language Inference (NLI) are complex to implement and lack explainability. The authors introduce AXCEL, a prompt-based consistency metric that explains its scores by giving detailed reasoning and pinpointing the inconsistent text spans. AXCEL outperforms state-of-the-art metrics at detecting inconsistencies across summarization, free text generation, and data-to-text conversion tasks. The paper also evaluates how the choice of underlying LLM affects prompt-based metric performance, and recalibrates the state-of-the-art prompt-based metrics for a fair comparison.
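
The recipe described above is simple enough to sketch in code. Below is a minimal, hypothetical Python illustration of a prompt-based consistency check in the spirit of AXCEL: the LLM is asked to reason step by step, flag unsupported spans, and return a rating. The prompt wording, the JSON output schema, and the stubbed call_llm interface are illustrative assumptions, not the paper's actual prompts or implementation.

# Minimal sketch of a prompt-based consistency metric in the spirit of AXCEL.
# The prompt wording, the JSON output schema, and the stubbed LLM call are
# illustrative assumptions, not the paper's actual prompts or interface.
import json

CONSISTENCY_PROMPT = """\
You will be given a SOURCE text and a GENERATED text.
Think step by step about whether every claim in GENERATED is supported by SOURCE.
Then reply with JSON containing:
  "reasoning": your step-by-step analysis,
  "inconsistent_spans": exact substrings of GENERATED that SOURCE does not support,
  "score": a consistency rating from 1 (inconsistent) to 5 (fully consistent).

SOURCE:
{source}

GENERATED:
{generated}
"""

def axcel_style_consistency(source, generated, call_llm):
    """Score `generated` against `source`; returns reasoning, spans, and score.

    `call_llm` is any callable that sends a prompt string to an LLM and
    returns its text reply (e.g. a thin wrapper around a chat-completion API).
    """
    reply = call_llm(CONSISTENCY_PROMPT.format(source=source, generated=generated))
    result = json.loads(reply)
    # Normalize the 1-5 rating to [0, 1] so scores are comparable across tasks.
    result["normalized_score"] = (result["score"] - 1) / 4
    return result

if __name__ == "__main__":
    # Stubbed LLM reply so the sketch runs end to end without an API key.
    def fake_llm(prompt):
        return json.dumps({
            "reasoning": "GENERATED says the meeting is on Tuesday; "
                         "SOURCE says it is on Monday.",
            "inconsistent_spans": ["on Tuesday"],
            "score": 2,
        })

    report = axcel_style_consistency("The meeting is on Monday.",
                                     "The meeting is on Tuesday.", fake_llm)
    print(report["normalized_score"], report["inconsistent_spans"])

Because the score arrives together with the model's reasoning and the flagged spans, a low-scoring example can be inspected directly rather than trusted as an opaque number.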

Low Difficulty Summary (GrooveSquid.com original content)
This paper helps solve a big problem with computer programs that generate text. These programs are called Large Language Models (LLMs). Currently, it’s hard to know if they’re making sense or not. The authors come up with a new way to check if the text is consistent and explain why some parts might be wrong. They call this method AXCEL. It does better than other methods in checking if the text makes sense for different tasks like summarizing information, generating free text, and converting data into text.

Keywords

» Artificial intelligence  » Bleu  » Inference  » Prompt  » Rouge  » Summarization  » Text generation