
Summary of Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling, by Jie Ruan et al.


Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling

by Jie Ruan, Xiao Pu, Mingqi Gao, Xiaojun Wan, Yuesheng Zhu

First submitted to arXiv on: 12 Jun 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty | Written by | Summary
High | Paper authors | High Difficulty Summary: Read the original abstract here
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary: A novel Constrained Active Sampling Framework (CASF) is proposed to enhance the reliability of human evaluation in Natural Language Generation (NLG). The framework aims to select representative samples for inter-system ranking, reducing labor and costs while maintaining accuracy. CASF consists of a Learner, Systematic Sampler, and Constrained Controller, which work together to identify the most informative data points. Evaluation results on 137 real NLG setups with 44 human metrics across 16 datasets and 5 tasks demonstrate CASF’s effectiveness, achieving 93.18% top-ranked system recognition accuracy and ranking first or second on 90.91% of human metrics. The overall inter-system ranking Kendall correlation is 0.83.
Low | GrooveSquid.com (original content) | Low Difficulty Summary: CASF helps make human evaluation in NLG more reliable by selecting the best samples for comparison. This framework reduces the need for expensive and time-consuming evaluations, making it a valuable tool for researchers. By using CASF, you can get a better idea of which systems perform well across different tasks and datasets.
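
The summaries above report two kinds of numbers: an inter-system ranking Kendall correlation and a top-ranked system recognition accuracy. As a rough illustration of what those measure, the Python sketch below compares the system ranking obtained from a small subset of human scores with the ranking from the full set. It is not the CASF method: the subset is drawn uniformly at random (the baseline the paper title refers to), the scores are synthetic, and the names `kendall_tau`, `system_means`, and `same_top_system` are hypothetical helpers introduced only for this example.

```python
import random
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a between two equal-length score lists; tied pairs add nothing."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def system_means(scores, indices):
    """Mean human score of each system over the chosen sample indices."""
    indices = list(indices)
    return [sum(row[i] for i in indices) / len(indices) for row in scores]


def same_top_system(full, subset):
    """True if the best system under the subset matches the best under the full set."""
    return full.index(max(full)) == subset.index(max(subset))


# Synthetic data (an assumption for illustration): 4 systems, 200 scored samples each.
rng = random.Random(0)
true_quality = [0.2, 0.4, 0.6, 0.8]
scores = [[q + rng.gauss(0, 0.3) for _ in range(200)] for q in true_quality]

full = system_means(scores, range(200))                     # reference ranking
subset = system_means(scores, rng.sample(range(200), 20))   # random-sampling baseline

print("Kendall tau vs. full ranking:", kendall_tau(full, subset))
print("Top system recovered:", same_top_system(full, subset))
```

Repeating the subset draw many times and averaging both values would give baseline figures of the same kind as those quoted in the summary; a sampling method in the spirit of CASF would replace the `rng.sample` call with its own selection of sample indices.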

Keywords

» Artificial intelligence