Summary of Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling, by Jie Ruan et al.

Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling

by Jie Ruan, Xiao Pu, Mingqi Gao, Xiaojun Wan, Yuesheng Zhu

First submitted to arXiv on: 12 Jun 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	Read the original abstract here
Medium	GrooveSquid.com (original content)	The paper proposes a Constrained Active Sampling Framework (CASF) to make human evaluation in Natural Language Generation (NLG) more reliable. CASF selects a representative subset of samples for human annotation so that the resulting inter-system ranking stays accurate while labor and cost go down. It consists of a Learner, a Systematic Sampler, and a Constrained Controller, which work together to choose those samples. In an evaluation on 137 real NLG setups, covering 44 human metrics across 16 datasets and 5 tasks, CASF achieved 93.18% top-ranked system recognition accuracy and ranked first or second on 90.91% of the human metrics, with an overall inter-system ranking Kendall correlation of 0.83.
Low	GrooveSquid.com (original content)	CASF makes human evaluation in NLG more reliable by choosing a representative set of samples for people to rate. Because fewer samples need expensive, time-consuming human judgments, researchers can compare systems at lower cost and still get a dependable picture of which systems perform well across different tasks and datasets.
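
To make the two headline measures in the medium summary concrete, here is a minimal Python sketch of how an inter-system ranking Kendall correlation and a top-ranked system check could be computed for a sampled subset. It is not the paper's code and does not implement CASF: the subset is drawn at random (the baseline the title refers to), the scores are synthetic, and the names `kendall_tau`, `system_means`, and `top_system_recognized` are hypothetical helpers introduced only for this illustration.

```python
import random
from itertools import combinations


def kendall_tau(x, y):
    """Kendall tau-a between two equal-length score lists.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


def system_means(scores, sample_ids):
    """Mean human score of each system over the chosen sample indices."""
    return [sum(row[i] for i in sample_ids) / len(sample_ids) for row in scores]


def top_system_recognized(full, subset):
    """True if the subset picks the same best system as the full evaluation."""
    return full.index(max(full)) == subset.index(max(subset))


# Synthetic human scores (an assumption, not data from the paper):
# 5 systems x 200 samples, with system quality rising slightly by index.
rng = random.Random(0)
n_systems, n_samples = 5, 200
scores = [
    [rng.gauss(3.0 + 0.2 * s, 1.0) for _ in range(n_samples)]
    for s in range(n_systems)
]

# Reference ranking from all samples vs. ranking from a random 40-sample subset.
full = system_means(scores, range(n_samples))
subset_ids = rng.sample(range(n_samples), 40)
subset = system_means(scores, subset_ids)

tau = kendall_tau(full, subset)
hit = top_system_recognized(full, subset)
print(f"Kendall tau = {tau:.2f}, top system recognized = {hit}")
```

A sampling method is better, in this sense, when its subset yields a Kendall correlation closer to 1 and recovers the top system more often than a random subset of the same size.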

Keywords

» Artificial intelligence
