Convergences and Divergences between Automatic Assessment and Human Evaluation: Insights from Comparing ChatGPT-Generated Translation and Neural Machine Translation
by Zhaokun Jiang, Qianxi Lv, Ziyin Zhang, Lei Lei
First submitted to arXiv on: 10 Jan 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The study compares large language models like ChatGPT to neural machine translation (NMT) systems, investigating how well automated metrics align with human evaluation in assessing translation quality. Four automated metrics are used for the automatic assessment (a metric sketch follows this table), while the human evaluation applies a detailed error typology and six rubrics. The results show that automated metrics converge with human evaluation when measuring formal fidelity, but diverge when evaluating semantic and pragmatic fidelity, highlighting the continued importance of human judgment in evaluating advanced translation tools. |
| Low | GrooveSquid.com (original content) | This study compares big language models like ChatGPT to special machines that translate languages (NMT). The researchers want to know whether the computer's ways of judging translation quality match what humans think. They use four computer scoring methods and a special rating system for people. The results show that computers and humans agree on how well the translations get the form right, but disagree on how well they capture the meaning and tone. |
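To make "automated metrics" concrete, below is a minimal sketch of corpus-level translation scoring with the sacrebleu library. The summaries above do not name the paper's four metrics, so the choice of BLEU and chrF here, and the toy sentence pairs, are illustrative assumptions rather than the authors' actual setup.

```python
# Minimal sketch: scoring machine-translation output with automated metrics.
# Requires the sacrebleu package (pip install sacrebleu). BLEU and chrF are
# chosen only for illustration -- the paper's four metrics are not named in
# the summaries above.
import sacrebleu

# Hypothetical example data: system outputs and one reference per sentence.
hypotheses = [
    "The cat sits on the mat.",
    "She went to the market yesterday.",
]
references = [
    "The cat is sitting on the mat.",
    "She went to the market yesterday.",
]

# sacrebleu expects a list of reference streams, hence the extra nesting:
# each inner list is one full set of references aligned with the hypotheses.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references])

print(f"BLEU: {bleu.score:.2f}")
print(f"chrF: {chrf.score:.2f}")
```

Scores like these capture surface overlap with a reference, which is why, per the findings above, they track human judgments of formal fidelity more closely than judgments of semantic and pragmatic fidelity, where annotators apply an error typology and rubric scores that no single automatic number reproduces.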
Keywords
» Artificial intelligence » Translation