Summary of "How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs", by Ran Zhang et al.
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
by Ran Zhang, Wei Zhao, Steffen Eger
First submitted to arXiv on: 24 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The paper contributes to the ongoing discussion on evaluating literary machine translation (MT) by introducing LITEVAL-CORPUS, a novel parallel corpus of verified human translations and outputs from nine MT systems, comprising over 2,000 translations and 13,000 evaluated sentences across four language pairs. Using this corpus, the authors investigate the consistency and adequacy of human evaluation schemes of varying complexity, compare evaluations by students and professionals, assess the effectiveness of LLM-based metrics, and evaluate the performance of the LLMs themselves. The findings show that the adequacy of human evaluation depends on two factors: the complexity of the evaluation scheme and the expertise of the evaluators. |
| Low | GrooveSquid.com (original content) | The paper introduces a new parallel corpus for evaluating literary machine translation (MT) and explores how well humans and machines translate literature. It shows that different evaluation methods work better or worse depending on who is doing the judging and which texts are being translated. The results suggest that even recent language models struggle to produce translations as good as those written by humans. |
Keywords
» Artificial intelligence » Translation