Summary of "How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs", by Ran Zhang et al.
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
by Ran Zhang, Wei Zhao, Steffen Eger
First submitted to arXiv on: 24 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The paper contributes to the ongoing discussion on evaluating literary machine translation (MT) by introducing LITEVAL-CORPUS, a novel parallel corpus of verified human translations and outputs from nine MT systems, comprising over 2,000 translations and 13,000 evaluated sentences across four language pairs. Using this corpus, the authors investigate the consistency and adequacy of human evaluation schemes of varying complexity, compare evaluations by students and professionals, assess the effectiveness of LLM-based metrics, and evaluate the performance of the LLMs themselves. The findings show that the adequacy of human evaluation depends on two factors: the complexity of the evaluation scheme and the expertise of the evaluators. |
| Low | GrooveSquid.com (original content) | The paper introduces a new parallel corpus for evaluating literary machine translation (MT) and explores how well humans and machines translate literature. It shows that different evaluation methods work better or worse depending on who is doing the judging and which texts are being translated. The results suggest that even recent language models struggle to produce translations as good as those written by humans. |
Keywords
» Artificial intelligence » Translation