Summary of Likelihood-based Mitigation of Evaluation Bias in Large Language Models, by Masanari Ohi et al.
Likelihood-based Mitigation of Evaluation Bias in Large Language Models
by Masanari Ohi, Masahiro Kaneko, Ryuto Koike, Mengsay Loem, Naoaki Okazaki
First submitted to arXiv on: 25 Feb 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | This paper investigates the potential for Large Language Models (LLMs) to introduce bias in natural language generation tasks when used as automated metrics. Specifically, it examines likelihood bias, which arises from superficial differences in sentences’ word order and structure. The authors propose a method to mitigate this bias by utilizing highly biased instances as few-shot examples for in-context learning. Experimental results on data-to-text and grammatical error correction tasks demonstrate that several LLMs exhibit likelihood bias, but the proposed approach successfully reduces this bias and improves evaluation performance.
Low | GrooveSquid.com (original content) | This paper looks at how Large Language Models can be unfair when they’re used to grade writing. It found that these models can favor certain types of sentences over others just because of their structure or word order. To fix this problem, the authors came up with a new way to train the models using special examples that help them learn to be fairer. They tested this method on two different tasks and found that it worked really well.
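The core idea of the mitigation, selecting the most likelihood-biased instances as few-shot examples for in-context learning, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the `bias_score` formula, the toy likelihood values, and all helper names are assumptions made for demonstration only.

```python
# Hedged sketch: pick the most likelihood-biased instances as few-shot
# examples for in-context learning. All numbers and the bias formula are
# illustrative assumptions, not the paper's actual metric.

def bias_score(likelihood, llm_score, human_score):
    """Toy bias measure: how far the LLM's evaluation drifts from the
    human score in the same direction as the model's likelihood
    preference (over-rating high-likelihood text, under-rating
    low-likelihood text)."""
    drift = llm_score - human_score
    return drift * (likelihood - 0.5)

def select_few_shot(instances, k=2):
    """Return the k instances with the largest absolute bias, to be used
    as few-shot examples that show the evaluator its own failure modes."""
    return sorted(instances,
                  key=lambda x: abs(bias_score(*x[1:])),
                  reverse=True)[:k]

instances = [
    # (text, likelihood under the LLM, LLM eval score, human score)
    ("fluent but wrong",    0.9, 4.5, 2.0),  # over-rated: high likelihood
    ("awkward but correct", 0.2, 2.0, 4.0),  # under-rated: low likelihood
    ("neutral",             0.5, 3.0, 3.0),  # no bias
]

few_shot = select_few_shot(instances)
```

Here the two mis-scored instances are selected while the unbiased one is skipped; in the paper's setting, such examples would then be placed in the evaluation prompt so the LLM can learn to correct for its likelihood preference.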
Keywords
» Artificial intelligence » Few shot » Likelihood