Summary of Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models, by Yihong Dong et al.
Generalization or Memorization: Data Contamination and Trustworthy Evaluation for Large Language Models
by Yihong Dong, Xue Jiang, Huanyu Liu, Zhi Jin, Bin Gu, Mengfei Yang, Ge Li
First submitted to arXiv on: 24 Feb 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Software Engineering (cs.SE)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract on arXiv. |
Medium | GrooveSquid.com (original content) | Recent advances in large language models (LLMs) have raised concerns about data contamination, since their training data is vast and drawn from diverse sources. Because LLMs are commonly evaluated on open-access benchmarks, test data can leak into training and inflate reported results. This paper proposes CDD (Contamination Detection via output Distribution), which detects contamination by analyzing how peaked an LLM’s output distribution is for a given prompt, and TED (Trustworthy Evaluation via output Distribution), which corrects the output distribution to mitigate contamination’s effect on evaluation. Two new benchmarks, DetCon and ComiEval, are introduced for the contamination detection and mitigation tasks. The proposed methods show significant improvements over existing approaches, particularly in detecting implicit contamination (a simple code sketch of the detection idea follows the table). |
Low | GrooveSquid.com (original content) | Large language models have made impressive progress recently, but there’s a concern that they may have already seen the test questions in their training data, which makes their scores look better than they should. This paper tries to fix that by making sure evaluations stay fair and honest. The authors propose two tools: CDD (Contamination Detection) to spot when a model is just repeating something it memorized, and TED (Trustworthy Evaluation) to correct the results when that happens. Together, these help tell whether a model has really learned to solve a problem or has simply memorized the answers. |
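To make the detection idea in the medium summary more concrete, here is a minimal Python sketch of flagging a prompt whose sampled outputs cluster tightly around a single answer. It illustrates the general peakedness idea only, not the paper’s exact CDD procedure: the function names, the edit-distance radius, and the 0.9 threshold are assumptions chosen for the example.

```python
import collections
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (standard DP)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]


def peakedness_score(samples: List[str], radius: int = 2) -> float:
    """Fraction of sampled outputs lying within `radius` edits of the most
    frequent sample; values near 1.0 mean the output distribution is sharply
    peaked, which may signal memorized (contaminated) data."""
    mode, _ = collections.Counter(samples).most_common(1)[0]
    near_mode = sum(edit_distance(s, mode) <= radius for s in samples)
    return near_mode / len(samples)


# Example: outputs sampled from an LLM at non-zero temperature for one
# benchmark prompt (the strings here are placeholders, not real model output).
samples = ["def add(a, b): return a + b"] * 8 + ["def add(x, y): return x + y"] * 2
score = peakedness_score(samples)
print(f"peakedness = {score:.2f}")     # 0.80 for the placeholder samples above
if score > 0.9:                        # hypothetical threshold, not the paper's
    print("Output distribution is highly peaked: possible contamination.")
```

In this sketch, a higher peakedness score means the model keeps producing nearly identical outputs even under sampling, which is the kind of behavior CDD uses as evidence of contamination; the paper’s TED method then adjusts the output distribution during evaluation rather than simply flagging it.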