Summary of RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content, by Joao Monteiro et al.
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content
by Joao Monteiro, Pierre-Andre Noel, Etienne Marcotte, Sai Rajeswar, Valentina Zantedeschi, David Vazquez, Nicolas Chapados, Christopher Pal, Perouz Taslakian
First submitted to arXiv on: 17 Jun 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Large Language Models (LLMs) are trained on vast amounts of data, including encyclopedic documents and benchmark datasets. To evaluate these models accurately, the authors introduce a new test dataset called RepLiQA, consisting of five test splits that had not been released to the internet or exposed to LLM APIs prior to publication. Each sample in RepLiQA includes a reference document, a question about the document’s topic, a ground-truth answer, and the paragraph containing the answer. The authors run a large-scale benchmark over several state-of-the-art LLMs to uncover performance differences across models of various types and sizes in a context-conditional language modeling setting (see the sketch after the table). |
Low | GrooveSquid.com (original content) | Large Language Models are trained on lots of data from the internet. But this data can be messy, and some of it may already have been used to test models! To fix this, the authors created a new test dataset called RepLiQA, made of separate groups of questions and answers that were not online or exposed to model APIs before now. Each sample has a made-up reference document, a question about it, the correct answer, and the paragraph where the answer appears. They tested many top-performing models to see how they do on this new dataset. |
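
To make the sample structure and the context-conditional setting concrete, here is a minimal sketch in Python of loading RepLiQA and building a prompt that conditions the model on the reference document. The Hugging Face dataset id, the split name, and the field names below are illustrative assumptions, not details confirmed by this page.

```python
# A minimal sketch, assuming RepLiQA is hosted on the Hugging Face Hub.
# The dataset id "ServiceNow/repliqa", the split name "repliqa_0", and the
# field names below are illustrative assumptions.
from datasets import load_dataset

# RepLiQA ships as test-only data; load one of the five assumed splits.
dataset = load_dataset("ServiceNow/repliqa", split="repliqa_0")

sample = dataset[0]

# The parts of each sample described in the summaries above
# (field names are assumptions):
document = sample["document_extracted"]  # full reference document text
question = sample["question"]            # question about the document
answer = sample["answer"]                # ground-truth answer
# A paragraph containing the answer is also provided, per the summary.

# Context-conditional setting: the model must answer from the supplied
# document rather than from memorized training data.
prompt = (
    "Answer the question using only the reference document below.\n\n"
    f"Reference document:\n{document}\n\n"
    f"Question: {question}\n"
    "Answer:"
)
print(prompt[:500])
print("Expected answer:", answer)
```

Because the documents had not been published online before the dataset's release, a model that answers correctly in this setting is demonstrating reading comprehension of the provided context rather than recall of its training data.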