
Who’s Who: Large Language Models Meet Knowledge Conflicts in Practice

by Quang Hieu Pham, Hoang Ngo, Anh Tuan Luu, Dat Quoc Nguyen

First submitted to arXiv on: 21 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, but is written at a different level of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
Retrieval-augmented generation (RAG) methods are a viable solution to the static memory limits of pre-trained language models. However, encountering conflicting information within the retrieved context is an inevitable challenge. To analyze how current large language models behave in knowledge-conflict situations, we introduce WhoQA, a public benchmark dataset. We induce conflicts by asking about entities that share the same name, resulting in questions with up to 8 distinct answers. The WhoQA evaluation set includes 5K questions across 13 Wikidata property types and 150K Wikipedia entities. Our experiments show that knowledge conflicts significantly degrade the performance of large language models in RAG settings.
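
To make this concrete, below is a minimal Python sketch of how a same-name ambiguity turns into conflicting evidence inside a RAG prompt. The passages, question, and prompt template are illustrative assumptions for exposition only; they are not actual WhoQA items or the authors’ prompt format.

    # Minimal sketch of a WhoQA-style knowledge conflict in a RAG prompt.
    # All data below is hypothetical, not taken from the WhoQA dataset.

    # A retriever matching on the surface string "John Smith" can pull in
    # passages about two *different* entities that share that name.
    retrieved_passages = [
        "John Smith (born 1960) is an American film director.",
        "John Smith (born 1988) is an English footballer.",
    ]

    question = "What is John Smith's occupation?"

    # Standard RAG prompting: concatenate the retrieved context with the
    # question. The context now supports two conflicting answers ("film
    # director" vs. "footballer"); WhoQA questions can carry up to 8 such
    # distinct answers.
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(retrieved_passages))
    prompt = (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
    print(prompt)

A model handling this well would arguably surface both candidate answers or ask which John Smith is meant; the paper’s experiments indicate that, in practice, conflicts like this significantly degrade RAG question-answering performance.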
Low Difficulty Summary (original content by GrooveSquid.com)
Imagine trying to answer questions about people, places, or things that share the same name. This can be a challenge for computers too! They need help sorting through all the information they find online. To test how well computer models handle this kind of task, we created a dataset called WhoQA. We gave them questions about names shared by several different people, such as “John Smith”, to see if they could handle the conflicting answers. The results show that when there are multiple correct answers, computer models struggle to find the right one.

Keywords

» Artificial intelligence  » RAG  » Retrieval-augmented generation