Summary of Who’s Who: Large Language Models Meet Knowledge Conflicts in Practice, by Quang Hieu Pham et al.
Who’s Who: Large Language Models Meet Knowledge Conflicts in Practice
by Quang Hieu Pham, Hoang Ngo, Anh Tuan Luu, Dat Quoc Nguyen
First submitted to arXiv on: 21 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract (see the arXiv page). |
Medium | GrooveSquid.com (original content) | Retrieval-augmented generation (RAG) methods are a viable solution for addressing pre-trained language models’ static memory limits. However, encountering conflicting information within the retrieval context is an inevitable challenge. To analyze how current large language models behave in knowledge-conflict situations, we introduce WhoQA, a public benchmark dataset. We induce conflicts by asking about entities that share the same name, yielding questions with up to 8 distinct answers. The WhoQA evaluation set includes 5K questions across 13 Wikidata property types and 150K Wikipedia entities. Our experiments show that knowledge conflicts significantly degrade large language models’ performance in RAG settings (a minimal sketch of this conflict-induction setup follows the table). |
Low | GrooveSquid.com (original content) | Imagine trying to answer questions about people, places, or things that share the same name. This can be a challenge for computers too! They need help sorting through all the information they find online. To test how well these computers handle this kind of task, we created a dataset called WhoQA. We gave them questions like “Where was John Smith born?” (tricky because many different people are named John Smith) to see if they could handle the conflicting answers. The results show that when there are multiple correct answers, computer models struggle to sort them out. |
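To make the conflict-induction setup concrete, here is a minimal, hypothetical sketch of how a RAG prompt over same-name entities might be assembled. It is not code from the paper or the WhoQA release; the entity names, passages, `Passage` class, and question template are all invented for illustration.

```python
# Hypothetical sketch of a WhoQA-style knowledge-conflict probe.
# All names, passages, and templates below are invented for
# illustration; they are not taken from the actual WhoQA dataset.
from dataclasses import dataclass


@dataclass
class Passage:
    title: str  # Wikipedia article title of one entity bearing the name
    text: str   # snippet stating that entity's value for the property


def build_conflict_prompt(name: str, prop: str, passages: list[Passage]) -> str:
    """Assemble a RAG-style prompt whose retrieved context mixes passages
    about distinct entities sharing `name`, so the question about `prop`
    has multiple legitimate answers (the paper reports up to 8)."""
    context = "\n\n".join(f"[{p.title}]\n{p.text}" for p in passages)
    return (
        f"Context:\n{context}\n\n"
        f"Question: What is the {prop} of {name}?\n"
        f"Answer:"
    )


# Two different people named "John Smith" (invented data).
passages = [
    Passage("John Smith (explorer)",
            "John Smith was born in Willoughby, England."),
    Passage("John Smith (politician)",
            "John Smith was born in Dalmally, Scotland."),
]
print(build_conflict_prompt("John Smith", "place of birth", passages))
```

A model that answers with a single birthplace, without flagging that the context describes several different people named John Smith, exhibits the conflict-induced failure mode the benchmark is designed to measure.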
Keywords
» Artificial intelligence » RAG » Retrieval-augmented generation