Summary of Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification, by Ekaterina Fadeeva et al.
Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification
by Ekaterina Fadeeva, Aleksandr Rubashevskii, Artem Shelmanov, Sergey Petrakov, Haonan Li, Hamdy Mubarak, Evgenii Tsymbalov, Gleb Kuzmin, Alexander Panchenko, Timothy Baldwin, Preslav Nakov, Maxim Panov
First submitted to arXiv on: 7 Mar 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | A novel approach to detecting hallucinations in large language models (LLMs) is proposed, which can be used for fact-checking and improving the reliability of their output. The method, called Claim Conditioned Probability (CCP), quantifies the uncertainty of a particular claim expressed by the model at the token level, allowing for more accurate detection of unreliable predictions. Experimental results show strong improvements in biography generation tasks using seven LLMs across four languages, and human evaluation reveals performance competitive with an external knowledge-based fact-checking tool. (A simplified code sketch of the CCP idea follows the table.) |
| Low | GrooveSquid.com (original content) | Large language models are really good at generating text, but sometimes they make mistakes. This can be a problem because it’s hard to tell when the model is making something up. Some services that use these models don’t check for errors, so we need a way to detect when the model is being unreliable. The researchers came up with a new method called Claim Conditioned Probability (CCP) that looks at how certain the model is about what it’s saying. This helps catch mistakes and make sure the information is accurate. In their tests, this method worked really well, especially when used to generate biographies. |
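The medium-difficulty summary above describes CCP only at the level of intuition: a token’s uncertainty is judged relative to the specific claim it expresses rather than to the whole output. The sketch below is a minimal, hypothetical illustration of that intuition, not the paper’s exact formulation; the `TokenAlternative` class, the probabilities, and the entail/contradict/neutral labels are assumptions made purely for this example.

```python
# Minimal sketch of claim-conditioned, token-level uncertainty.
# All names, probabilities, and labels below are hypothetical illustrations,
# not the authors' implementation.
from dataclasses import dataclass


@dataclass
class TokenAlternative:
    token: str       # a candidate token from the model's top-k list
    prob: float      # model probability assigned to this candidate
    relation: str    # "entail", "contradict", or "neutral" w.r.t. the claim


def claim_conditioned_probability(alternatives: list[TokenAlternative]) -> float:
    """Share of claim-relevant probability mass that preserves the claim's
    meaning, ignoring alternatives that are irrelevant to the claim."""
    entail = sum(a.prob for a in alternatives if a.relation == "entail")
    contradict = sum(a.prob for a in alternatives if a.relation == "contradict")
    if entail + contradict == 0.0:
        return 1.0  # the token carries no claim-relevant information
    return entail / (entail + contradict)


# Toy example: the model generated "1879" in "Einstein was born in 1879";
# a competing year would change the claim, while a function word would not.
alternatives = [
    TokenAlternative("1879", 0.55, "entail"),
    TokenAlternative("1878", 0.25, "contradict"),
    TokenAlternative("the", 0.05, "neutral"),
]
print(f"CCP ~ {claim_conditioned_probability(alternatives):.2f}")  # ~ 0.69
```

Under these assumptions, a low CCP value flags a token whose claim-relevant probability mass leaks to contradicting alternatives, which is the kind of signal the paper uses for fact-checking.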
Keywords
* Artificial intelligence
* Probability
* Token