Summary of Distinguishing the Knowable from the Unknowable with Language Models, by Gustaf Ahdritz et al.
Distinguishing the Knowable from the Unknowable with Language Models
by Gustaf Ahdritz, Tian Qin, Nikhil Vyas, Boaz Barak, Benjamin L. Edelman
First submitted to arXiv on: 5 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper, each written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to read whichever version suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | This paper investigates whether large language models distinguish epistemic uncertainty (lack of knowledge) from aleatoric uncertainty (entropy inherent in the underlying distribution) in their outputs over free-form text. Because ground-truth labels for this distinction are unavailable, the authors use a larger model as a proxy for ground truth and show that small linear probes over a smaller model's activations can accurately predict, at the token level, when the larger model will be more confident. Probes trained on one text domain generalize to others, suggesting that large language models naturally contain internal representations of different kinds of uncertainty. These findings have significant implications for building more informative indicators of model confidence in practical settings (a hedged code sketch of this probing setup follows the table). |
| Low | GrooveSquid.com (original content) | This paper looks at how big language models make decisions about things they’re not sure about. It’s like trying to figure out what someone means when they say “I think there might be a cat outside”. The researchers want to know whether the model is unsure because it doesn’t have enough information (like not knowing what the word “cat” means) or because the situation itself is genuinely uncertain (like the cat possibly hiding behind a bush). To tell these apart, they use an even bigger language model as a kind of “expert”: when the bigger model is much more confident about a word, the smaller model’s hesitation probably came from missing knowledge rather than from the situation being truly random. They find that this works pretty well and could be useful in lots of situations where we need to understand how sure machines really are about what they say. |
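The probing recipe described in the medium-difficulty summary can be made concrete with a short sketch. This is a minimal illustration under stated assumptions, not the authors' code: the model pair (gpt2 as the small model, gpt2-large as the larger proxy for ground truth), the entropy-margin labeling rule, and the toy training loop are all illustrative choices.

```python
# Hedged sketch: train a linear probe on a small model's hidden states to predict,
# per token, whether a larger model is noticeably more confident. Model names,
# the margin threshold, and the training data are assumptions, not the paper's setup.
import torch
from torch import nn
from transformers import AutoModelForCausalLM, AutoTokenizer

small_name = "gpt2"        # stand-in "small" model (assumption)
large_name = "gpt2-large"  # stand-in "large" proxy-ground-truth model (assumption)

tok = AutoTokenizer.from_pretrained(small_name)  # both models share the GPT-2 tokenizer
small = AutoModelForCausalLM.from_pretrained(small_name, output_hidden_states=True).eval()
large = AutoModelForCausalLM.from_pretrained(large_name).eval()

def token_features_and_labels(text: str, margin: float = 0.2):
    """Return (small-model hidden states, binary labels) for each token.

    Label = 1 when the large model's next-token entropy is lower than the
    small model's by more than `margin` nats, i.e. the large model is
    noticeably more confident (an illustrative labeling rule).
    """
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        small_out = small(ids)
        large_out = large(ids)

    def entropy(logits):
        logp = torch.log_softmax(logits, dim=-1)
        return -(logp.exp() * logp).sum(-1)  # per-token entropy in nats

    h_small = entropy(small_out.logits[0])
    h_large = entropy(large_out.logits[0])
    feats = small_out.hidden_states[-1][0]   # last-layer activations of the small model
    labels = (h_small - h_large > margin).float()
    return feats, labels

# A single linear layer over frozen activations: the "small linear probe".
probe = nn.Linear(small.config.hidden_size, 1)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

texts = ["The capital of France is Paris.", "My favorite color is"]  # toy data (assumption)
for _ in range(10):
    for text in texts:
        feats, labels = token_features_and_labels(text)
        opt.zero_grad()
        loss = loss_fn(probe(feats).squeeze(-1), labels)
        loss.backward()
        opt.step()
```

Keeping the probe to a single linear layer over frozen activations mirrors the summary's point that simple probes suffice; evaluating the trained probe on text from a different domain would test the cross-domain generalization the summary describes.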
Keywords
- Artificial intelligence
- Language model
- Token