Summary of SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs, by Yuling Gu et al.
SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs
by Yuling Gu, Oyvind Tafjord, Hyunwoo Kim, Jared Moore, Ronan Le Bras, Peter Clark, Yejin Choi
First submitted to arXiv on: 17 Oct 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | This paper explores whether large language models (LLMs) possess a “theory of mind” (ToM), the ability to attribute mental states to themselves and others. The authors create a new dataset, SimpleToM, consisting of concise stories paired with questions that test different degrees of ToM reasoning: predicting a character’s mental state, predicting their behavior, and judging whether a given behavior is reasonable. They find that while LLMs can accurately predict mental states, they often fail to correctly predict behavior and fare even worse at judging whether given behaviors are reasonable, despite being aware of the protagonist’s mental state. Interventions such as reminding the model of its earlier mental-state answer and chain-of-thought prompting improve performance on these tasks, but the models’ performance without such prompting remains low (a sketch of the reminder intervention appears after this table).
Low | GrooveSquid.com (original content) | This study looks at whether big computer programs (called language models) understand how people think. These programs are really good at understanding words, but they don’t always do a great job of figuring out what is going on in someone’s mind or predicting what that person will do next. The researchers created a special test to see whether these programs could understand short stories and make smart choices based on them. They found that the programs were good at understanding some things, but not so good at deciding what people would do next. They also showed that giving the programs hints can help them make better choices.
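
To make the “reminder” intervention from the medium summary concrete, here is a minimal sketch in Python. The story, the `ask_llm` helper, and its canned reply are illustrative assumptions, not the paper’s code or dataset entries; the idea is simply to ask the mental-state question first, then restate the model’s own answer before asking the behavior question.

```python
# Minimal sketch of the "remind the model of its earlier mental state
# answer" intervention. `ask_llm` is a hypothetical stand-in for any
# chat-completion call; the story is illustrative, not from SimpleToM.

def ask_llm(prompt: str) -> str:
    # Placeholder: swap in a real LLM API call here. A canned reply
    # keeps the sketch runnable on its own.
    return "No, Mary is not aware that the chips are moldy."

STORY = (
    "The bag of chips on the supermarket shelf is full of mold. "
    "Mary picks up the bag and walks to the cashier."
)

# Step 1: explicit ToM inference -- the question models tend to get right.
mental_state = ask_llm(
    f"{STORY}\nQuestion: Is Mary aware that the chips are moldy?"
)

# Step 2: implicit ToM application -- the question models tend to miss.
# The intervention: restate the model's own step-1 answer in the prompt.
behavior = ask_llm(
    f"{STORY}\nYou previously answered: {mental_state}\n"
    "Question: What will Mary most likely do next?"
)

print(behavior)
```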
Keywords
- Artificial intelligence
- Prompting