
Summary of SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs, by Yuling Gu et al.


SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs

by Yuling Gu, Oyvind Tafjord, Hyunwoo Kim, Jared Moore, Ronan Le Bras, Peter Clark, Yejin Choi

First submitted to arXiv on: 17 Oct 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High difficulty (the paper's original abstract, written by the paper authors): Read the original abstract here.

Medium difficulty (original GrooveSquid.com summary):
This paper explores whether large language models (LLMs) possess a "theory of mind" (ToM), the ability to attribute mental states to themselves and others. The authors create a new dataset, SimpleToM, consisting of concise stories with questions that test different levels of ToM reasoning. They find that while LLMs can accurately predict mental states, they often struggle to correctly predict behavior or to judge whether a given behavior is reasonable, even though they are aware of the protagonist's mental state. Interventions such as reminding the model of its earlier mental-state answer and chain-of-thought prompting can improve performance on these tasks, but even with these techniques, the models' unprompted performance remains low.

Low difficulty (original GrooveSquid.com summary):
This study looks at whether big computer programs (called language models) understand how people think. These programs are very good at understanding words, but they don't always do a great job of figuring out what is going on in someone's mind or predicting what that person will do next. The researchers created a special test to see whether these programs could understand short stories and make sensible choices based on them. They found that the programs were good at understanding some things, but not so good at deciding what people would do next. They also showed that giving the programs hints can help them make better choices.

Keywords

» Artificial intelligence  » Prompting