Summary of MMToM-QA: Multimodal Theory of Mind Question Answering, by Chuanyang Jin et al.
MMToM-QA: Multimodal Theory of Mind Question Answering
by Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum, Tianmin Shu
First submitted to arXiv on: 16 Jan 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | In this paper, researchers aim to develop machines with human-level social intelligence by creating a new benchmark for Theory of Mind (ToM) understanding. Current ToM benchmarks rely on unimodal data, such as video or text, whereas humans can reason about others’ mental states based on conceptual representations from any available data. The authors introduce the MMToM-QA benchmark to evaluate machine ToM on multimodal data and on different kinds of unimodal data. They also propose a novel method, BIP-ALM (Bayesian Inverse Planning Accelerated by Language Models), which extracts unified representations from multimodal data and utilizes language models for scalable Bayesian inverse planning. The authors compare human performance with state-of-the-art models, including GPT-4, and demonstrate that large language models and multimodal models lack robust ToM capacity, while BIP-ALM shows promising results. |
Low | GrooveSquid.com (original content) | This paper is about teaching machines to understand people’s thoughts and feelings. Right now, machines can only understand certain types of data, like videos or texts. But humans can figure out what someone is thinking based on all kinds of information. The researchers created a new test called MMToM-QA that helps machines learn to do this too. They also came up with a special way for the machine to think about people’s thoughts, using language models and other techniques. When they tested it, they found that even very smart machines aren’t very good at understanding people’s minds yet. But their new method might be able to help. |
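The core idea behind Bayesian inverse planning, as used in BIP-ALM, can be illustrated with a toy example: infer a hidden goal from observed actions by combining a prior over goals with the likelihood of those actions under each goal. The goals, actions, and probability values below are purely illustrative assumptions, not from the paper; the actual method additionally uses language models to score these likelihoods at scale.

```python
# Minimal sketch of Bayesian inverse planning: infer a hidden goal
# from observed actions. All names and numbers here are hypothetical.

goals = ["fridge", "cabinet"]
prior = {"fridge": 0.5, "cabinet": 0.5}

# Hypothetical P(action | goal): how probable each observed action is
# under a policy pursuing that goal.
likelihood = {
    "fridge":  {"walk_to_kitchen": 0.9, "open_fridge": 0.8},
    "cabinet": {"walk_to_kitchen": 0.9, "open_fridge": 0.1},
}

observed_actions = ["walk_to_kitchen", "open_fridge"]

# Posterior over goals: prior times product of action likelihoods,
# then normalized across goals.
unnorm = {}
for g in goals:
    p = prior[g]
    for a in observed_actions:
        p *= likelihood[g][a]
    unnorm[g] = p

total = sum(unnorm.values())
posterior = {g: unnorm[g] / total for g in goals}
print(posterior)  # the "fridge" goal dominates after observing open_fridge
```

In BIP-ALM, the hand-written likelihood table above is replaced by a language model evaluating how likely each action is given a candidate goal, which makes the same posterior computation scale to open-ended goal spaces.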
Keywords
* Artificial intelligence
* GPT