Summary of NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding, by Chunkit Chan et al.
NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding
by Chunkit Chan, Cheng Jiayang, Yauwai Yim, Zheye Deng, Wei Fan, Haoran Li, Xin Liu, Hongming Zhang, Weiqi Wang, Yangqiu Song
First submitted to arXiv on: 21 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper’s original abstract (available via the arXiv listing). |
| Medium | GrooveSquid.com (original content) | Large Language Models (LLMs) have generated significant interest and debate over their potential Theory of Mind (ToM) abilities. Current ToM evaluations rely on machine-generated data or game settings that are prone to shortcuts and spurious correlations, and they neglect how LLMs fare in real-world human interaction scenarios, highlighting the need for benchmarks grounded in real-world scenarios. The proposed NegotiationToM benchmark stress-tests machine ToM in real-world negotiation settings, covering the mental states of desires, beliefs, and intentions, based on Belief-Desire-Intention (BDI) agent modeling theory. Empirical experiments show that large language models consistently perform worse than humans, even when using chain-of-thought (CoT) prompting; a minimal sketch of such a probe appears after this table. |
| Low | GrooveSquid.com (original content) | This research paper is about testing how well computers can understand human thoughts and feelings in real-life situations. Right now, we test computers using fake data or game-like scenarios that might not be very realistic. We need new ways to test computers that are more like the way humans interact with each other. Our team created a new benchmark called NegotiationToM that simulates real-world negotiations where people discuss their desires, beliefs, and intentions. We found that even the best computer models don’t do as well as humans in these situations. |
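To make the evaluation setup concrete, here is a minimal sketch of what a chain-of-thought probe for one mental-state question (a speaker's desire) could look like. The dialogue snippet, the prompt wording, the option labels, and the `query_llm` stub are all illustrative assumptions, not the paper's actual prompts or scoring code.

```python
# Minimal sketch of a chain-of-thought (CoT) Theory-of-Mind probe.
# The dialogue, prompt wording, option labels, and query_llm stub are
# illustrative assumptions; they are not the paper's prompts or scorer.

DIALOGUE = (
    "A: I really need extra water for my kids on this camping trip.\n"
    "B: I can spare some water if you let me keep most of the firewood."
)

# Candidate desires the model must choose between (hypothetical labels).
OPTIONS = ["water", "firewood", "food"]


def build_cot_prompt(dialogue: str, speaker: str) -> str:
    """Assemble a CoT prompt asking for one speaker's strongest desire."""
    return (
        f"Dialogue:\n{dialogue}\n\n"
        f"Question: Which item does speaker {speaker} desire most? "
        f"Choose one of {OPTIONS}.\n"
        "Let's think step by step, then give the final answer as "
        "'Answer: <item>'."
    )


def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (plug in your model client)."""
    raise NotImplementedError("connect this to an actual model API")


def score(response: str, gold: str) -> bool:
    """Count the probe correct if the final answer line names the gold item."""
    answer = response.rsplit("Answer:", 1)[-1].strip().lower()
    return gold in answer


if __name__ == "__main__":
    prompt = build_cot_prompt(DIALOGUE, speaker="A")
    print(prompt)
    # response = query_llm(prompt)
    # print(score(response, gold="water"))
```

Under this setup, accuracy would be averaged over many such dialogue-question pairs per mental state (desire, belief, intention), which is the kind of per-state comparison against human performance the summary describes.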