Summary of NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding, by Chunkit Chan et al.
NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding
by Chunkit Chan, Cheng Jiayang, Yauwai Yim, Zheye Deng, Wei Fan, Haoran Li, Xin Liu, Hongming Zhang, Weiqi Wang, Yangqiu Song
First submitted to arXiv on: 21 Apr 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | The paper’s original abstract (available via the arXiv listing). |
| Medium | GrooveSquid.com (original content) | Large Language Models (LLMs) have generated significant interest and debate over their potential Theory of Mind (ToM) abilities. Current ToM evaluations rely on machine-generated data or game settings that are prone to shortcuts and spurious correlations, and they neglect how LLMs fare in real-world human interaction scenarios, highlighting the need for benchmarks grounded in real-world scenarios. The proposed NegotiationToM benchmark stress-tests machine ToM in real-world negotiation settings, covering the mental states of desires, beliefs, and intentions, based on Belief-Desire-Intention (BDI) agent modeling theory. Empirical experiments show that large language models consistently perform worse than humans, even when using chain-of-thought (CoT) prompting; a minimal sketch of such a probe appears after this table. |
| Low | GrooveSquid.com (original content) | This research paper is about testing how well computers can understand human thoughts and feelings in real-life situations. Right now, we test computers using fake data or game-like scenarios that might not be very realistic. We need new ways to test computers that are more like the way humans interact with each other. Our team created a new benchmark called NegotiationToM that simulates real-world negotiations where people discuss their desires, beliefs, and intentions. We found that even the best computer models don’t do as well as humans in these situations. |
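To make the evaluation setup concrete, here is a minimal sketch of what a chain-of-thought probe for one mental-state question (a speaker's desire) could look like. The dialogue snippet, the prompt wording, the option labels, and the `query_llm` stub are all illustrative assumptions, not the paper's actual prompts or scoring code.

```python
# Minimal sketch of a chain-of-thought (CoT) Theory-of-Mind probe.
# The dialogue, prompt wording, option labels, and query_llm stub are
# illustrative assumptions; they are not the paper's prompts or scorer.

DIALOGUE = (
    "A: I really need extra water for my kids on this camping trip.\n"
    "B: I can spare some water if you let me keep most of the firewood."
)

# Candidate desires the model must choose between (hypothetical labels).
OPTIONS = ["water", "firewood", "food"]


def build_cot_prompt(dialogue: str, speaker: str) -> str:
    """Assemble a CoT prompt asking for one speaker's strongest desire."""
    return (
        f"Dialogue:\n{dialogue}\n\n"
        f"Question: Which item does speaker {speaker} desire most? "
        f"Choose one of {OPTIONS}.\n"
        "Let's think step by step, then give the final answer as "
        "'Answer: <item>'."
    )


def query_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (plug in your model client)."""
    raise NotImplementedError("connect this to an actual model API")


def score(response: str, gold: str) -> bool:
    """Count the probe correct if the final answer line names the gold item."""
    answer = response.rsplit("Answer:", 1)[-1].strip().lower()
    return gold in answer


if __name__ == "__main__":
    prompt = build_cot_prompt(DIALOGUE, speaker="A")
    print(prompt)
    # response = query_llm(prompt)
    # print(score(response, gold="water"))
```

Under this setup, accuracy would be averaged over many such dialogue-question pairs per mental state (desire, belief, intention), which is the kind of per-state comparison against human performance the summary describes.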