Summary of NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding, by Chunkit Chan et al.


NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding

by Chunkit Chan, Cheng Jiayang, Yauwai Yim, Zheye Deng, Wei Fan, Haoran Li, Xin Liu, Hongming Zhang, Weiqi Wang, Yangqiu Song

First submitted to arXiv on: 21 Apr 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (GrooveSquid.com, original content)
Large Language Models (LLMs) have generated significant interest and debate regarding their potential Theory of Mind (ToM) abilities. Current ToM evaluations rely on machine-generated data or game settings prone to shortcuts and spurious correlations, and they neglect to evaluate LLMs in real-world human interaction scenarios. This highlights the need for new real-world benchmarks. The proposed NegotiationToM benchmark stress-tests machine ToM in real-world negotiation dialogues, covering three mental states — desires, beliefs, and intentions — grounded in the Belief-Desire-Intention (BDI) agent modeling theory. Empirical experiments evaluating large language models show that they consistently perform worse than humans, even when using the chain-of-thought (CoT) prompting method.

Low Difficulty Summary (GrooveSquid.com, original content)
This research paper is about testing how well computers can understand human thoughts and feelings in real-life situations. Right now, we test computers using fake data or game-like scenarios that might not be very realistic. We need new ways to test computers that are more like how humans interact with each other. The authors created a new benchmark called NegotiationToM, built around real-world negotiations where people discuss their desires, beliefs, and intentions. They found that even the best computer models don’t do as well as humans in these situations.

Keywords

» Artificial intelligence