Summary of Explore Theory of Mind: Program-guided Adversarial Data Generation for Theory of Mind Reasoning, by Melanie Sclar et al.
Explore Theory of Mind: Program-guided adversarial data generation for theory of mind reasoning
by Melanie Sclar, Jane Yu, Maryam Fazel-Zarandi, Yulia Tsvetkov, Yonatan Bisk, Yejin Choi, Asli Celikyilmaz
First submitted to arXiv on: 12 Dec 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper and is written at a different level of difficulty. The medium- and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper’s original abstract, available on arXiv. |
Medium | GrooveSquid.com (original content) | This research paper investigates whether large language models (LLMs) can develop a theory of mind. Current evaluations rely on limited datasets with simple patterns, which risks overestimating model abilities. The authors introduce ExploreToM, a framework that generates diverse, challenging data for robust training and evaluation. Their approach uses A* search to produce complex story structures and novel scenarios (a minimal sketch of such a search follows this table). Evaluation reveals that state-of-the-art LLMs struggle on this data, with accuracies as low as 0% and 9%, and that fine-tuning on ExploreToM-generated data improves performance by 27 points on the classic ToMi benchmark. |
Low | GrooveSquid.com (original content) | This paper asks whether large language models have a theory of mind, a key ability for social intelligence. Most evaluations today use simple datasets that may not truly test these models. The authors created a new way to generate lots of diverse, challenging data for training and evaluating models. They used a special search method to build complex stories and scenarios. When they tested current models, even the best ones struggled, with accuracy as low as 0% and 9%. But fine-tuning models on this new data made them perform much better. |
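For readers curious how an A*-style search can steer story generation toward hard examples, here is a minimal, hypothetical Python sketch. It is not the paper’s implementation: the state representation, the `expand` successor function, and the `estimated_difficulty` heuristic are illustrative placeholders standing in for ExploreToM’s program-guided components.

```python
import heapq
import itertools

def a_star_story_search(initial_state, expand, estimated_difficulty, max_len=10):
    """Best-first search over partial stories, in the spirit of A*.

    initial_state        -- starting world state (e.g. characters, locations, beliefs)
    expand(state)        -- yields (action, next_state) successor pairs
    estimated_difficulty -- heuristic scoring how hard a (partial) story is
                            expected to be for the target LLM (higher = harder)
    """
    tie = itertools.count()  # tie-breaker so heapq never compares raw states
    # Negate scores so Python's min-heap behaves as a max-heap.
    frontier = [(-estimated_difficulty(initial_state), next(tie), [], initial_state)]
    while frontier:
        neg_score, _, actions, state = heapq.heappop(frontier)
        if len(actions) == max_len:
            # The first complete story popped is the most promising one found.
            return actions, -neg_score
        for action, next_state in expand(state):
            heapq.heappush(
                frontier,
                (-estimated_difficulty(next_state), next(tie),
                 actions + [action], next_state),
            )
    return [], 0.0  # no story of the requested length was reachable

# Toy usage: states are tuples of actions taken; "difficulty" is just story length.
toy_actions = ["enter_room", "move_object", "leave_room", "tell_secret"]
expand = lambda s: ((a, s + (a,)) for a in toy_actions)
story, score = a_star_story_search((), expand, estimated_difficulty=len, max_len=3)
```

A full A* would also fold a path cost into each node’s priority, and the paper describes searching over a custom domain-specific language of story actions; the sketch above keeps only the priority-queue skeleton of that idea.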
Keywords
» Artificial intelligence » Fine-tuning