Summary of DiscoveryBench: Towards Data-Driven Discovery with Large Language Models, by Bodhisattwa Prasad Majumder et al.


DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

by Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal, Bhavana Dalvi Mishra, Abhijeetsingh Meena, Aryan Prakhar, Tirth Vora, Tushar Khot, Ashish Sabharwal, Peter Clark

First submitted to arXiv on: 1 Jul 2024

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper presents DiscoveryBench, a comprehensive benchmark designed to evaluate the capabilities of large language models (LLMs) in automating data-driven discovery. The benchmark formalizes the multi-step process of discovery and consists of 264 tasks across six domains, including sociology and engineering. Each task is defined by a dataset, metadata, and a discovery goal in natural language. Additionally, the authors provide 903 synthetic tasks for controlled evaluations. They use several popular LLM-based reasoning frameworks as baselines and find that even the best system scores only 25%. The authors’ structured formalism of data-driven discovery enables facet-based evaluation, providing insights into different failure modes. Overall, DiscoveryBench serves as a valuable resource to improve LLMs in data-driven discovery.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper is about using big language models to help scientists find new ideas and discoveries from datasets. The authors created a special test called DiscoveryBench that checks how well these models can do this job. They used real-world examples from published papers to create 264 tasks across six different areas like sociology and engineering. Each task has a dataset, some information about the data, and a goal for what they want to find. The authors also created 903 fake tasks to test the models in a controlled way. They found that even the best model only scored 25%, which shows how hard it is to automate this process. Overall, DiscoveryBench helps us understand how to make language models better at finding new discoveries.
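
To make the task structure described in these summaries more concrete, below is a minimal sketch, in Python, of how a single DiscoveryBench-style task (a dataset, its metadata, and a natural-language discovery goal) might be represented. The class name, field names, and example values are illustrative assumptions, not the benchmark's actual schema.

    from dataclasses import dataclass

    # Hypothetical representation of one data-driven discovery task: a dataset,
    # metadata describing it, and a discovery goal stated in natural language.
    # Names and values are assumptions for illustration only.
    @dataclass
    class DiscoveryTask:
        domain: str        # e.g. "sociology" or "engineering"
        dataset_path: str  # path or URL of the task's dataset
        metadata: dict     # e.g. column descriptions and units
        goal: str          # the discovery goal, in natural language

    example_task = DiscoveryTask(
        domain="sociology",
        dataset_path="data/example_survey.csv",  # placeholder path
        metadata={"columns": {"income": "household income (USD)",
                              "wellbeing": "self-reported well-being score"}},
        goal="Is higher household income associated with higher well-being?",
    )
    print(example_task.goal)

In this framing, an evaluation harness would hand the dataset, metadata, and goal to an LLM-based system and then score the hypothesis it produces, which is roughly what the facet-based evaluation mentioned above measures.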

Keywords

  • Artificial intelligence