DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

by Liqiang Jing, Zhehui Huang, Xiaoyang Wang, Wenlin Yao, Wenhao Yu, Kaixin Ma, Hongming Zhang, Xinya Du, Dong Yu

First submitted to arXiv on: 12 Sep 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com, original content)
DSBench is a proposed benchmark for evaluating data science agents on realistic tasks. It comprises 466 data analysis tasks and 74 data modeling tasks sourced from ModelOff and Kaggle competitions. Unlike existing benchmarks, DSBench features long contexts, multimodal task backgrounds, and end-to-end data modeling tasks, making it a more challenging and practical evaluation setting. State-of-the-art LLMs, LVLMs, and agents struggle with most tasks: the best agent solves only 34.12% of the data analysis tasks. This highlights the need for further advances toward more practical, intelligent, and autonomous data science agents.
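
To make the reported metric concrete, below is a minimal, hypothetical Python sketch of how a benchmark like DSBench might score an agent: run it over each task and report the fraction of answers matching the ground truth. The Task structure, solve_rate function, and agent interface are illustrative assumptions for this summary, not the paper's actual evaluation harness.

    # Hypothetical sketch of a DSBench-style evaluation loop.
    # Task fields and the agent interface are illustrative assumptions,
    # not the paper's actual code.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Task:
        task_id: str
        context: str        # long, possibly multimodal task background
        ground_truth: str   # expected answer for a data analysis task

    def solve_rate(tasks: list[Task], agent: Callable[[str], str]) -> float:
        """Fraction of tasks whose agent answer matches the ground truth."""
        solved = sum(agent(t.context).strip() == t.ground_truth.strip()
                     for t in tasks)
        return solved / len(tasks)

    if __name__ == "__main__":
        # Trivial stand-in agent and task, just to show the interface.
        tasks = [Task("t1", "What is 2 + 2?", "4")]
        print(solve_rate(tasks, lambda context: "4"))  # 1.0

Under this framing, the paper's headline number means that on the 466 data analysis tasks, the best agent's solve rate was 0.3412, i.e. roughly 159 tasks answered correctly.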
Low Difficulty Summary (written by GrooveSquid.com, original content)
DSBench is a new way to test how well AI systems can do real-world data tasks. Right now, some AI systems are great at answering questions or doing simple math problems, but they're not very good at the kinds of things data scientists do every day, like analyzing large amounts of data. The people who made DSBench wanted to see whether these AI systems can actually help with real-world tasks. They tested some of the best AI systems on many different tasks and found that even the best ones struggled a lot. This means we need to keep working to make AI better at real-world data science.

Keywords

» Artificial intelligence  » Language model