DSBench: How Far Are Data Science Agents to Becoming Data Science Experts?

by Liqiang Jing, Zhehui Huang, Xiaoyang Wang, Wenlin Yao, Wenhao Yu, Kaixin Ma, Hongming Zhang, Xinya Du, Dong Yu

First submitted to arXiv on: 12 Sep 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Computation and Language (cs.CL)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (written by GrooveSquid.com, original content)
DSBench is a proposed benchmark for evaluating data science agents on realistic tasks. It comprises 466 data analysis tasks and 74 data modeling tasks sourced from ModelOff and Kaggle competitions. Unlike existing benchmarks, DSBench features long contexts, multimodal task backgrounds, and end-to-end data modeling tasks, making it a more challenging and practical evaluation setting. State-of-the-art LLMs, LVLMs, and agents struggle with most tasks: the best agent solves only 34.12% of the data analysis tasks. This highlights the need for further advances toward more practical, intelligent, and autonomous data science agents.
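
To make the reported metric concrete, below is a minimal, hypothetical Python sketch of how a benchmark like DSBench might score an agent: run it over each task and report the fraction of answers matching the ground truth. The Task structure, solve_rate function, and agent interface are illustrative assumptions for this summary, not the paper's actual evaluation harness.

    # Hypothetical sketch of a DSBench-style evaluation loop.
    # Task fields and the agent interface are illustrative assumptions,
    # not the paper's actual code.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Task:
        task_id: str
        context: str        # long, possibly multimodal task background
        ground_truth: str   # expected answer for a data analysis task

    def solve_rate(tasks: list[Task], agent: Callable[[str], str]) -> float:
        """Fraction of tasks whose agent answer matches the ground truth."""
        solved = sum(agent(t.context).strip() == t.ground_truth.strip()
                     for t in tasks)
        return solved / len(tasks)

    if __name__ == "__main__":
        # Trivial stand-in agent and task, just to show the interface.
        tasks = [Task("t1", "What is 2 + 2?", "4")]
        print(solve_rate(tasks, lambda context: "4"))  # 1.0

Under this framing, the paper's headline number means that on the 466 data analysis tasks, the best agent's solve rate was 0.3412, i.e. roughly 159 tasks answered correctly.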
Low Difficulty Summary (written by GrooveSquid.com, original content)
DSBench is a new way to test how well AI systems can do real-world data tasks. Right now, some AI systems are great at answering questions or doing simple math problems, but they're not very good at the kinds of things data scientists do every day, like analyzing large amounts of data. The people who made DSBench wanted to see whether these AI systems can actually help with real-world tasks. They tested some of the best AI systems on many different tasks and found that even the best ones struggled a lot. This means we need to keep working to make AI better at real-world data science.

Keywords

» Artificial intelligence  » Language model