Summary of A Systematic Review of NeurIPS Dataset Management Practices, by Yiwei Wu et al.

A Systematic Review of NeurIPS Dataset Management Practices

by Yiwei Wu, Leah Ajmani, Shayne Longpre, Hanlin Li

First submitted to arXiv on: 31 Oct 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	The paper's original abstract (not reproduced here).
Medium	GrooveSquid.com (original content)	Inconsistent practices for managing large datasets are a significant challenge in machine learning research. A systematic review of datasets published at NeurIPS finds that dataset provenance is often unclear because filtering and curation processes are described ambiguously. It also finds that, among the sites used to host datasets, only a few offer structured metadata and version control. These findings underscore the need for standardized data infrastructure for publishing and managing datasets.
Low	GrooveSquid.com (original content)	This paper shows that researchers don't always follow good practices when sharing big datasets. It looks at four important things: where datasets come from, how they are distributed, what is written about ethics, and what licenses are used. The results show that it is hard to figure out where datasets came from because some filtering steps aren't clear. Also, datasets are hosted on many different websites, but only a few of those sites help keep track of changes with version control. This suggests we need better ways to share and manage big datasets.

Keywords

* Artificial intelligence
* Machine learning
