Summary of Automating Data Science Pipelines with Tensor Completion, by Shaan Pakala et al.


Automating Data Science Pipelines with Tensor Completion

by Shaan Pakala, Bryce Graw, Dawon Ahn, Tam Dinh, Mehnaz Tabassum Mahin, Vassilis Tsotras, Jia Chen, Evangelos E. Papalexakis

First submitted to arXiv on: 8 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by: Paper authors)
Read the original abstract here

Medium Difficulty Summary (written by: GrooveSquid.com, original content)
This paper tackles hyperparameter optimization in data science pipelines, which typically requires computationally expensive searches over very large configuration spaces. The authors argue that other critical operations, such as neural architecture search and query cardinality estimation, share the same structure. They propose modeling these problems as instances of tensor completion: each variable (for example, each hyperparameter) corresponds to a mode of the tensor, and the goal is to infer the missing entries from a small sample of observed ones. To this end, they evaluate existing state-of-the-art tensor completion techniques and introduce domain-inspired adaptations along with an ensemble approach that achieves state-of-the-art performance. The methods are evaluated extensively on datasets generated for hyperparameter optimization, neural architecture search, and query cardinality estimation, demonstrating the effectiveness of tensor completion as a tool for automating data science pipelines.

Low Difficulty Summary (written by: GrooveSquid.com, original content)
This paper is about making certain data science tasks easier, such as finding the best settings for a model or estimating how many results a query will return. These tasks are hard because there are so many possibilities to try, and finding the right one takes a long time and a lot of computing power. The authors observe that these problems resemble one another, so they propose solving them all in the same way, using a technique called tensor completion. They tested this approach on several examples and found that it worked well.
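
To make the tensor completion idea in the summaries above more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' method or code: it uses a generic rank-2 CP (CANDECOMP/PARAFAC) model fitted by masked alternating least squares, and every name in it (`complete_tensor`, `cp_reconstruct`, the 8 x 8 x 8 grid, the 40% sampling rate) is an assumption chosen for illustration. The "scores" are synthetic low-rank data, not results from the paper. Each tensor mode stands for one hypothetical hyperparameter, and only a fraction of the configurations is treated as evaluated.

```python
import numpy as np


def cp_reconstruct(factors):
    """Rebuild a 3-mode tensor from its CP factor matrices."""
    a, b, c = factors
    return np.einsum("ir,jr,kr->ijk", a, b, c)


def complete_tensor(observed, mask, rank=2, n_iters=100, ridge=1e-3, seed=0):
    """Estimate all entries of a 3-mode tensor from the observed ones.

    observed: array holding the known values (anything where mask is False)
    mask:     boolean array, True where an entry was actually observed
    """
    rng = np.random.default_rng(seed)
    factors = [rng.normal(size=(dim, rank)) for dim in observed.shape]
    for _ in range(n_iters):
        for mode in range(3):
            # Bring the mode being updated to the front.
            obs_m = np.moveaxis(observed, mode, 0)
            mask_m = np.moveaxis(mask, mode, 0)
            others = [factors[m] for m in range(3) if m != mode]
            # Regression features: products of the other two modes' factor rows.
            design = np.einsum("jr,kr->jkr", others[0], others[1])
            for i in range(obs_m.shape[0]):
                sel = mask_m[i]
                if not sel.any():
                    continue  # no observations in this slice, keep old row
                x = design[sel]      # (n_observed, rank)
                y = obs_m[i][sel]    # (n_observed,)
                # Ridge-regularized least squares on observed entries only.
                gram = x.T @ x + ridge * np.eye(rank)
                factors[mode][i] = np.linalg.solve(gram, x.T @ y)
    return cp_reconstruct(factors)


# Synthetic stand-in for a grid of pipeline scores (assumption: exactly rank 2).
rng = np.random.default_rng(42)
shape = (8, 8, 8)  # e.g. 8 learning rates x 8 batch sizes x 8 depths
truth = cp_reconstruct([rng.normal(size=(d, 2)) for d in shape])

# Pretend only about 40% of the configurations were actually evaluated.
mask = rng.random(shape) < 0.4
observed = np.where(mask, truth, 0.0)

estimate = complete_tensor(observed, mask)

# Compare against a naive baseline that predicts the mean observed score.
heldout_rmse = np.sqrt(np.mean((estimate[~mask] - truth[~mask]) ** 2))
baseline_rmse = np.sqrt(np.mean((truth[mask].mean() - truth[~mask]) ** 2))
best_config = np.unravel_index(np.argmax(estimate), shape)

print(f"held-out RMSE: {heldout_rmse:.4f} (mean baseline: {baseline_rmse:.4f})")
print("predicted best configuration (indices per mode):", best_config)
```

In this sketch, the completed tensor plays the role of a cheap surrogate: instead of running every configuration, one would evaluate a small sample, complete the tensor, and inspect the predicted entries. Real pipeline data is unlikely to be exactly low rank, so the rank, regularization strength, and sampling rate here should be read as placeholders rather than recommendations.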

Keywords

» Artificial intelligence  » Hyperparameter  » Optimization