Summary of Automating Data Science Pipelines with Tensor Completion, by Shaan Pakala et al.
Automating Data Science Pipelines with Tensor Completion
by Shaan Pakala, Bryce Graw, Dawon Ahn, Tam Dinh, Mehnaz Tabassum Mahin, Vassilis Tsotras, Jia Chen, Evangelos E. Papalexakis
First submitted to arXiv on: 8 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary: the paper's original abstract |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary: This paper tackles hyperparameter optimization in data science pipelines, which typically requires a computationally expensive search over a very large space of configurations. The authors argue that other critical operations, such as neural architecture search and query cardinality estimation, share similar properties. They propose modeling these problems as instances of tensor completion: each variable of the search space corresponds to one mode of the tensor, and the goal is to estimate the missing entries, which cover all remaining combinations of variable values, from a small sample of observed ones. To this end, they evaluate existing state-of-the-art tensor completion techniques and introduce domain-inspired adaptations and an ensemble approach that achieves state-of-the-art performance. They evaluate the methods extensively on datasets generated for hyperparameter optimization, neural architecture search, and query cardinality estimation, demonstrating the effectiveness of tensor completion as a tool for automating data science pipelines. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary: This paper is about making certain data science tasks easier, such as finding the best settings for a model or estimating how many results a query will return. These tasks are hard because there are so many possibilities to try, and testing them all takes a long time and a lot of computing power. The authors think these problems are similar to each other, so they propose solving them with one shared approach called tensor completion: try only a small number of possibilities and use them to predict the results for the rest. They tested this approach on several examples and found that it worked well. |
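
To make the tensor completion framing in the summaries above more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' method or code. It assumes a plain rank-2 CP (CANDECOMP/PARAFAC) model fitted by alternating regularized least squares, and it uses a synthetic tensor whose three modes could stand for three hypothetical hyperparameters. The function names, tensor shape, rank, sampling rate, and regularization strength are all illustrative assumptions.

```python
import numpy as np

def cp_reconstruct(factors):
    """Rebuild the full 3-mode tensor from CP factor matrices A, B, C."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def cp_complete(tensor, mask, rank=2, n_iters=100, reg=1e-3, seed=0):
    """Fit a CP model using only the observed entries (mask == True).

    Each factor row is updated by a small ridge-regularized least squares
    problem over the observed entries in the corresponding tensor slice.
    """
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iters):
        for mode in range(3):
            # Bring the mode being updated to the front.
            T = np.moveaxis(tensor, mode, 0)
            M = np.moveaxis(mask, mode, 0)
            first, second = [factors[m] for m in range(3) if m != mode]
            # One design row per combination of indices in the other two modes.
            design = np.einsum("jr,kr->jkr", first, second).reshape(-1, rank)
            for i in range(T.shape[0]):
                observed = M[i].reshape(-1)
                X = design[observed]
                y = T[i].reshape(-1)[observed]
                gram = X.T @ X + reg * np.eye(rank)
                factors[mode][i] = np.linalg.solve(gram, X.T @ y)
    return factors

# Synthetic example (assumption): an 8 x 7 x 6 grid of configurations,
# e.g. three hypothetical hyperparameters, with an exactly rank-2 outcome tensor.
rng = np.random.default_rng(42)
shape = (8, 7, 6)
true_factors = [rng.standard_normal((dim, 2)) for dim in shape]
truth = cp_reconstruct(true_factors)

# Pretend only about 60% of the configurations were actually evaluated.
mask = rng.random(shape) < 0.6
observed_tensor = np.where(mask, truth, 0.0)

completed = cp_reconstruct(cp_complete(observed_tensor, mask, rank=2))

# Compare predictions on the unevaluated configurations against a
# baseline that predicts the mean of the observed entries everywhere.
heldout = ~mask
heldout_rmse = np.sqrt(np.mean((completed[heldout] - truth[heldout]) ** 2))
baseline_rmse = np.sqrt(np.mean((truth[mask].mean() - truth[heldout]) ** 2))
print(f"held-out RMSE: CP completion {heldout_rmse:.4f}, "
      f"mean baseline {baseline_rmse:.4f}")
```

In this sketch, the entry with the best predicted outcome in `completed` would suggest which untried configuration to evaluate next. Real pipeline data is noisy and rarely exactly low rank, so the rank, regularization, and amount of sampling would need tuning in practice.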
Keywords
» Artificial intelligence » Hyperparameter » Optimization