Summary of Automated Data Processing and Feature Engineering For Deep Learning and Big Data Applications: a Survey, by Alhassan Mumuni and Fuseini Mumuni
Automated data processing and feature engineering for deep learning and big data applications: a survey
by Alhassan Mumuni, Fuseini Mumuni
First submitted to arXiv on: 18 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Databases (cs.DB)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper reviews approaches for automating data processing tasks in deep learning pipelines. It focuses on automated data preprocessing, data augmentation, feature engineering, and end-to-end automation using AutoML techniques. The survey covers methods for data cleaning, labeling, imputation, encoding, generation, and selection, as well as the use of generative AI to synthesize new data. Automating these tasks lets researchers process large datasets efficiently, optimize machine learning pipelines, and improve Big Data applications. |
Low | GrooveSquid.com (original content) | Automated artificial intelligence (AI) aims to design algorithms that learn directly from data. This approach has achieved impressive results in supervised deep learning. However, not all data processing tasks have been automated. Researchers are now working on automating the remaining tasks with a family of techniques called AutoML. The goal is to make large volumes of complex data usable for machine learning and Big Data applications. End-to-end automated systems can take raw data and transform it into useful features for Big Data tasks. |
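To make the preprocessing steps named in the medium summary more concrete, here is a minimal sketch in plain Python that chains three of them: imputation, encoding, and feature selection. It is an illustration only, not the method of the paper or of any AutoML library. The function names, the toy data, and the specific choices (mean imputation, one-hot encoding, a variance threshold) are all assumptions made for this example.

```python
from statistics import mean, pvariance


def impute_mean(column):
    """Replace None entries in a numeric column with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def one_hot(column):
    """Encode a categorical column as one 0/1 column per category (sorted order)."""
    categories = sorted(set(column))
    return {f"is_{c}": [1 if v == c else 0 for v in column] for c in categories}


def select_by_variance(features, threshold=0.0):
    """Keep only features whose population variance exceeds the threshold."""
    return {
        name: values
        for name, values in features.items()
        if pvariance(values) > threshold
    }


def preprocess(numeric, categorical):
    """Impute numeric columns, one-hot encode categorical ones, then drop constant features."""
    features = {name: impute_mean(col) for name, col in numeric.items()}
    for name, col in categorical.items():
        for suffix, values in one_hot(col).items():
            features[f"{name}_{suffix}"] = values
    return select_by_variance(features)


# Hypothetical toy dataset: one missing value and one constant (uninformative) column.
numeric = {"age": [20, None, 40], "constant": [1, 1, 1]}
categorical = {"color": ["red", "blue", "red"]}

features = preprocess(numeric, categorical)
print(sorted(features))
```

In this sketch the missing age is filled with the mean of the observed ages, the color column becomes two 0/1 columns, and the constant column is dropped because it carries no information. An automated system would go further and choose these steps and their settings for you; here they are fixed by hand to keep the example short.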
Keywords
* Artificial intelligence
* Data augmentation
* Deep learning
* Feature engineering
* Machine learning
* Supervised