Summary of Automated Data Processing and Feature Engineering For Deep Learning and Big Data Applications: a Survey, by Alhassan Mumuni and Fuseini Mumuni


Automated data processing and feature engineering for deep learning and big data applications: a survey

by Alhassan Mumuni, Fuseini Mumuni

First submitted to arXiv on: 18 Mar 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Databases (cs.DB)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The paper surveys approaches for automating data processing tasks in deep learning pipelines, covering automated data preprocessing, data augmentation, feature engineering, and end-to-end automation with AutoML techniques. It examines methods for data cleaning, labeling, imputation, encoding, generation, and selection, as well as the use of generative AI to synthesize new data. Automating these tasks lets researchers process large datasets efficiently, optimize machine learning pipelines, and improve Big Data applications.

Low Difficulty Summary (written by GrooveSquid.com, original content)
The modern approach to artificial intelligence (AI) is to design algorithms that learn directly from data. This approach has achieved impressive results, particularly in supervised deep learning. However, not all data processing tasks in deep learning pipelines have been automated yet. Researchers are now automating these remaining tasks with automated machine learning (AutoML) techniques, driven by the need to use large volumes of complex data for machine learning and big data applications. End-to-end automated systems can take raw data and transform it into useful features for Big Data tasks.
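To make three of the preprocessing steps named in the medium difficulty summary more concrete (imputation, encoding, and selection), here is a minimal sketch in plain Python. It is not taken from the paper: the toy dataset, the column names, and the helper functions `impute_median`, `one_hot`, `drop_constant`, and `preprocess` are all hypothetical, and real AutoML systems choose and tune such steps automatically rather than hard-coding them.

```python
from statistics import median

# Hypothetical toy dataset; None marks a missing value.
rows = [
    {"age": 34, "city": "Accra", "const": 1},
    {"age": None, "city": "Tamale", "const": 1},
    {"age": 52, "city": "Accra", "const": 1},
]

def impute_median(rows, column):
    """Imputation: fill missing numeric values with the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def one_hot(rows, column):
    """Encoding: replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}={c}"] = 1 if r[column] == c else 0
        encoded.append(new)
    return encoded

def drop_constant(rows):
    """Selection (very simple form): drop columns whose value never varies."""
    keep = [k for k in rows[0] if len({r[k] for r in rows}) > 1]
    return [{k: r[k] for k in keep} for r in rows]

def preprocess(rows):
    """Chain the three steps in a fixed, hand-chosen order."""
    rows = impute_median(rows, "age")
    rows = one_hot(rows, "city")
    return drop_constant(rows)

features = preprocess(rows)
for row in features:
    print(row)
```

The fixed order and the hand-picked columns are exactly what an automated pipeline would decide for itself; the sketch only shows what each individual step does to the data.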

Keywords

  • Artificial intelligence
  • Data augmentation
  • Deep learning
  • Feature engineering
  • Machine learning
  • Supervised