
Summary of Navigating Data Corruption in Machine Learning: Balancing Quality, Quantity, and Imputation Strategies, by Qi Liu and Wanjing Ma



First submitted to arxiv on: 24 Dec 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper explores how real-world machine learning models are affected by corrupted data, such as missing or noisy values. The study focuses on two applications: natural language processing for text classification (NLP-SL) and optimizing traffic signals using deep reinforcement learning (Signal-RL). The authors investigate how varying levels of data corruption affect model performance, test how well imputation methods repair corrupted data, and evaluate whether increasing the dataset size can mitigate the problem.

Low Difficulty Summary (written by GrooveSquid.com, original content)
The paper studies how corrupted data affects machine learning models. The researchers looked at two cases: natural language processing for text classification (NLP-SL) and optimizing traffic signals using deep reinforcement learning (Signal-RL). They found that corrupted data makes models perform worse, and tried to repair it using imputation methods, which estimate replacements for the missing or noisy values. They also checked whether having more training data helps.
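
To make the idea of corruption and imputation concrete, here is a minimal Python sketch. It is not the authors' method: it assumes the simplest possible setting, where values go missing completely at random and are repaired with mean imputation. The helper names (`corrupt`, `impute_mean`, `mean_abs_error`) and the synthetic Gaussian data are hypothetical, chosen only for illustration.

```python
import random
import statistics


def corrupt(values, rate, rng):
    """Replace each value with None with probability `rate`.

    Assumption: values go missing completely at random.
    """
    return [None if rng.random() < rate else v for v in values]


def impute_mean(values):
    """Fill every None entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: every value is missing")
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def mean_abs_error(clean, repaired):
    """Average absolute gap between the clean and the repaired values."""
    return statistics.fmean([abs(c - r) for c, r in zip(clean, repaired)])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Synthetic stand-in for a real feature column (illustrative only).
    clean = [rng.gauss(10.0, 2.0) for _ in range(1000)]
    for rate in (0.1, 0.3, 0.5):
        repaired = impute_mean(corrupt(clean, rate, rng))
        print(f"corruption rate={rate:.1f}  MAE={mean_abs_error(clean, repaired):.3f}")
```

Running the script prints the reconstruction error at several corruption rates, which gives a rough feel for how much information imputation can and cannot recover. A real study would measure downstream model performance rather than reconstruction error.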

Keywords

» Artificial intelligence  » Machine learning  » Natural language processing  » NLP  » Reinforcement learning  » Text classification