On the Performance of Imputation Techniques for Missing Values on Healthcare Datasets

by Luke Oluwaseye Joel, Wesley Doorsamy, Babu Sena Paul

First submitted to arXiv on 13 Mar 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)

Read the original abstract here.

Medium Difficulty Summary (GrooveSquid.com, original content)
The paper compares the performance of seven imputation techniques on three healthcare datasets: Mean imputation, Median imputation, Last Observation Carried Forward (LOCF) imputation, K-Nearest Neighbor (KNN) imputation, Interpolation imputation, MissForest imputation, and Multiple Imputation by Chained Equations (MICE). Missing values were artificially introduced into the datasets, each technique was used to impute them, and the techniques were evaluated using root mean squared error (RMSE) and mean absolute error (MAE). The results show that MissForest imputation performs best, followed by MICE. The paper also investigates whether feature selection should be performed before or after imputation, using recall, precision, F1-score, and accuracy as metrics.
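The evaluation protocol described above can be sketched in a few lines. This is not the paper's code: it uses scikit-learn's SimpleImputer, KNNImputer, and IterativeImputer (a MICE-style imputer) on a small synthetic stand-in dataset, masks known values, and scores each method with RMSE and MAE, as the paper does.

```python
# Hedged sketch of the paper's evaluation protocol, on synthetic data.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 5))   # stand-in for a healthcare dataset
X_true[:, 1] += 0.8 * X_true[:, 0]   # correlated columns help KNN/MICE

# Introduce missing values completely at random, then impute them
mask = rng.random(X_true.shape) < 0.2
X_miss = X_true.copy()
X_miss[mask] = np.nan

imputers = {
    "mean": SimpleImputer(strategy="mean"),
    "knn":  KNNImputer(n_neighbors=5),
    "mice": IterativeImputer(random_state=0),  # MICE-style chained equations
}
results = {}
for name, imp in imputers.items():
    X_hat = imp.fit_transform(X_miss)
    err = X_hat[mask] - X_true[mask]       # error only on the masked cells
    rmse = float(np.sqrt(np.mean(err ** 2)))
    mae = float(np.mean(np.abs(err)))
    results[name] = (rmse, mae)
    print(f"{name}: RMSE={rmse:.3f}  MAE={mae:.3f}")
```

MissForest has no scikit-learn implementation, so it is omitted here; the same masking-and-scoring loop would apply to any additional imputer with a `fit_transform` method.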
Low Difficulty Summary (GrooveSquid.com, original content)
The study compares different ways to fill in missing values in healthcare data. This matters because many machine learning models do not work well when values are missing. The researchers tested seven methods for filling in missing values on three datasets and found that two of them, MissForest and MICE, worked best. They also examined whether it is better to fill in the missing values before selecting features from the data, or to select features first.

Keywords

* Artificial intelligence  * F1 score  * Feature selection  * Machine learning  * MAE  * Nearest neighbor  * Precision  * Recall