Summary of On the Performance Of Imputation Techniques For Missing Values on Healthcare Datasets, by Luke Oluwaseye Joel and Wesley Doorsamy and Babu Sena Paul
On the Performance of Imputation Techniques for Missing Values on Healthcare Datasets
by Luke Oluwaseye Joel, Wesley Doorsamy, Babu Sena Paul
First submitted to arXiv on: 13 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper compares the performance of seven imputation techniques on three healthcare datasets: Mean imputation, Median imputation, Last Observation Carried Forward (LOCF) imputation, K-Nearest Neighbor (KNN) imputation, Interpolation imputation, MissForest imputation, and Multiple Imputation by Chained Equations (MICE). Missing values were artificially introduced into each dataset, each technique was used to impute them, and performance was evaluated using root mean squared error (RMSE) and mean absolute error (MAE). The results show that MissForest imputation performs best, followed by MICE. The paper also investigates whether feature selection should be performed before imputation or vice versa, using metrics such as recall, precision, F1-score, and accuracy. |
| Low | GrooveSquid.com (original content) | The study compares different ways to fill in missing values in healthcare data. This is important because many machine learning models do not work well when values are missing. The researchers tested seven methods for filling in missing values on three sets of data and found that two of them, MissForest and MICE, worked best. They also looked at whether it is better to fill in the missing values first or to remove some features from the data before doing so. |
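The evaluation protocol described in the summary (mask some known values, impute them, then score the imputations with RMSE and MAE) can be sketched with scikit-learn's imputers. This is a minimal illustration on synthetic data, not the paper's actual datasets or code; the 10% masking rate and the use of `IterativeImputer` as a MICE-style stand-in are assumptions.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 5))        # synthetic stand-in for a complete dataset
mask = rng.random(X_true.shape) < 0.10    # artificially introduce ~10% missing values
X_miss = X_true.copy()
X_miss[mask] = np.nan

def score(imputer, name):
    """Impute the masked entries and score them against the known true values."""
    X_hat = imputer.fit_transform(X_miss)
    err = X_hat[mask] - X_true[mask]
    rmse = float(np.sqrt(np.mean(err ** 2)))  # root mean squared error
    mae = float(np.mean(np.abs(err)))         # mean absolute error
    print(f"{name}: RMSE={rmse:.3f}  MAE={mae:.3f}")
    return rmse, mae

score(SimpleImputer(strategy="mean"), "Mean")
score(SimpleImputer(strategy="median"), "Median")
score(KNNImputer(n_neighbors=5), "KNN")
score(IterativeImputer(random_state=0), "MICE-style (IterativeImputer)")
```

On real healthcare data the same loop would run once per dataset and per technique; LOCF and interpolation are order-dependent and would typically be applied with `pandas.DataFrame.ffill` and `interpolate` instead.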
Keywords
* Artificial intelligence * F1 score * Feature selection * Machine learning * MAE * Nearest neighbor * Precision * Recall