Summary of On the Performance Of Imputation Techniques For Missing Values on Healthcare Datasets, by Luke Oluwaseye Joel and Wesley Doorsamy and Babu Sena Paul
On the Performance of Imputation Techniques for Missing Values on Healthcare Datasets
by Luke Oluwaseye Joel, Wesley Doorsamy, Babu Sena Paul
First submitted to arXiv on: 13 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper compares the performance of seven imputation techniques on three healthcare datasets: Mean imputation, Median imputation, Last Observation Carried Forward (LOCF) imputation, K-Nearest Neighbor (KNN) imputation, Interpolation imputation, MissForest imputation, and Multiple Imputation by Chained Equations (MICE). Missing values were artificially introduced into each dataset, each technique was used to impute them, and performance was evaluated using root mean squared error (RMSE) and mean absolute error (MAE). The results show that MissForest imputation performs best, followed by MICE. The paper also investigates whether feature selection should be performed before imputation or vice versa, using metrics such as recall, precision, F1-score, and accuracy. |
| Low | GrooveSquid.com (original content) | The study compares different ways to fill in missing values in healthcare data. This is important because many machine learning models do not work well when values are missing. The researchers tested seven methods for filling in missing values on three sets of data and found that two of them, MissForest and MICE, worked best. They also looked at whether it is better to fill in the missing values first or to remove some features from the data before doing so. |
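The evaluation protocol described in the summary (mask some known values, impute them, then score the imputations with RMSE and MAE) can be sketched with scikit-learn's imputers. This is a minimal illustration on synthetic data, not the paper's actual datasets or code; the 10% masking rate and the use of `IterativeImputer` as a MICE-style stand-in are assumptions.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

rng = np.random.default_rng(0)
X_true = rng.normal(size=(200, 5))        # synthetic stand-in for a complete dataset
mask = rng.random(X_true.shape) < 0.10    # artificially introduce ~10% missing values
X_miss = X_true.copy()
X_miss[mask] = np.nan

def score(imputer, name):
    """Impute the masked entries and score them against the known true values."""
    X_hat = imputer.fit_transform(X_miss)
    err = X_hat[mask] - X_true[mask]
    rmse = float(np.sqrt(np.mean(err ** 2)))  # root mean squared error
    mae = float(np.mean(np.abs(err)))         # mean absolute error
    print(f"{name}: RMSE={rmse:.3f}  MAE={mae:.3f}")
    return rmse, mae

score(SimpleImputer(strategy="mean"), "Mean")
score(SimpleImputer(strategy="median"), "Median")
score(KNNImputer(n_neighbors=5), "KNN")
score(IterativeImputer(random_state=0), "MICE-style (IterativeImputer)")
```

On real healthcare data the same loop would run once per dataset and per technique; LOCF and interpolation are order-dependent and would typically be applied with `pandas.DataFrame.ffill` and `interpolate` instead.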
Keywords
* Artificial intelligence * F1 score * Feature selection * Machine learning * MAE * Nearest neighbor * Precision * Recall