
Summary of ICD Codes are Insufficient to Create Datasets for Machine Learning: An Evaluation Using All of Us Data for Coccidioidomycosis and Myocardial Infarction, by Abigail E. Whitlock et al.


ICD Codes are Insufficient to Create Datasets for Machine Learning: An Evaluation Using All of Us Data for Coccidioidomycosis and Myocardial Infarction

by Abigail E. Whitlock, Gondy Leroy, Fariba M. Donovan, John N. Galgiani

First submitted to arXiv on: 10 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty: High
Written by: Paper authors
High Difficulty Summary: Read the original abstract here

Summary difficulty: Medium
Written by: GrooveSquid.com (original content)
Medium Difficulty Summary: Machine learning (ML) datasets are crucial for developing new models, but creating them is challenging. In medicine, ICD codes are often used to build ML datasets, even though they are designed primarily for billing. This paper examines how suitable ICD codes are for creating datasets to train ML models. For two diseases, Valley fever (coccidioidomycosis) and myocardial infarction, the authors compared patient cohorts built from ICD codes with cohorts identified via serological confirmation. They found significant discrepancies between the cohorts, with limited patient overlap. The results highlight the need to reassess how ML datasets are created in medicine.

Summary difficulty: Low
Written by: GrooveSquid.com (original content)
Low Difficulty Summary: Machine learning is used to develop new models in medicine, and training these models requires big datasets. Right now, ICD codes are often used to build these datasets. However, these codes are meant for billing, not for building datasets for models. This research looks at whether ICD codes are suitable for training machine learning models. For two diseases, the scientists compared the group of patients found using ICD codes with the group found using another way to confirm the diagnosis. The two groups were quite different and shared few patients, which suggests that ICD codes alone are not a reliable way to build these datasets.
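
The cohort comparison described in the summaries can be illustrated with a small Python sketch. This is not the authors' code: the function name `cohort_overlap`, the patient IDs, and the choice of the Jaccard index as an overlap measure are all assumptions made for illustration. It only shows one plausible way to quantify how much an ICD-code cohort and a confirmation-based cohort share.

```python
def cohort_overlap(icd_cohort, confirmed_cohort):
    """Summarize how two collections of patient IDs overlap.

    Hypothetical helper for illustration; not taken from the paper.
    """
    icd = set(icd_cohort)
    confirmed = set(confirmed_cohort)
    both = icd & confirmed    # patients found by both methods
    union = icd | confirmed   # patients found by either method
    return {
        "icd_only": len(icd - confirmed),
        "confirmed_only": len(confirmed - icd),
        "both": len(both),
        # Jaccard index: shared patients divided by all patients seen
        "jaccard": len(both) / len(union) if union else 0.0,
    }


# Made-up patient IDs, purely illustrative
icd_ids = ["p1", "p2", "p3", "p4"]
confirmed_ids = ["p3", "p4", "p5"]

summary = cohort_overlap(icd_ids, confirmed_ids)
print(summary)
```

A low "both" count relative to the size of either cohort corresponds to the limited overlap the summaries describe, where the two selection methods pick out largely different patients.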

Keywords

» Artificial intelligence  » Machine learning