


Mislabeled examples detection viewed as probing machine learning models: concepts, survey and extensive benchmark

by Thomas George, Pierre Nodet, Alexis Bondu, Vincent Lemaire

First submitted to arxiv on: 21 Oct 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (paper authors)

Read the original abstract on arXiv.

Medium Difficulty Summary (GrooveSquid.com, original content)
The proposed modular framework formalizes various mislabeled-examples detection methods, leveraging core principles that can be applied across machine learning models and datasets. The framework consists of four building blocks, and a Python library demonstrates its implementation. Because it focuses on classifier-agnostic concepts, it can be adapted to non-deep classifiers for tabular data. The authors benchmark existing methods on both artificial (Completely At Random) and realistic (Not At Random) labeling noise drawn from various tasks with imperfect labeling rules, providing new insights into the limitations of existing approaches in this setup.
Low Difficulty Summary (GrooveSquid.com, original content)
Mislabeled examples are a big problem in machine learning datasets, and researchers want ways to detect these mistakes automatically. The authors developed a modular framework for detecting mislabeled data. It is built from four basic building blocks that can be applied to different types of machine learning models and data. They also created a Python library to show how the framework works in practice, and they tested their approach on artificial and real-world labeling noise from different tasks with imperfect labeling rules.
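To make the "probing" idea concrete, here is a minimal, hypothetical sketch of a classifier-agnostic detector: score each example by the model's confidence in its observed label, then flag the lowest-scoring fraction as suspect. The function names, the toy model, and the thresholding rule are illustrative assumptions, not the paper's actual four-block API.

```python
# Hypothetical sketch: a "probe" scores each example by the model's
# predicted probability of the *observed* label; low scores are suspect.

def probe_scores(predict_proba, X, y):
    """Trust score per example: probability the model assigns to its given label."""
    return [predict_proba(x)[label] for x, label in zip(X, y)]

def flag_mislabeled(scores, contamination=0.1):
    """Flag the `contamination` fraction of examples with the lowest trust scores."""
    n_flag = max(1, int(len(scores) * contamination))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return set(ranked[:n_flag])

# Toy stand-in for any trained classifier (the framework is classifier-agnostic):
# predicts class 0 with probability 0.9 when the feature is negative, else class 1.
def toy_predict_proba(x):
    return {0: 0.9, 1: 0.1} if x < 0 else {0: 0.1, 1: 0.9}

X = [-2.0, -1.5, 3.0, 2.5, -0.5]
y = [0, 0, 1, 1, 1]  # the last label disagrees with the feature's sign
scores = probe_scores(toy_predict_proba, X, y)
suspects = flag_mislabeled(scores, contamination=0.2)
print(suspects)  # → {4}: the example whose label conflicts with the model
```

In practice such scores would come from held-out predictions (e.g. cross-validation) rather than a model trained on the noisy labels themselves; the thresholding step here is just one possible final block.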

Keywords

  • Artificial intelligence
  • Machine learning