
Extracting Clean and Balanced Subset for Noisy Long-tailed Classification

by Zhuo Li, He Zhao, Zhen Li, Tongliang Liu, Dandan Guo, Xiang Wan

First submitted to arXiv on: 10 Apr 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: None



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper proposes a novel approach to the joint problem of class imbalance and label noise in real-world datasets. The authors develop a pseudo-labeling method based on class prototypes, framed as distribution matching and solved with optimal transport (OT). By specifying the target probability measure by hand and using the learned transport plan to pseudo-label training samples, the method mitigates the side effects of noisy and long-tailed data simultaneously. The authors also introduce a simple filter criterion that combines observed labels and pseudo labels to obtain a more balanced, less noisy subset for robust model training. Experimental results demonstrate that the method effectively extracts class-balanced subsets with clean labels, yielding performance gains in long-tailed classification with label noise.
Low Difficulty Summary (original content by GrooveSquid.com)
This paper finds a way to fix two big problems in datasets: when some classes have far fewer examples than others, and when the labels saying what is in each example are wrong. The authors build a new method that looks at how the different classes are spread out and uses that to decide which examples are good and which are bad. This helps them remove the noise (wrong labels) and balance out the data so it is fair to all classes. They test their method on many different kinds of data and show that it makes a big difference in how well machines can learn from that data.
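The core idea described in the medium-difficulty summary, matching samples to class prototypes with optimal transport under a balanced class marginal, then keeping only samples whose pseudo label agrees with the observed label, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the Sinkhorn solver, the Euclidean sample-to-prototype cost, the uniform marginals, and the agreement filter are all simplifying assumptions made here.

```python
import numpy as np

def sinkhorn(cost, row_marginal, col_marginal, eps=1.0, n_iters=200):
    """Entropic optimal transport via Sinkhorn iterations.

    Returns a transport plan whose rows sum to row_marginal and
    whose columns sum to col_marginal (approximately, after n_iters).
    """
    K = np.exp(-cost / eps)          # Gibbs kernel of the cost matrix
    u = np.ones_like(row_marginal)
    for _ in range(n_iters):
        v = col_marginal / (K.T @ u)  # scale columns to match col_marginal
        u = row_marginal / (K @ v)    # scale rows to match row_marginal
    return u[:, None] * K * v[None, :]

def extract_clean_balanced_subset(features, observed_labels, prototypes):
    """Pseudo-label samples by OT to class prototypes; filter by agreement.

    A hypothetical sketch: the balanced (uniform) column marginal plays the
    role of the manually specified probability measure over classes, so the
    transport plan cannot collapse onto head classes.
    """
    n, c = len(features), len(prototypes)
    # Cost: Euclidean distance from each sample to each class prototype.
    cost = np.linalg.norm(features[:, None, :] - prototypes[None, :, :], axis=-1)
    a = np.full(n, 1.0 / n)   # uniform mass over samples
    b = np.full(c, 1.0 / c)   # uniform mass over classes -> enforces balance
    plan = sinkhorn(cost, a, b)
    pseudo = plan.argmax(axis=1)               # pseudo-label from the plan
    keep = pseudo == np.asarray(observed_labels)  # filter: labels must agree
    return keep, pseudo
```

On a toy example with two well-separated prototypes, a sample whose observed label disagrees with its OT pseudo label is dropped from the subset, which is the filtering behavior the summary describes.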

Keywords

* Artificial intelligence  * Classification  * Probability