Summary of Pairwise Similarity Distribution Clustering For Noisy Label Learning, by Sihan Bai

Pairwise Similarity Distribution Clustering for Noisy Label Learning

by Sihan Bai

First submitted to arXiv on: 2 Apr 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	High Difficulty Summary Read the original abstract here
Medium	GrooveSquid.com (original content)	Medium Difficulty Summary This paper proposes a simple yet effective sample selection algorithm, Pairwise Similarity Distribution Clustering (PSDC), for training deep neural networks with noisy labels. PSDC divides the training samples into a clean set and a noisy set, which can then be used to train networks for different downstream tasks under semi-supervised learning regimes. The algorithm uses a Gaussian Mixture Model (GMM) to model the similarity distribution between sample pairs belonging to the same noisy cluster, so that each sample can be confidently classified as clean or noisy. PSDC is shown, both theoretically and in practice, to judge label confidence robustly even under severe label noise. Experiments on benchmark datasets such as CIFAR-10, CIFAR-100, and Clothing1M demonstrate significant improvements over state-of-the-art methods.
Low	GrooveSquid.com (original content)	Low Difficulty Summary This paper helps us train better artificial intelligence models when some of the training labels are wrong. Imagine you’re trying to teach a child by showing them many pictures of different animals, but some of the pictures carry the wrong name. How can we make sure the child learns from the correctly named pictures? This paper proposes a simple and effective way to do just that, called PSDC (Pairwise Similarity Distribution Clustering). It works by grouping similar pictures together and separating out the ones whose labels look wrong. Even when many labels are wrong, this method is good at figuring out which ones are correct. The results show that it can help improve AI models on various tasks.
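
The clean/noisy split described in the medium difficulty summary can be illustrated with a small sketch. This is not the authors' implementation. It assumes a simplified scoring rule (each sample's mean cosine similarity to the other samples that share its given label) and fits a two-component one-dimensional Gaussian mixture to those scores with plain EM, treating the higher-mean component as "clean". The toy features, the 20% flip rate, and all function names are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def label_group_similarity(features, labels):
    """Assumed score: mean similarity to the other samples with the same given label."""
    scores = []
    for i, (f, y) in enumerate(zip(features, labels)):
        peers = [features[j] for j in range(len(features)) if j != i and labels[j] == y]
        scores.append(sum(cosine(f, p) for p in peers) / len(peers))
    return scores


def gaussian_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def clean_posterior(scores, iters=100, min_var=1e-6):
    """Fit a two-component 1-D GMM with EM; return P(high-mean component | score)."""
    n = len(scores)
    mean = sum(scores) / n
    spread = max(sum((s - mean) ** 2 for s in scores) / n, min_var)
    mu, var, weight = [min(scores), max(scores)], [spread, spread], [0.5, 0.5]

    def responsibilities():
        resp = []
        for s in scores:
            p = [weight[k] * gaussian_pdf(s, mu[k], var[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        return resp

    for _ in range(iters):
        resp = responsibilities()  # E-step
        for k in range(2):  # M-step
            nk = max(sum(r[k] for r in resp), 1e-12)
            mu[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            var[k] = max(sum(r[k] * (s - mu[k]) ** 2 for r, s in zip(resp, scores)) / nk, min_var)
            weight[k] = nk / n
    high = 0 if mu[0] > mu[1] else 1
    return [r[high] for r in responsibilities()]


# Toy data (hypothetical): two well-separated classes in 2-D, 20% of labels flipped.
rng = random.Random(0)
centers = {0: (1.0, 0.0), 1: (0.0, 1.0)}
true_labels = [0] * 40 + [1] * 40
features = [
    (centers[y][0] + rng.gauss(0, 0.05), centers[y][1] + rng.gauss(0, 0.05))
    for y in true_labels
]
flipped = set(range(0, 8)) | set(range(40, 48))
labels = [1 - y if i in flipped else y for i, y in enumerate(true_labels)]

scores = label_group_similarity(features, labels)
posterior = clean_posterior(scores)
is_clean = [p > 0.5 for p in posterior]
noisy_indices = [i for i, c in enumerate(is_clean) if not c]
print("flagged as noisy:", noisy_indices)
```

In this toy setup a mislabeled sample sits among peers from the other class, so its similarity score is low and it falls into the low-mean mixture component. On real data the features would come from a trained network, and the two score distributions would overlap far more than they do here.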

Keywords

* Artificial intelligence
* Clustering
* Mixture model
* Semi-supervised
