
Summary of Pairwise Similarity Distribution Clustering For Noisy Label Learning, by Sihan Bai

First submitted to arXiv on: 2 Apr 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computer Vision and Pattern Recognition (cs.CV)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
This version is the paper's original abstract; read it on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper proposes a simple yet effective sample selection algorithm, called Pairwise Similarity Distribution Clustering (PSDC), for training deep neural networks on data with noisy labels. PSDC divides the training samples into a clean set and a noisy set, which off-the-shelf semi-supervised learning methods can then use to further train networks for different downstream tasks. The algorithm represents the structure of the data through pairwise similarities between samples and uses a Gaussian Mixture Model (GMM) to model the distribution of similarities between sample pairs that belong to the same noisy cluster (that is, samples sharing the same, possibly incorrect, label). This lets each sample be confidently assigned to either the clean set or the noisy set. The author reports that, even at severe label noise rates, this partition judges label confidence more robustly, both in theory and in practice. Experiments on benchmark datasets such as CIFAR-10, CIFAR-100, and Clothing1M show significant improvements over state-of-the-art methods.
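The selection idea described above (fit a two-component GMM to similarity scores, then treat the high-similarity component as clean) can be illustrated with a minimal Python sketch. This is not the paper's PSDC implementation. The scoring rule (each sample's mean cosine similarity to the other samples that share its noisy label), the hand-written EM fit, the 0.5 posterior threshold, and all function names are illustrative assumptions.

```python
import math

# Hypothetical sketch of GMM-based clean/noisy sample selection.
# NOT the paper's PSDC code: the scoring rule and all names are assumptions.


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def same_label_scores(features, labels):
    """Assumed score: mean similarity of each sample to the other samples
    that carry the same (possibly noisy) label."""
    scores = []
    for i, (fi, yi) in enumerate(zip(features, labels)):
        sims = [cosine(fi, fj) for j, (fj, yj) in enumerate(zip(features, labels))
                if j != i and yj == yi]
        scores.append(sum(sims) / len(sims) if sims else 0.0)
    return scores


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, iters=100, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    n = len(xs)
    mu = sum(xs) / n
    var0 = max(sum((x - mu) ** 2 for x in xs) / n, min_var)
    means, variances, weights = [min(xs), max(xs)], [var0, var0], [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each score.
        resp = []
        for x in xs:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk, min_var)
    return weights, means, variances


def split_clean_noisy(features, labels, threshold=0.5):
    """Return (clean_indices, noisy_indices); the higher-mean component is treated as clean."""
    scores = same_label_scores(features, labels)
    weights, means, variances = fit_gmm_1d(scores)
    clean_k = 0 if means[0] > means[1] else 1
    clean, noisy = [], []
    for i, s in enumerate(scores):
        p = [weights[k] * normal_pdf(s, means[k], variances[k]) for k in range(2)]
        total = sum(p)
        posterior_clean = p[clean_k] / total if total > 0 else 0.0
        (clean if posterior_clean > threshold else noisy).append(i)
    return clean, noisy


if __name__ == "__main__":
    a, b = (1.0, 0.0), (0.0, 1.0)
    # Toy data: per class, 6 correctly labeled samples and 2 with flipped labels.
    features = [a] * 6 + [b] * 2 + [b] * 6 + [a] * 2
    labels = [0] * 8 + [1] * 8
    clean, noisy = split_clean_noisy(features, labels)
    print("noisy:", noisy)  # noisy: [6, 7, 14, 15]
```

On this toy data, correctly labeled samples score 5/7 against their same-label peers while flipped samples score 1/7, so the mixture separates them cleanly. A real pipeline would compute features with the network being trained and would more likely use a library mixture model such as scikit-learn's `GaussianMixture`.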

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper helps us train better artificial intelligence models when some of the training labels are wrong (so-called noisy labels). Imagine you're teaching a child by showing them many pictures of animals, but some of the pictures have the wrong animal name attached. How can we make sure the child learns from the correct ones? This paper proposes a simple and effective way to do just that, called PSDC (Pairwise Similarity Distribution Clustering). It works by comparing pictures that share the same label with each other and setting aside the ones that don't fit in as probably mislabeled. Even when many labels are wrong, this method is good at figuring out which pictures are labeled correctly. In tests on standard image datasets, it outperformed earlier methods.

Keywords

» Artificial intelligence  » Clustering  » Mixture model  » Semi-supervised