Summary of Pairwise Similarity Distribution Clustering For Noisy Label Learning, by Sihan Bai
First submitted to arXiv on: 2 Apr 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper proposes Pairwise Similarity Distribution Clustering (PSDC), a simple yet effective sample selection algorithm for training deep neural networks on noisy labels. PSDC splits the training samples into a clean set and a noisy set, which can then be used to train networks for downstream tasks under semi-supervised learning regimes. The algorithm uses a Gaussian Mixture Model (GMM) to model the distribution of similarities between pairs of samples that belong to the same noisy cluster, so that each sample can be classified as clean or noisy with confidence. The paper argues, both theoretically and empirically, that PSDC judges label confidence robustly even under severe label noise. Experiments on benchmark datasets such as CIFAR-10, CIFAR-100, and Clothing1M show significant improvements over state-of-the-art methods. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper helps us train better artificial intelligence models when some of the training labels are wrong. Imagine you're teaching a child by showing them many pictures of different animals, but some of the pictures have the wrong animal name attached. How can we make sure the child learns only from the correctly named pictures? This paper proposes a simple and effective way to do just that, called PSDC (Pairwise Similarity Distribution Clustering). It works by comparing pictures that share the same name and separating the ones that do not look like the rest. Even when many of the names are wrong, this method is good at figuring out which pictures are named correctly. The results show that it can help improve AI models on various tasks. |
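
To make the clean/noisy split described in the medium summary more concrete, here is a minimal Python sketch. It is not the authors' implementation: the summaries do not give the exact formulation, so this sketch assumes that each sample is scored by its mean cosine similarity to the other samples carrying the same (possibly wrong) label, and that a single two-component one-dimensional Gaussian mixture is fitted to all scores with a few EM steps. The function names (`make_toy_data`, `mean_same_label_similarity`, `clean_posterior`), the toy 2-D features, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_same_label_similarity(features, labels):
    """Assumed score: mean similarity to other samples with the same given label."""
    scores = []
    for i, (f, y) in enumerate(zip(features, labels)):
        sims = [cosine(f, g)
                for j, (g, z) in enumerate(zip(features, labels))
                if j != i and z == y]
        scores.append(sum(sims) / len(sims) if sims else 0.0)
    return scores


def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def clean_posterior(scores, iters=50):
    """Fit a 2-component 1-D GMM with EM; return P(high-similarity component | score)."""
    n = len(scores)
    mean = sum(scores) / n
    var0 = sum((x - mean) ** 2 for x in scores) / n + 1e-6
    mu, var, pi = [min(scores), max(scores)], [var0, var0], [0.5, 0.5]

    def responsibilities():
        resp = []
        for x in scores:
            p = [pi[k] * gaussian_pdf(x, mu[k], var[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5])
        return resp

    for _ in range(iters):
        resp = responsibilities()                      # E-step
        for k in range(2):                             # M-step
            nk = sum(r[k] for r in resp) + 1e-12
            mu[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var[k] = sum(r[k] * (x - mu[k]) ** 2
                         for r, x in zip(resp, scores)) / nk + 1e-6
            pi[k] = nk / n

    clean = 0 if mu[0] > mu[1] else 1                  # higher mean = "clean"
    return [r[clean] for r in responsibilities()]


def make_toy_data(seed=0):
    """Hypothetical toy set: two 2-D classes, 4 of 24 labels flipped per class."""
    rng = random.Random(seed)
    centers = {0: (1.0, 0.1), 1: (0.1, 1.0)}
    features, labels, noisy_idx = [], [], set()
    for label in (0, 1):
        for k in range(24):
            true_class = label if k < 20 else 1 - label  # last 4 are mislabeled
            cx, cy = centers[true_class]
            features.append((cx + rng.uniform(-0.05, 0.05),
                             cy + rng.uniform(-0.05, 0.05)))
            labels.append(label)
            if true_class != label:
                noisy_idx.add(len(labels) - 1)
    return features, labels, noisy_idx


features, labels, noisy_idx = make_toy_data()
scores = mean_same_label_similarity(features, labels)
probs = clean_posterior(scores)
flagged = {i for i, p in enumerate(probs) if p < 0.5}
print(sorted(flagged) == sorted(noisy_idx))
```

In this toy setting the mislabeled samples get a low mean similarity to their labeled class, so they fall into the low-mean mixture component. In practice the features would come from a trained network rather than hand-made 2-D points, and the resulting clean and noisy sets would then feed a semi-supervised training stage, as the summary describes.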
Keywords
» Artificial intelligence » Clustering » Mixture model » Semi-supervised