Summary of Distribution-Aware Robust Learning from Long-Tailed Data with Noisy Labels, by Jae Soon Baik et al.
Distribution-Aware Robust Learning from Long-Tailed Data with Noisy Labels
by Jae Soon Baik, In Young Yoon, Kun Hoon Kim, Jun Won Choi
First submitted to arXiv on: 23 Jul 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | Deep learning models have achieved impressive results across many fields thanks to large, well-annotated datasets. Real-world data, however, often exhibit long-tailed distributions and label noise, which significantly degrade generalization performance. Recent work on these issues has focused on noisy-sample selection methods that estimate each class centroid from the high-confidence samples within that class. The proposed framework, Distribution-aware Sample Selection and Contrastive Learning (DaSC), instead introduces Distribution-aware Class Centroid Estimation (DaCC), which computes enhanced class centroids as a weighted average of the features of all samples, with weights determined by the model’s predictions (see the sketch after this table). The framework also adds a confidence-aware contrastive learning strategy to obtain balanced and robust representations: training samples are split into high-confidence and low-confidence groups, a Semi-supervised Balanced Contrastive Loss (SBCL) is applied to the high-confidence samples to exploit their reliable label information and mitigate class bias, and a Mixup-enhanced Instance Discrimination Loss (MIDL) improves the representations of the low-confidence samples in a self-supervised manner. Experiments on CIFAR and real-world noisy-label datasets show that DaSC outperforms previous approaches. |
Low | GrooveSquid.com (original content) | Deep neural networks have made great progress in many fields by learning from big, labeled datasets. In the real world, though, data often have uneven class sizes and noisy labels, which makes it harder for models to generalize. To cope with this, researchers have focused on picking the most reliable samples from each class, but existing methods estimate each class’s center using only the high-confidence samples assigned to that class, so they stay vulnerable to uneven distributions and noisy labels. This study presents a new way to train models called Distribution-aware Sample Selection and Contrastive Learning (DaSC). DaSC has two main parts: it estimates each class’s center using all samples, weighting them by the model’s predictions, and it uses a special kind of contrastive learning that keeps representations balanced and robust. The method separates samples into those with high-confidence labels and those with low-confidence labels, using the reliable labels of the first group directly and improving the second group in a self-supervised way. The results show that DaSC performs better than previous approaches on CIFAR and real-world datasets. |
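To make the centroid-estimation idea concrete, here is a minimal, illustrative PyTorch sketch of prediction-weighted class centroids and a high/low-confidence split of the kind the summaries describe. The function names, the 0.5 threshold, and the tensor shapes are assumptions for illustration, not the authors' implementation; the paper's SBCL and MIDL losses are only indicated in comments.

```python
# Illustrative sketch (not the authors' code): prediction-weighted class
# centroid estimation and confidence-based sample partitioning, assuming
# per-sample feature vectors and softmax predictions are already available.
import torch
import torch.nn.functional as F

def distribution_aware_centroids(features, probs):
    """Estimate one centroid per class as a weighted average of ALL sample
    features, with weights given by the model's softmax probability for that
    class (instead of averaging only high-confidence samples per class).

    features: (N, D) feature vectors
    probs:    (N, C) softmax predictions
    returns:  (C, D) L2-normalized class centroids
    """
    weights = probs / probs.sum(dim=0, keepdim=True).clamp(min=1e-12)  # normalize weights per class
    centroids = weights.t() @ features                                  # (C, D)
    return F.normalize(centroids, dim=1)

def split_by_confidence(probs, noisy_labels, threshold=0.5):
    """Partition samples into high- and low-confidence groups based on the
    predicted probability of the (possibly noisy) given label.
    The 0.5 threshold is an arbitrary placeholder."""
    conf = probs.gather(1, noisy_labels.view(-1, 1)).squeeze(1)
    high_mask = conf >= threshold
    return high_mask, ~high_mask

# Example usage with random tensors standing in for a real mini-batch.
N, D, C = 128, 64, 10
features = F.normalize(torch.randn(N, D), dim=1)
probs = torch.randn(N, C).softmax(dim=1)
noisy_labels = torch.randint(0, C, (N,))

centroids = distribution_aware_centroids(features, probs)     # (10, 64)
high_mask, low_mask = split_by_confidence(probs, noisy_labels)
# High-confidence samples would feed a supervised balanced contrastive loss
# (SBCL in the paper); low-confidence samples would feed a Mixup-enhanced
# instance-discrimination loss (MIDL). Those losses are not reproduced here.
```

The key design point illustrated here is that every sample contributes to every class centroid in proportion to the model's predicted probability, so centroids for rare or heavily mislabeled classes are not estimated from a handful of possibly unreliable high-confidence samples alone.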
Keywords
» Artificial intelligence » Contrastive loss » Deep learning » Generalization » Self supervised » Semi supervised