Retraining with Predicted Hard Labels Provably Increases Model Accuracy

by Rudrajit Das, Inderjit S. Dhillon, Alessandro Epasto, Adel Javanmard, Jieming Mao, Vahab Mirrokni, Sujay Sanghavi, Peilin Zhong

First submitted to arXiv on: 17 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Cryptography and Security (cs.CR); Machine Learning (stat.ML)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper investigates the theoretical benefits of retraining a model on its own predicted hard labels, specifically when the training data is linearly separable but the given labels are noisy. The authors prove that retraining can improve population accuracy over a model initially trained on the noisy labels. This phenomenon has implications for training with local label differential privacy (DP), which inherently involves noisy labels. The paper demonstrates how consensus-based retraining, i.e., retraining only on the samples where the model's predicted label agrees with the given noisy label, selectively improves label DP training without compromising privacy, achieving a 6.4% accuracy boost for ResNet-18 on CIFAR-100 with epsilon = 3 label DP (a minimal code sketch of consensus-based retraining follows these summaries).

Low Difficulty Summary (original content by GrooveSquid.com)
The paper explores why a model retrained on its own predicted labels can perform better when the original training labels are noisy. The authors prove a theoretical result showing that this retraining improves population accuracy. This has important implications for training models with local label differential privacy, which inherently involves noisy labels. The study demonstrates how to use consensus-based retraining to get better results without sacrificing privacy.

Keywords

* Artificial intelligence
* ResNet