Summary of SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning, by Chaoqun Du et al.
SimPro: A Simple Probabilistic Framework Towards Realistic Long-Tailed Semi-Supervised Learning
by Chaoqun Du, Yizeng Han, Gao Huang
First submitted to arXiv on: 21 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The SimPro framework proposed in this study offers a novel approach to semi-supervised learning, addressing class imbalance without relying on predefined assumptions about the class distribution of the unlabeled data. By decoupling the conditional and marginal class distributions with a probabilistic model and refining the expectation-maximization (EM) algorithm (sketched below this table), SimPro obtains a closed-form estimate of the class distribution in the maximization phase, which in turn yields improved pseudo-labels in the expectation phase. The framework's simplicity, theoretical guarantees, and state-of-the-art performance across diverse benchmarks make it an attractive option for practitioners. |
Low | GrooveSquid.com (original content) | Semi-supervised learning uses a small amount of labeled data and a large amount of unlabeled data to train a machine learning model. Usually, this approach assumes that the class distribution of the unlabeled data is similar to that of the labeled data, but this isn't always the case. The researchers in this study propose a new way of doing semi-supervised learning that doesn't make this assumption. Instead, it uses a probabilistic model and an algorithm called expectation-maximization (EM) to learn the class distributions. This approach is more flexible and can handle different kinds of data distributions. The results show that the method performs well across many different datasets. |
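To make the EM loop described in the medium summary concrete, here is a minimal NumPy sketch of the general idea: the E-step reweights the model's per-sample predictions by the current estimate of the unlabeled class marginal to form pseudo-label posteriors, and the M-step re-estimates that marginal in closed form as the average posterior over all samples. All names are ours, and the simplification that softmax outputs stand in for class-conditional evidence under a uniform prior is an assumption for illustration; this is not the paper's exact formulation.

```python
import numpy as np

def em_step(probs_unlabeled: np.ndarray, prior: np.ndarray):
    """One EM iteration for estimating the class distribution of unlabeled data.

    probs_unlabeled: (N, C) softmax outputs on unlabeled samples, treated here
                     as class-conditional evidence under a uniform prior
                     (a simplifying assumption for illustration).
    prior:           (C,) current estimate of the marginal class distribution.
    """
    # E-step: reweight predictions by the current marginal estimate to get
    # pseudo-label posteriors q(y | x) proportional to p(x | y) * prior[y].
    weighted = probs_unlabeled * prior            # broadcasts over (N, C)
    posteriors = weighted / weighted.sum(axis=1, keepdims=True)

    # M-step: closed-form update of the marginal as the average posterior
    # responsibility per class (the standard mixing-weight update in EM).
    new_prior = posteriors.mean(axis=0)
    return posteriors, new_prior


# Toy usage: 1000 unlabeled samples, 5 classes, starting from a uniform prior.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=1000)
prior = np.full(5, 1 / 5)
for _ in range(10):
    pseudo_posteriors, prior = em_step(probs, prior)
```

The closed-form M-step is what keeps the approach simple: no extra hyperparameters or assumed target distribution are needed, since the marginal is recovered directly from the pseudo-label posteriors at each iteration.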
Keywords
* Artificial intelligence
* Machine learning
* Probabilistic model
* Semi-supervised