Statistical Undersampling with Mutual Information and Support Points
by Alex Mak, Shubham Sahoo, Shivani Pandey, Yidan Yue, Linglong Kong
First submitted to arXiv on: 19 Dec 2024
Categories
- Main: Machine Learning (stat.ML)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper proposes two novel approaches to undersampling large datasets for classification tasks, addressing the challenges posed by class imbalance and distributional differences. The introduced methods, mutual-information-based stratified simple random sampling and support points optimization, prioritize representative data selection while minimizing information loss. Empirical results demonstrate that these methods outperform traditional techniques, achieving higher balanced classification accuracy across multiple tasks. |
| Low | GrooveSquid.com (original content) | This paper tackles a big problem in machine learning: when some classes are much smaller than others, it is hard to train good models. The authors propose two new ways to fix this by keeping the most representative data and discarding the rest. Their methods beat the usual techniques, making it easier to predict which class an example belongs to. |
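The summaries above do not include the paper's code, but the general idea of mutual-information-based stratified simple random sampling can be sketched. The Python sketch below is illustrative only, not the authors' implementation: the function names (`mutual_information`, `stratified_undersample`) and the stratification heuristic (stratify on the single feature with the highest discrete mutual information against the class labels, then draw a simple random sample per stratum and class) are assumptions made for this example.

```python
# Illustrative sketch only -- not the paper's actual method or API.
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Discrete mutual information I(X; Y) in nats between two
    equal-length sequences of hashable values."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), with counts rescaled by n
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def stratified_undersample(features, labels, n_per_class, seed=0):
    """Return sorted indices of an undersampled subset.

    Strata are the values of the feature column with the highest
    discrete mutual information against the labels; within each
    (stratum, class) group a simple random sample is drawn, with
    quotas proportional to the group's share of its class.
    """
    rng = random.Random(seed)
    n_feat = len(features[0])
    cols = [[row[j] for row in features] for j in range(n_feat)]
    best = max(range(n_feat), key=lambda j: mutual_information(cols[j], labels))
    # Group example indices by (stratum value, class label).
    groups = {}
    for i, (row, y) in enumerate(zip(features, labels)):
        groups.setdefault((row[best], y), []).append(i)
    class_total = Counter(labels)
    keep = []
    for (_, y), idxs in groups.items():
        quota = max(1, round(n_per_class * len(idxs) / class_total[y]))
        keep.extend(rng.sample(idxs, min(quota, len(idxs))))
    return sorted(keep)
```

With an imbalanced toy dataset of 8 majority and 2 minority examples, `stratified_undersample(features, labels, n_per_class=2)` returns a balanced subset of roughly two indices per class; discretizing continuous features before the MI step would be needed in practice.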
Keywords
» Artificial intelligence » Classification » Machine learning » Optimization