Summary of Improving Uncertainty Sampling with Bell Curve Weight Function, by Zan-Kai Chong et al.
Improving Uncertainty Sampling with Bell Curve Weight Function
by Zan-Kai Chong, Hiroyuki Ohsaki, Bok-Min Goi
First submitted to arXiv on: 3 Mar 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper proposes bell curve sampling, a new active learning method that trains supervised models with fewer labelled instances than traditional passive learning, which matters when acquiring labels is expensive, as in spam mail detection. Uncertainty sampling, an existing active learning method, queries labels for instances whose predicted probabilities are near 0.5, but its performance can be degraded by the area of unpredictable responses (AUR) and by the nature of the dataset. Bell curve sampling addresses these limitations with a weight function centred at p = 0.5, so that instances in the uncertainty region are selected most of the time while other instances still stand a chance of being queried (see the sketch after the table). Simulation results show that bell curve sampling outperforms uncertainty sampling and passive learning on datasets of different natures and with AUR. |
| Low | GrooveSquid.com (original content) | This paper looks for ways to make machine learning more efficient. Training a model usually means labelling lots of data, which is time-consuming and costly; for example, picking out spam emails from thousands of regular ones takes a long time. The authors propose a new approach, bell curve sampling, that chooses which examples are most worth labelling. Compared with existing methods, it works better across different types of datasets. |
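To make the idea in the medium summary concrete, here is a minimal Python sketch of bell-curve-weighted querying. It assumes a Gaussian-shaped weight centred at p = 0.5; the function names, the `width` parameter, and the Gaussian form are illustrative assumptions, since the summaries above do not give the paper's exact weight function.

```python
import numpy as np

def bell_curve_weights(probs, center=0.5, width=0.1):
    """Gaussian-shaped weight, peaked where the model is most uncertain.

    `center`, `width`, and the Gaussian form are illustrative; the paper's
    exact weight function may differ.
    """
    return np.exp(-((probs - center) ** 2) / (2 * width ** 2))

def bell_curve_sample(probs, n_queries, seed=None):
    """Pick instances to label at random, favouring predictions near 0.5.

    Unlike plain uncertainty sampling (always take argmin |p - 0.5|),
    instances away from 0.5 keep a small but non-zero chance of being queried.
    """
    rng = np.random.default_rng(seed)
    weights = bell_curve_weights(probs)
    return rng.choice(len(probs), size=n_queries, replace=False,
                      p=weights / weights.sum())

# Usage: query 5 labels from a pool of 100 predicted positive-class probabilities.
pool = np.random.default_rng(0).uniform(size=100)
queried = bell_curve_sample(pool, n_queries=5, seed=1)
print(queried, pool[queried])
```

The design point of this sketch is the weighted random selection: whereas plain uncertainty sampling always takes the instances closest to p = 0.5, bell curve sampling only favours them, which plausibly helps when the region around 0.5 is dominated by unpredictable responses (the AUR mentioned above).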
Keywords
* Artificial intelligence * Active learning * Machine learning * Supervised learning