Prediction Instability in Machine Learning Ensembles
by Jeremy Kedziora
First submitted to arXiv on: 3 Jul 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | This paper investigates the mathematical properties of machine learning ensembles, that is, procedures that aggregate the predictions of multiple models. Despite their strong performance on applied problems, the prediction instability of ensembles is understudied and has significant consequences for their safe and explainable use. The authors prove a theorem showing that any ensemble must exhibit at least one of three forms of prediction instability: ignoring agreement among its underlying models, changing its mind when none of them has, or being manipulable through the inclusion or exclusion of options it would never predict (a toy numeric sketch of the second form appears after this table). This highlights the need to balance the benefits of an aggregation procedure against these risks. The paper further shows that popular tree ensembles such as random forest and XGBoost violate basic fairness properties, although this can be mitigated asymptotically by using consistent models. |
| Low | GrooveSquid.com (original content) | Machine learning ensembles combine predictions from multiple models. Despite their success, we don't know much about what makes them work or how to use them safely. This paper helps fill that gap by showing that any ensemble will have some problems with making predictions. These problems can cause the ensemble to ignore agreement among its individual models, change its mind when none of them has, or be swayed simply by adding or removing options it would never predict. To use ensembles safely and fairly, we need to balance their benefits against these risks. The paper also shows that popular types of ensembles don't always follow basic rules of fairness. |
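The paper's formal results are not reproduced here, but the second instability form above can be illustrated with a minimal, self-contained sketch. In the toy example below (all models, inputs, and probability values are invented for illustration; this is not code from the paper), a soft-voting ensemble averages class probabilities across three models and flips its prediction between two inputs, even though every underlying model predicts the same class on both.

```python
import numpy as np

# Toy soft-voting ensemble: average per-model class probabilities,
# then predict the class with the highest average.
classes = ["A", "B"]

# Invented per-model probability outputs on two inputs x1 and x2.
# Rows are models m1..m3; columns are P(A), P(B).
probs_x1 = np.array([[0.9, 0.1],   # m1 votes A
                     [0.4, 0.6],   # m2 votes B
                     [0.4, 0.6]])  # m3 votes B
probs_x2 = np.array([[0.6, 0.4],   # m1 still votes A
                     [0.3, 0.7],   # m2 still votes B
                     [0.3, 0.7]])  # m3 still votes B

for name, probs in [("x1", probs_x1), ("x2", probs_x2)]:
    base_preds = [classes[i] for i in probs.argmax(axis=1)]  # each model's vote
    ensemble_pred = classes[probs.mean(axis=0).argmax()]     # soft-vote winner
    print(f"{name}  base: {base_preds}  ensemble: {ensemble_pred}")

# Output:
# x1  base: ['A', 'B', 'B']  ensemble: A
# x2  base: ['A', 'B', 'B']  ensemble: B
```

The flip happens because averaging is sensitive to the models' confidence levels, not just their votes; a majority-vote ensemble would avoid this particular example but, by the paper's theorem, would still be subject to at least one of the other instability forms.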
Keywords
» Artificial intelligence » Machine learning » Random forest » XGBoost