Summary of Boosting SISSO Performance on Small Sample Datasets by Using Random Forests Prescreening for Complex Feature Selection, by Xiaolin Jiang et al.
Boosting SISSO Performance on Small Sample Datasets by Using Random Forests Prescreening for Complex Feature Selection
by Xiaolin Jiang, Guanqi Liu, Jiaying Xie, Zhenpeng Hu
First submitted to arXiv on: 28 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Materials Science (cond-mat.mtrl-sci); Data Analysis, Statistics and Probability (physics.data-an)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary: the paper's original abstract, available on its arXiv listing. |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary: The proposed RF-SISSO algorithm combines Random Forests (RF) with the Sure Independence Screening and Sparsifying Operator (SISSO) method to improve the extraction of material descriptors from materials science datasets. SISSO must store its entire expression space, which limits its performance on complex problems. RF-SISSO adds a Random Forest prescreening step that captures non-linear relationships and improves feature selection before SISSO runs, boosting accuracy and efficiency on regression and classification tasks. Experimental results show that RF-SISSO maintains high accuracy (above 0.9) across all training sample sizes and significantly improves regression efficiency, especially at smaller sample sizes. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary: RF-SISSO is a new way to find patterns in datasets about materials, even when only a small number of samples is available. This helps scientists discover new materials faster and more efficiently. The method uses two techniques together: Random Forests (RF) and Sure Independence Screening and Sparsifying Operator (SISSO). SISSO on its own struggles with complex problems, and RF helps by picking out the most useful features first. Scientists tested the combined method on 299 materials and found that it works well: it gets the right answer most of the time and runs much faster than SISSO alone. |
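
The prescreening idea in the medium-difficulty summary can be illustrated with a short Python sketch, assuming NumPy and scikit-learn are installed. This is not the authors' implementation. A Random Forest ranks the primary features, and only the top few are passed on. The SISSO stage is replaced by a much simpler stand-in that builds a small expression space with +, -, *, / and keeps the single expression most correlated with the target; the sparsifying operator is omitted entirely. The data are synthetic, and the names `rf_prescreen`, `build_candidates`, `best_descriptor`, and `demo` are hypothetical.

```python
import itertools

import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_prescreen(X, y, keep, seed=0):
    """Return the indices of the `keep` features ranked highest by RF importance."""
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]  # most important first
    return sorted(order[:keep].tolist())


def build_candidates(X, cols):
    """Build a small expression space from the kept features using +, -, *, /.

    Assumes every feature is strictly positive, so division is safe.
    """
    cands = {f"x{i}": X[:, i] for i in cols}
    for i, j in itertools.combinations(cols, 2):
        a, b = X[:, i], X[:, j]
        cands[f"(x{i}+x{j})"] = a + b
        cands[f"(x{i}-x{j})"] = a - b
        cands[f"(x{i}*x{j})"] = a * b
        cands[f"(x{i}/x{j})"] = a / b
        cands[f"(x{j}/x{i})"] = b / a
    return cands


def best_descriptor(cands, y):
    """Stand-in for SISSO: pick the candidate with the largest |Pearson r| to y."""
    return max(cands, key=lambda name: abs(np.corrcoef(cands[name], y)[0, 1]))


def demo(seed=0):
    """Synthetic check: y depends only on x0 / x1; the other six columns are noise."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(1.0, 3.0, size=(60, 8))
    y = X[:, 0] / X[:, 1] + 0.01 * rng.normal(size=60)
    kept = rf_prescreen(X, y, keep=3, seed=seed)
    name = best_descriptor(build_candidates(X, kept), y)
    return kept, name


if __name__ == "__main__":
    kept, name = demo()
    print("kept features:", kept)
    print("best descriptor:", name)
```

The point of the sketch is the size of the expression space: with 8 primary features the pairwise candidates number in the hundreds, while prescreening down to 3 features leaves only 18. That reduction is what makes a later exhaustive search cheaper, which is the kind of saving the summary attributes to RF-SISSO.
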
Keywords
» Artificial intelligence » Classification » Feature selection » Regression