Summary of Boosting SISSO Performance on Small Sample Datasets by Using Random Forests Prescreening for Complex Feature Selection, by Xiaolin Jiang et al.


Boosting SISSO Performance on Small Sample Datasets by Using Random Forests Prescreening for Complex Feature Selection

by Xiaolin Jiang, Guanqi Liu, Jiaying Xie, Zhenpeng Hu

First submitted to arXiv on: 28 Sep 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Materials Science (cond-mat.mtrl-sci); Data Analysis, Statistics and Probability (physics.data-an)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The proposed RF-SISSO algorithm combines Random Forests (RF) with the Sure Independence Screening and Sparsifying Operator (SISSO) method to improve the extraction of material descriptors in materials science. SISSO must store its entire expression space, which limits its performance on complex problems. RF-SISSO uses Random Forests as a prescreening step to capture non-linear relationships and improve feature selection, which boosts accuracy and efficiency on regression and classification tasks. Experimental results show that RF-SISSO maintains high accuracy (above 0.9) across all training sample sizes and significantly improves regression efficiency, especially with smaller sample sizes.

Low Difficulty Summary (written by GrooveSquid.com, original content)
RF-SISSO is a new way to find patterns in data about materials, even when only a small number of samples is available. This helps scientists discover new materials faster and more efficiently. The method uses two techniques together: Random Forests (RF) and Sure Independence Screening and Sparsifying Operator (SISSO). SISSO has some limitations, and RF helps work around them. Scientists tested the new method on 299 materials and found that it works very well. It gets the right answer most of the time and does so much faster than before.
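
The prescreening idea in the medium difficulty summary can be illustrated with a small sketch. This is not the authors' code and does not use the real SISSO software: it assumes scikit-learn's `RandomForestRegressor` for the prescreening step, and the helper names (`rf_prescreen`, `build_candidates`, `sis_top`), the synthetic data, and the tiny expression space (features, squares, pairwise products) are all hypothetical stand-ins chosen for illustration. It shows how keeping only the highest-ranked primary features shrinks the expression space before a correlation-based screening step picks a one-dimensional descriptor.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def rf_prescreen(X, y, names, keep=3, seed=0):
    """Keep the `keep` primary features with the highest RF importance."""
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    forest.fit(X, y)
    top = np.argsort(forest.feature_importances_)[::-1][:keep]
    top = np.sort(top)  # preserve the original column order
    return X[:, top], [names[i] for i in top]


def build_candidates(X, names):
    """Toy expression space: each feature, its square, and pairwise products.
    (The real SISSO operator set is much richer; this is an assumption.)"""
    cols, labels = [], []
    n = X.shape[1]
    for i in range(n):
        cols.append(X[:, i])
        labels.append(names[i])
        cols.append(X[:, i] ** 2)
        labels.append(f"{names[i]}^2")
        for j in range(i + 1, n):
            cols.append(X[:, i] * X[:, j])
            labels.append(f"{names[i]}*{names[j]}")
    return np.column_stack(cols), labels


def sis_top(C, y):
    """Screening step: index of the candidate with the largest
    absolute Pearson correlation with the target."""
    Cc = C - C.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Cc.T @ yc) / (np.linalg.norm(Cc, axis=0) * np.linalg.norm(yc))
    return int(np.argmax(scores))


# Synthetic small-sample problem: 60 samples, 8 primary features,
# but the target depends only on the product of x0 and x2.
rng = np.random.default_rng(0)
names = [f"x{i}" for i in range(8)]
X = rng.uniform(1.0, 2.0, size=(60, 8))
y = 3.0 * X[:, 0] * X[:, 2] + rng.normal(0.0, 0.01, size=60)

# Expression space without prescreening, for comparison.
_, all_labels = build_candidates(X, names)

# Prescreen with the random forest, then build the smaller space.
X_kept, kept_names = rf_prescreen(X, y, names, keep=3)
C, labels = build_candidates(X_kept, kept_names)

best_label = labels[sis_top(C, y)]
print("kept features:", kept_names)
print("candidates without / with prescreening:", len(all_labels), "/", len(labels))
print("selected descriptor:", best_label)
```

In this toy setting the number of candidate expressions grows roughly with the square of the number of primary features, so dropping features before the expression space is built reduces what has to be stored. With a richer operator set and deeper expression combinations the growth would be far steeper, which is the situation the summary describes as limiting SISSO.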

Keywords

» Artificial intelligence  » Classification  » Feature selection  » Regression