
Error-controlled non-additive interaction discovery in machine learning models

by Winston Chen, Yifan Jiang, William Stafford Noble, Yang Young Lu

First submitted to arXiv on 30 Aug 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Applications (stat.AP); Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Machine learning (ML) models excel at detecting complex patterns, but their lack of interpretability hinders their use in fields like healthcare and finance. To address this, interpretable ML methods have been developed to explain how features influence model predictions. However, these methods typically focus on individual feature importance, overlooking the complex interactions between features that ML models can capture. Recent efforts aim to extend these methods to discover feature interactions but struggle with robustness and error control under data perturbations. This study introduces Diamond, a novel method for trustworthy feature interaction discovery that integrates the model-X knockoffs framework to control the false discovery rate (FDR). Diamond refines existing interaction importance measures through non-additivity distillation, ensuring that FDR control is maintained. This approach addresses the limitations of off-the-shelf interaction measures, which can otherwise lead to inaccurate discoveries. Diamond applies to a wide range of ML models, including deep neural networks and transformer models. Empirical evaluations on simulated and real datasets from biomedical studies demonstrate Diamond’s utility in enabling reliable, data-driven scientific discoveries.
Low Difficulty Summary (written by GrooveSquid.com, original content)
Machine learning (ML) models are powerful tools for finding patterns in data, but they’re hard to understand. This makes it difficult to use them in important areas like healthcare and finance. To fix this, scientists have developed ways to explain how features affect ML model predictions. But these methods usually focus on individual feature importance, missing the complex interactions between features that ML models can capture. Some recent efforts tried to improve these methods by finding feature interactions, but they struggled to make sure their results were accurate and reliable. This study presents Diamond, a new method for discovering feature interactions in a trustworthy way. Diamond uses a special framework called model-X knockoffs to control the number of mistakes it makes (the false discovery rate, or FDR). Diamond also refines existing interaction measures to make sure its results are accurate, fixing the problems with previous methods that could lead to incorrect discoveries. Diamond can be used with many different types of ML models, including deep neural networks and transformer models. The study tested Diamond on both simulated and real datasets from biomedical studies and showed that it works well, supporting reliable scientific discoveries.
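To make the knockoff-based FDR control described above more concrete, here is a minimal sketch of the standard knockoff filter applied to interaction importance scores. All names here are illustrative assumptions, not the paper's actual code: Diamond's full procedure additionally refines the raw scores with non-additivity distillation, which is not shown.

```python
import numpy as np

def knockoff_select(real_scores, knockoff_scores, fdr=0.1):
    """Select interactions at a target FDR via the knockoff filter.

    real_scores: importance scores for candidate interactions of real features.
    knockoff_scores: matched scores computed on knockoff (fake) features.
    Returns the indices of selected interactions.

    Illustrative sketch only; Diamond's actual method also applies
    non-additivity distillation before this filtering step.
    """
    W = np.asarray(real_scores) - np.asarray(knockoff_scores)
    # Candidate thresholds: magnitudes of the nonzero knockoff statistics.
    thresholds = np.sort(np.abs(W[W != 0]))
    for t in thresholds:
        # Knockoff+ estimate of the false discovery proportion at threshold t:
        # negatives (knockoff wins) estimate how many positives are false.
        fdp = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp <= fdr:
            return np.where(W >= t)[0]
    return np.array([], dtype=int)  # nothing passes the FDR threshold
```

The key idea is that knockoffs act as negative controls: an interaction is kept only if the real features score well above their knockoff copies, and the threshold is chosen so that the estimated fraction of false selections stays below the target FDR.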

Keywords

» Artificial intelligence  » Distillation  » Machine learning  » Transformer