Summary of MISFEAT: Feature Selection for Subgroups with Systematic Missing Data, by Bar Genossar et al.


MISFEAT: Feature Selection for Subgroups with Systematic Missing Data

by Bar Genossar, Thinh On, Md. Mouinul Islam, Ben Eliav, Senjuti Basu Roy, Avigdor Gal

First submitted to arXiv on: 9 Dec 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Databases (cs.DB); Machine Learning (stat.ML)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

This version is the paper's original abstract, available from the paper's arXiv listing.

Medium Difficulty Summary (written by GrooveSquid.com, original content)

The paper addresses feature selection for datasets with subgroup structure, where each subgroup has its own dominant set of features. The central challenge is systematic missing data, which occurs when some feature values are missing for every tuple in a subgroup. To handle it, the authors propose a model based on heterogeneous graph neural networks that captures the interdependencies among feature, subgroup, and target-variable connections by representing them as a multiplex graph. Information propagates between the graph's nodes, which makes it possible to infer the missing mutual information values. The authors also address scalability challenges in training and propose principled solutions for them. An extensive empirical evaluation demonstrates the efficacy of the proposed solutions in terms of both quality and running time.

Low Difficulty Summary (written by GrooveSquid.com, original content)

This research paper is about finding the most important features in a dataset that contains different groups, or subgroups. For example, a dataset might have information about people from different age groups, each with its own set of characteristics. The problem is that some values might be missing for an entire group, which makes the data harder to analyze. To solve this, the authors propose a new way of looking at the relationships between features and groups using something called graph neural networks. This helps fill in the gaps where data is missing and gives a better picture of how features and groups are connected.
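
To make "systematic missing data" concrete, here is a minimal Python sketch. It is not the authors' method: it only computes the empirical mutual information between each feature and the target within each subgroup, and marks the entries that cannot be computed because the feature is missing for every tuple of that subgroup. Those marked entries are the gaps that, according to the summaries above, the paper's graph neural network is meant to infer. The toy records and every name here (`mutual_information`, `subgroup_mi_table`, `age_group`, `smoker`, `bmi_high`, `label`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def subgroup_mi_table(rows, group_key, features, target):
    """Return {subgroup: {feature: MI or None}}.

    None marks a systematically missing feature: no tuple in the
    subgroup has a value for it, so the MI cannot be computed directly.
    """
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)

    table = {}
    for group, members in by_group.items():
        table[group] = {}
        for feature in features:
            # Keep only the tuples where this feature was actually observed.
            pairs = [(r[feature], r[target]) for r in members
                     if r.get(feature) is not None]
            if not pairs:
                table[group][feature] = None
            else:
                xs, ys = zip(*pairs)
                table[group][feature] = mutual_information(xs, ys)
    return table


# Hypothetical toy data: "smoker" was never collected for the "old" subgroup.
rows = [
    {"age_group": "young", "smoker": 0, "bmi_high": 0, "label": 0},
    {"age_group": "young", "smoker": 1, "bmi_high": 0, "label": 1},
    {"age_group": "young", "smoker": 0, "bmi_high": 1, "label": 0},
    {"age_group": "young", "smoker": 1, "bmi_high": 1, "label": 1},
    {"age_group": "old", "smoker": None, "bmi_high": 0, "label": 0},
    {"age_group": "old", "smoker": None, "bmi_high": 1, "label": 1},
]

table = subgroup_mi_table(rows, "age_group", ["smoker", "bmi_high"], "label")
for group, scores in table.items():
    print(group, scores)
```

In this toy table, the `None` entry for `smoker` in the `old` subgroup is the kind of value that cannot be obtained from the data directly. The sketch stops at identifying it; how the missing value is actually inferred is the subject of the paper.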

Keywords

  • Artificial intelligence