Summary of Most Influential Subset Selection: Challenges, Promises, and Beyond, by Yuzheng Hu et al.
Most Influential Subset Selection: Challenges, Promises, and Beyond
by Yuzheng Hu, Pingbang Hu, Han Zhao, Jiaqi W. Ma
First submitted to arXiv on: 25 Sep 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Machine Learning (stat.ML)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper studies how machine learning model behaviors can be attributed to their training data, focusing on the Most Influential Subset Selection (MISS) problem: identifying the subset of training samples with the greatest collective influence. A comprehensive analysis of prevailing MISS approaches reveals their strengths and weaknesses, showing that influence-based greedy heuristics can fail even in linear regression, both because of errors in influence-function estimates and because collective influence is not additive. An adaptive version of these heuristics, which applies them iteratively, effectively captures interactions among samples and addresses these failures (a toy sketch of the two heuristics follows this table). Experiments on real-world datasets support the theoretical findings and show that the benefit of adaptivity carries over to classification tasks and non-linear neural networks. The paper also cautions against relying on additive metrics such as the Linear Datamodeling Score and highlights the trade-off between performance and computational efficiency. |
| Low | GrooveSquid.com (original content) | Machine learning models are trained on data, but it's hard to figure out which parts of that data make a model behave in a certain way. This paper explores how to identify groups of training samples that have a big impact on the model's behavior. The researchers examined different ways people have tried to solve this problem and found that some approaches don't work well, even for simple models, and also for complex tasks like image classification. They then showed that a new, adaptive approach can help by taking into account how different parts of the data interact with each other. This research has implications for how we use machine learning in real-world applications. |
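To make the contrast between the two heuristics in the medium summary concrete, here is a minimal, hypothetical sketch in NumPy for the linear-regression setting. The static greedy heuristic ranks training points once by their influence on a test prediction and drops the top-k; the adaptive variant refits and re-ranks after each removal, which lets it pick up interactions among samples. The influence formula is the standard first-order influence-function approximation for ordinary least squares; the function names, the exact approximation, and the selection objective are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least-squares fit; returns the parameter vector."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def influence_scores(X, y, x_test):
    """First-order influence of each training point on the prediction
    x_test @ theta, roughly x_test^T (X^T X)^{-1} x_i * residual_i."""
    theta = fit_ols(X, y)
    hessian_inv = np.linalg.inv(X.T @ X)   # assumes X^T X is invertible
    residuals = y - X @ theta
    return (X @ hessian_inv @ x_test) * residuals

def static_greedy_miss(X, y, x_test, k):
    """Rank all points once by |influence| and drop the top-k."""
    scores = influence_scores(X, y, x_test)
    return list(np.argsort(-np.abs(scores))[:k])

def adaptive_greedy_miss(X, y, x_test, k):
    """Remove one point at a time, refitting and re-scoring after each
    removal so interactions among samples are taken into account."""
    remaining = list(range(len(y)))
    selected = []
    for _ in range(k):
        scores = influence_scores(X[remaining], y[remaining], x_test)
        best = int(np.argmax(np.abs(scores)))
        selected.append(remaining.pop(best))
    return selected

# Tiny usage example on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
x_test = rng.normal(size=5)
print(static_greedy_miss(X, y, x_test, k=5))
print(adaptive_greedy_miss(X, y, x_test, k=5))
```

Note the cost asymmetry the paper's trade-off discussion alludes to: the static heuristic scores every point once, while the adaptive variant refits the model k times, trading extra computation for the ability to account for non-additive collective influence.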
Keywords
» Artificial intelligence » Classification » Image classification » Linear regression » Machine learning