Summary of Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond, by Kyriakos Axiotis et al.
Data-Efficient Learning via Clustering-Based Sensitivity Sampling: Foundation Models and Beyond
by Kyriakos Axiotis, Vincent Cohen-Addad, Monika Henzinger, Sammy Jerome, Vahab Mirrokni, David Saulpic, David Woodruff, Michael Wunder
First submitted to arXiv on: 27 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Data Structures and Algorithms (cs.DS)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same paper but are written at different levels of difficulty. The medium-difficulty and low-difficulty versions are original summaries written by GrooveSquid.com, while the high-difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper proposes a new approach to the data selection problem: training machine learning models efficiently by selecting a small, representative subset of the data. The method combines k-means clustering with sensitivity sampling, applied to an embedding representation of the data on which the model loss is Hölder continuous. This makes it possible to pick out “typical” elements whose weighted loss accurately approximates the average loss of the entire dataset, with provable guarantees on the accuracy and robustness of the selected subset (a hedged code sketch of this pipeline appears below the table). |
Low | GrooveSquid.com (original content) | Imagine you’re trying to train a machine learning model, but you don’t have all the data. The data selection problem is about finding a small piece of that data that can teach your model most of what it needs to know. This paper introduces a new way to solve this problem using a combination of clustering and sampling techniques. By looking at how well each piece of data fits into different groups, we can pick out the most important examples and use them to train the model. This approach is useful when you don’t have access to all the data or when you want to speed up training. |
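The medium-difficulty summary describes a two-step pipeline: cluster the data embeddings with k-means, then sample points with probability proportional to a sensitivity score and reweight them so the sampled loss estimates the full loss. The sketch below illustrates that idea in Python. It is a minimal sketch, not the paper’s exact algorithm: the function name `sensitivity_sample` is hypothetical, and the scoring formula is the standard k-means sensitivity upper bound, whereas the paper’s guarantees additionally rely on Hölder continuity of the loss in the embedding space.

```python
import numpy as np
from sklearn.cluster import KMeans

def sensitivity_sample(embeddings, k, m, rng=None):
    """Select a weighted subset of size m via clustering-based
    sensitivity sampling (illustrative sketch, not the paper's method).

    embeddings : (n, d) array of embedding vectors for the data points
    k          : number of k-means clusters
    m          : number of points to sample
    Returns (indices, weights): sampled indices and importance weights.
    """
    rng = np.random.default_rng(rng)
    km = KMeans(n_clusters=k, n_init=10).fit(embeddings)
    labels = km.labels_

    # Squared distance of each point to its assigned cluster center.
    costs = np.sum((embeddings - km.cluster_centers_[labels]) ** 2, axis=1)
    total_cost = max(costs.sum(), 1e-12)  # guard against a zero-cost clustering
    cluster_sizes = np.bincount(labels, minlength=k)

    # Standard k-means sensitivity upper bound: a point is "important"
    # if it is far from its center or sits in a small cluster.
    sens = costs / total_cost + 1.0 / cluster_sizes[labels]
    probs = sens / sens.sum()

    # Sample m points i.i.d. and reweight so that the weighted loss
    # on the sample is an unbiased estimate of the full-dataset loss.
    idx = rng.choice(len(embeddings), size=m, replace=True, p=probs)
    weights = 1.0 / (m * probs[idx])
    return idx, weights
```

Sampling with replacement and weighting each draw by 1/(m·p) makes the weighted sample loss an unbiased estimator of the average loss over the whole dataset, which is the property the clustering-based scores are designed to preserve.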
Keywords
* Artificial intelligence
* Clustering
* Embedding
* k-means
* Machine learning