Data Pruning in Generative Diffusion Models

by Rania Briq, Jiangtao Wang, Stefan Kesselheim

First submitted to arXiv on: 19 Nov 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Computer Vision and Pattern Recognition (cs.CV)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. The summaries below all cover the same AI paper, each written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.

Medium Difficulty Summary (GrooveSquid.com original content)

This paper investigates the application of data pruning techniques to generative diffusion models, with the goal of improving their accuracy. Contrary to intuition, the authors find that eliminating redundant or noisy data can be beneficial, especially when done strategically. They experiment with several pruning methods, including recent state-of-the-art approaches, and evaluate them on the CelebA-HQ and ImageNet datasets. Surprisingly, a simple clustering method outperforms more complex and computationally demanding techniques. The authors also demonstrate how clustering can be used to balance skewed datasets in an unsupervised manner, allowing for fair sampling of underrepresented populations in the data distribution.

Low Difficulty Summary (GrooveSquid.com original content)

Generative models are designed to estimate the underlying distribution of data, so it’s natural to think that they would benefit from larger datasets. But what if we could trim down these datasets and get rid of some unnecessary information? This paper explores an idea called “data pruning”: identifying the most important parts of a dataset and discarding the rest. The researchers tested different pruning methods with generative models and found that pruning can actually make the models work better. They also discovered that a simple method called clustering is surprisingly effective at separating what’s important from what’s not. This could have big implications for how we use generative models to create new images or videos that are representative of underrepresented groups.

Keywords

» Artificial intelligence  » Clustering  » Pruning  » Unsupervised