


Slight Corruption in Pre-training Data Makes Better Diffusion Models

by Hao Chen, Yujin Han, Diganta Misra, Xiang Li, Kai Hu, Difan Zou, Masashi Sugiyama, Jindong Wang, Bhiksha Raj

First submitted to arXiv on: 30 May 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper presents a comprehensive study of how corruption in pre-training data affects diffusion models (DMs). DMs have shown impressive capabilities in generating realistic images, audio, and video, and pre-training on large-scale datasets of paired data and conditions significantly benefits their performance. However, these datasets often contain corrupted pairs in which the conditions do not accurately describe the data. The study synthetically corrupts ImageNet-1K and CC3M to pre-train and evaluate more than 50 conditional DMs. Empirically, slight corruption in pre-training can enhance the quality, diversity, and fidelity of the generated images across different DMs, during both pre-training and downstream adaptation. Theoretically, an analysis based on a Gaussian mixture model proves that slight condition corruption leads to higher entropy and a reduced 2-Wasserstein distance between the generated distribution and the ground truth. Motivated by these findings, the authors propose a simple method, condition embedding perturbations (CEP), to improve the training of DMs on practical datasets; CEP improves performance in both pre-training and downstream tasks.
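The core idea of CEP, as described above, is to inject small random noise into the condition embeddings while training a conditional diffusion model. The sketch below is a minimal, hypothetical illustration of that idea; the function name, noise scale, and use of Gaussian noise are illustrative assumptions, not the authors' exact recipe.

```python
# Hypothetical sketch of condition embedding perturbation (CEP):
# add slight Gaussian noise to a condition embedding before it is
# fed to the diffusion model. The scale value is an assumption.
import numpy as np

def perturb_condition_embedding(cond_emb, scale=0.1, rng=None):
    """Return the condition embedding with slight Gaussian noise added."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(cond_emb.shape)
    return cond_emb + scale * noise

# Example: a batch of 4 condition embeddings of dimension 8
emb = np.zeros((4, 8))
perturbed = perturb_condition_embedding(emb, scale=0.1)
print(perturbed.shape)  # (4, 8)
```

In this reading, the perturbation plays the role of the "slight corruption" that the study finds beneficial, applied deliberately and uniformly rather than inherited from noisy dataset pairs.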
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper studies how corrupted data affects diffusion models that generate images, videos, and music. These models do well when trained on big datasets with matching information, like image-text pairs, but these datasets often contain mistakes where the information doesn’t match the picture. The study investigates this by deliberately corrupting two popular datasets, ImageNet-1K and CC3M, to see how it affects over 50 different model versions. Surprisingly, a little corruption can make the models better at generating realistic pictures. The paper also explains why this happens using math and shows that a simple trick called CEP (condition embedding perturbations) can improve these models even more.

Keywords

* Artificial intelligence  * Diffusion  * Embedding  * Mixture model