Summary of Slight Corruption in Pre-training Data Makes Better Diffusion Models, by Hao Chen et al.
Slight Corruption in Pre-training Data Makes Better Diffusion Models
by Hao Chen, Yujin Han, Diganta Misra, Xiang Li, Kai Hu, Difan Zou, Masashi Sugiyama, Jindong Wang, Bhiksha Raj
First submitted to arXiv on: 30 May 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary This paper presents a comprehensive study on the impact of corruption in pre-training data for diffusion models (DMs). DMs have shown impressive capabilities in generating realistic images, audio, and videos, and pre-training on large-scale datasets of paired data and conditions can significantly improve their performance. However, these datasets often contain corrupted pairs in which the condition does not accurately describe the data. The study synthetically corrupts ImageNet-1K and CC3M to pre-train and evaluate over 50 conditional DMs. The empirical findings reveal that slight corruption during pre-training can enhance the quality, diversity, and fidelity of generated images across different DMs, both at pre-training and during downstream adaptation. Theoretically, the authors analyze a Gaussian mixture model and prove that slight condition corruption leads to higher entropy and a reduced 2-Wasserstein distance between the generated and ground-truth distributions. Motivated by this analysis, a simple method called condition embedding perturbations (CEP) is proposed to improve the training of DMs on practical datasets; CEP improves performance in both pre-training and downstream tasks. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary This paper studies how corrupted data affects diffusion models that generate images, videos, and music. These models do well when trained on big datasets with matching information, like image-text pairs. However, these datasets often contain mistakes where the information doesn’t match the picture. The study investigates this by deliberately adding corruption to two popular datasets, ImageNet-1K and CC3M, and then training over 50 different model versions on them. Surprisingly, a little corruption can make the models better at generating realistic pictures. The paper also explains why this happens using math, and shows that a simple trick called CEP (condition embedding perturbations) can improve these models even more. |
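The summaries describe CEP as adding perturbations to condition embeddings during training. A minimal sketch of that idea, assuming CEP amounts to adding small isotropic Gaussian noise to each condition embedding before it is fed to the diffusion model (the function name, noise scale, and shapes here are illustrative, not taken from the paper):

```python
import numpy as np

def cep_perturb(cond_emb, scale=0.05, rng=None):
    """Illustrative condition embedding perturbation.

    Adds small Gaussian noise to a batch of condition embeddings,
    mimicking the 'slight corruption' the paper finds beneficial.
    The scale would be a tunable hyperparameter in practice.
    """
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(cond_emb.shape).astype(cond_emb.dtype)
    return cond_emb + scale * noise

# Toy usage: a batch of 4 condition embeddings of dimension 8,
# standing in for class or text embeddings.
rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 8)).astype(np.float32)
perturbed = cep_perturb(emb, scale=0.05, rng=rng)

print(perturbed.shape)  # unchanged shape: the model sees a slightly noisy embedding
```

In training code, `perturbed` would replace the clean embedding in the conditioning pathway, leaving the images themselves untouched.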
Keywords
* Artificial intelligence * Diffusion * Embedding * Mixture model