How much is a noisy image worth? Data Scaling Laws for Ambient Diffusion
by Giannis Daras, Yeshwanth Cherapanamjeri, Constantinos Daskalakis
First submitted to arXiv on: 5 Nov 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | This paper investigates how the quality of generative models depends on the quality of their training data. The authors compare diffusion models trained on corrupted datasets with those trained on clean data, showing that even with large-scale datasets, models trained only on noisy data cannot match the performance of models trained on clean data. However, combining a small amount of clean data with a much larger set of noisy data achieves near state-of-the-art performance. The paper supports these findings theoretically by developing novel sample complexity bounds for learning Gaussian mixtures with heterogeneous variances. |
| Low | GrooveSquid.com (original content) | This research looks at how good generative models are and how much they depend on the quality of their training data. The scientists found that even with lots of data, if it was all noisy, they couldn't make the model as good as one trained on clean data. But they did find that mixing a little bit of clean data with lots of noisy data could make the model almost as good as one trained solely on clean data. |
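To make the data-mixing setting concrete, here is a minimal sketch of how one might construct such a training set: a small clean subset plus a much larger subset corrupted with additive Gaussian noise. This is an illustration only; the function name `make_mixed_dataset` and the noise level `sigma` are our own assumptions, not details from the paper.

```python
import numpy as np

def make_mixed_dataset(clean_images, n_clean, n_noisy, sigma=0.2, seed=0):
    """Split a pool of clean images into a small clean subset and a
    large additively-noised subset (toy stand-in for the paper's
    setting of mixing a few clean samples with many corrupted ones)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(clean_images))
    clean_part = clean_images[idx[:n_clean]]
    noisy_src = clean_images[idx[n_clean:n_clean + n_noisy]]
    # Corrupt with i.i.d. Gaussian noise of standard deviation sigma.
    noisy_part = noisy_src + sigma * rng.standard_normal(noisy_src.shape)
    return clean_part, noisy_part

# Toy usage: 1000 synthetic 8x8 "images", 5% clean and 90% noisy.
pool = np.random.default_rng(1).standard_normal((1000, 8, 8))
clean, noisy = make_mixed_dataset(pool, n_clean=50, n_noisy=900, sigma=0.2)
```

A real pipeline would feed both subsets to an ambient-diffusion-style training loop that accounts for the known corruption level of the noisy subset.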
Keywords
» Artificial intelligence » Diffusion