
Summary of Self-improving Diffusion Models with Synthetic Data, by Sina Alemohammad et al.


Self-Improving Diffusion Models with Synthetic Data

by Sina Alemohammad, Ahmed Imtiaz Humayun, Shruti Agarwal, John Collomosse, Richard Baraniuk

First submitted to arXiv on: 29 Aug 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High difficulty summary (written by the paper authors)
The high difficulty version is the paper’s original abstract, available from the original page.

Medium difficulty summary (written by GrooveSquid.com, original content)
The paper addresses the problem of training generative models on synthetic data, which is becoming increasingly important as real data becomes scarce. Training each new model on data produced by earlier generations of models creates a self-consuming loop that degrades the quality and diversity of the synthetic data. This phenomenon is known as model autophagy disorder (MAD), or model collapse. To combat it, the authors propose Self-IMproving diffusion models with Synthetic data (SIMS), a training concept that uses self-synthesized data to provide negative guidance during the generation process. The guidance steers the model’s generative process away from the non-ideal synthetic data manifold and towards the real data distribution. The paper reports that SIMS sets new records, as measured by Fréchet inception distance (FID), for CIFAR-10 and ImageNet-64 generation and achieves competitive results on FFHQ-64 and ImageNet-512. It also shows that SIMS can adjust a model’s synthetic data distribution to match a desired target distribution.

Low difficulty summary (written by GrooveSquid.com, original content)
Generative models are getting bigger and need more training data. But real data is hard to find! So, people have been using fake data made by earlier models to train new ones instead. This works okay, but when each new model learns from the last model’s fake data, the fake data gets worse and worse over time. To fix this, researchers created Self-IMproving diffusion models with Synthetic data (SIMS). It uses the model’s own fake data as an example of what to steer away from, which keeps new images closer to real ones. The scientists tested SIMS and found that it can make really good images and adjust the fake data to match what they want.
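
The negative guidance described in the medium difficulty summary can be illustrated with a small toy sketch. The summary does not give a formula, so everything specific here is an assumption made for illustration: the guided score is taken to be s_base - w * (s_aux - s_base), where s_base is the score of a base model and s_aux is the score of an auxiliary model trained on the base model's own samples. The 1-D Gaussians, the means 0.2 and 0.4, the guidance strength w, and the function names are all hypothetical, and a simple Langevin sampler stands in for a real diffusion sampler.

```python
import numpy as np


def gaussian_score(x, mu, sigma=1.0):
    """Score (gradient of the log-density) of a 1-D Gaussian N(mu, sigma^2)."""
    return -(x - mu) / sigma**2


def guided_score(x, w, mu_base, mu_aux):
    """Negative guidance: extrapolate away from the auxiliary model's score.

    Assumed form (not given in the summary): s_base - w * (s_aux - s_base).
    w = 0 recovers the base model; larger w pushes further away from the
    synthetic-data model.
    """
    s_base = gaussian_score(x, mu_base)
    s_aux = gaussian_score(x, mu_aux)
    return s_base - w * (s_aux - s_base)


def langevin_sample(score_fn, n=20000, steps=500, step_size=0.01, seed=0):
    """Draw approximate samples by following a score function with noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    for _ in range(steps):
        noise = rng.normal(size=n)
        x = x + step_size * score_fn(x) + np.sqrt(2.0 * step_size) * noise
    return x


if __name__ == "__main__":
    # Toy assumption: real data has mean 0.0, the base model drifted to 0.2,
    # and the auxiliary model trained on its samples drifted further to 0.4.
    mu_base, mu_aux = 0.2, 0.4
    base = langevin_sample(lambda x: gaussian_score(x, mu_base))
    guided = langevin_sample(lambda x: guided_score(x, 1.0, mu_base, mu_aux))
    print("mean without guidance:", base.mean())
    print("mean with negative guidance:", guided.mean())
```

In this toy setup the unguided samples keep the base model's bias (a mean near 0.2), while w = 1 cancels it (a mean near 0.0). That exact cancellation is built into the chosen numbers, which assume the synthetic-data model drifts in the same direction as the base model; with real models the guidance strength would presumably need tuning.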

Keywords

» Artificial intelligence  » Diffusion  » Synthetic data