


Diffusion Self-Distillation for Zero-Shot Customized Image Generation

by Shengqu Cai, Eric Chan, Yunzhi Zhang, Leonidas Guibas, Jiajun Wu, Gordon Wetzstein

First submitted to arXiv on: 27 Nov 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Text-to-image diffusion models have achieved impressive results, but they can be frustrating for artists seeking fine-grained control. One important use case is generating images of a specific instance in novel contexts, known as “identity-preserving generation”. This setting and others (e.g., relighting) are well-suited to image+text-conditional generative models. However, high-quality paired data for training such models directly is scarce. The authors propose Diffusion Self-Distillation, a method that uses a pre-trained text-to-image model to generate its own dataset for text-conditioned image-to-image tasks. They leverage the model’s in-context generation ability to create grids of images depicting the same subject, then curate a large paired dataset with the help of a Vision-Language Model. Finally, they fine-tune the text-to-image model into a text+image-to-image model using the curated dataset. The authors demonstrate that Diffusion Self-Distillation outperforms existing zero-shot methods and is competitive with per-instance tuning techniques on a wide range of identity-preserving generation tasks, without requiring test-time optimization.
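The data-curation stage described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the authors' code: `generate_in_context_grid` and `vlm_judges_same_identity` are placeholder stand-ins for the pre-trained text-to-image model's in-context generation and the Vision-Language Model's identity check, respectively.

```python
# Hedged sketch of the Diffusion Self-Distillation data pipeline.
# All functions below are hypothetical placeholders; a real implementation
# would call a pre-trained text-to-image diffusion model and a VLM.

def generate_in_context_grid(prompt, n_panels=4):
    """Stage 1 (placeholder): the text-to-image model's in-context ability
    is used to produce a grid of panels showing the SAME subject in
    several contexts. Here we return string identifiers, not real images."""
    return [f"{prompt}::panel_{i}" for i in range(n_panels)]

def vlm_judges_same_identity(image_a, image_b):
    """Stage 2 (placeholder): a Vision-Language Model verifies that two
    panels depict the same instance. Here, panels from the same grid
    trivially share a prompt prefix."""
    return image_a.split("::")[0] == image_b.split("::")[0]

def curate_paired_dataset(prompts):
    """Build (reference image, target image, prompt) triplets, keeping
    only pairs the VLM accepts as identity-consistent."""
    dataset = []
    for prompt in prompts:
        panels = generate_in_context_grid(prompt)
        reference = panels[0]
        for target in panels[1:]:
            if vlm_judges_same_identity(reference, target):
                dataset.append((reference, target, prompt))
    return dataset

# Stage 3 (not shown): fine-tune the text-to-image model on these pairs,
# conditioning on (reference image + prompt) to predict the target image,
# turning it into a text+image-to-image model.

pairs = curate_paired_dataset(["a corgi wearing a hat", "a robot in a garden"])
print(len(pairs))  # → 6  (3 accepted pairs per prompt × 2 prompts)
```

The key design point is that no human-labeled pairs are needed: the generator produces candidate pairs and the VLM acts as an automatic filter, which is what makes the resulting method zero-shot at test time.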
Low Difficulty Summary (written by GrooveSquid.com, original content)
Imagine you have a special kind of computer program that can create images based on text descriptions. This program is very good at making new images, but it’s hard to control exactly what the image looks like. Some artists want to use this program to make specific images, like pictures of the same person in different outfits or with different backgrounds. To do this, the program would normally need lots of paired examples to learn from, and those are hard to come by. The authors of this paper came up with an idea called Diffusion Self-Distillation that lets the program create its own training data. This makes it easier and faster to make new images that meet specific requirements.

Keywords

» Artificial intelligence  » Diffusion  » Distillation  » Language model  » Optimization  » Zero shot