
Summary of Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting, by Weili Zeng et al.


Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting

by Weili Zeng, Yichao Yan, Qi Zhu, Zhuo Chen, Pengzhi Chu, Weiming Zhao, Xiaokang Yang

First submitted to arXiv on: 22 Apr 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
See the paper's original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Text-to-image (T2I) customization aims to create images matching specific textual descriptions, but existing methods face a central challenge: concept overfitting. The paper analyzes this issue and divides it into two types: concept-agnostic overfitting, which undermines knowledge of non-customized concepts, and concept-specific overfitting, which confines customization to limited modalities such as backgrounds or styles. To evaluate the degree of overfitting, the authors introduce two metrics: the Latent Fisher divergence and the Wasserstein metric. The proposed Infusion method learns target concepts without being constrained by limited training modalities while preserving non-customized knowledge. The approach requires only 11KB of trained parameters and outperforms state-of-the-art methods in single-concept and multi-concept customized generation.

Low Difficulty Summary (written by GrooveSquid.com, original content)
Imagine you want a picture that looks like what someone is describing to you. Current computer programs struggle to make these pictures because they get too good at one specific type of picture and forget how to make other types. The paper's solution, called Infusion, helps the program learn about different types of pictures without getting stuck on just one. This makes the approach better than others at making customized pictures that match what someone is describing.
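
The medium-difficulty summary names a Wasserstein metric as one way to measure overfitting. As a rough illustration of what such a metric computes, the Python sketch below calculates the empirical 1-Wasserstein distance between two equal-sized one-dimensional samples. This is not the paper's implementation, which the summary does not describe. The function name `wasserstein_1d` and the sample values are hypothetical and chosen only for illustration.

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-sized 1-D samples.

    For equal-sized samples on the real line, the optimal transport plan
    pairs the i-th smallest value of one sample with the i-th smallest
    value of the other, so the distance is the mean absolute difference
    of the sorted values.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    xs_sorted = sorted(xs)
    ys_sorted = sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs_sorted, ys_sorted)) / len(xs)


# Hypothetical scalar statistics from a model before and after customization.
base_scores = [0.1, 0.4, 0.5, 0.9]
tuned_scores = [0.3, 0.6, 0.7, 1.1]

# Every value shifted by about 0.2, so the distance is approximately 0.2.
shift = wasserstein_1d(base_scores, tuned_scores)
print(f"distribution shift: {shift:.3f}")
```

A larger value means the two samples are distributed more differently. A value of zero means the samples contain the same values, regardless of order.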

Keywords

» Artificial intelligence  » Overfitting