
Summary of Aligning Diffusion Models with Noise-Conditioned Perception, by Alexander Gambashidze et al.


Aligning Diffusion Models with Noise-Conditioned Perception

by Alexander Gambashidze, Anton Kulikov, Yuriy Sosnin, Ilya Makarov

First submitted to arxiv on: 25 Jun 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
The high difficulty version is the paper's original abstract, available on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
A novel approach to human preference optimization for text-to-image diffusion models is proposed, aiming to enhance prompt alignment, visual appeal, and user preference. Recent advances in language model alignment are carried over to diffusion models, which typically optimize in pixel or VAE latent space, a choice that slows training during the preference-alignment stage. The authors instead fine-tune Stable Diffusion 1.5 and XL using Direct Preference Optimization (DPO), Contrastive Preference Optimization (CPO), and supervised fine-tuning (SFT), with a perceptual objective computed in the U-Net embedding space. This method significantly outperforms standard latent-space implementations, improving both output quality and computational cost.
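To make the idea concrete, the following is a minimal, hypothetical sketch (not the authors' code) of a DPO-style preference loss in which per-sample reconstruction errors are measured in an embedding space rather than pixel/VAE space. All function names, the scalar error inputs, and the `beta` value are illustrative assumptions:

```python
import numpy as np

def logsigmoid(x):
    # Numerically stable log(sigmoid(x)) = -log(1 + exp(-x)).
    return -np.logaddexp(0.0, -x)

def dpo_embedding_loss(err_w, err_l, ref_err_w, ref_err_l, beta=1.0):
    # Hypothetical sketch of a DPO-style objective on embedding-space errors.
    # err_w / err_l: errors of the trainable model on the preferred
    # ("winner") and rejected ("loser") images, measured in embedding space.
    # ref_err_w / ref_err_l: the same errors under the frozen reference model.
    # The implicit reward is how much the trainable model improves over the
    # reference; DPO maximizes the winner-vs-loser margin of that reward.
    margin = (ref_err_w - err_w) - (ref_err_l - err_l)
    return float(-logsigmoid(beta * np.asarray(margin, dtype=float)).mean())
```

When the trainable model reduces the error on the preferred image more than on the rejected one, the margin is positive and the loss falls below log(2); the same construction applies whether the errors come from pixel space, VAE latents, or, as in this paper, a perceptual U-Net embedding space.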
Low Difficulty Summary (written by GrooveSquid.com, original content)
Recent advances in human preference optimization have shown promise for text-to-image diffusion models, improving prompt alignment, visual appeal, and user preference. This paper proposes a new way to make preference alignment for diffusion models faster and better. The method fine-tunes popular diffusion models with several optimization techniques inside the model's own embedding space rather than in pixel space. The result is better image quality at a lower computational cost.

Keywords

» Artificial intelligence  » Alignment  » Diffusion  » Embedding space  » Fine tuning  » Language model  » Latent space  » Optimization  » Prompt  » Supervised