
Summary of DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents, by Yilun Xu et al.


DisCo-Diff: Enhancing Continuous Diffusion Models with Discrete Latents

by Yilun Xu, Gabriele Corso, Tommi Jaakkola, Arash Vahdat, Karsten Kreis

First submitted to arXiv on: 3 Jul 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (the paper's original abstract, by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
Diffusion models (DMs) have significantly impacted generative learning by encoding data into a simple Gaussian distribution through a diffusion process. However, this approach may be unnecessarily complex for capturing multimodal data distributions. To address this, we propose Discrete-Continuous Latent Variable Diffusion Models (DisCo-Diff), which augment DMs with complementary discrete latent variables. These latents are inferred by an encoder and trained end-to-end with the DM. Because it does not rely on pre-trained networks, the framework is universally applicable without prior knowledge. By introducing a small number of discrete variables with small codebooks, DisCo-Diff reduces the complexity of learning the DM's noise-to-data mapping. We validate DisCo-Diff on various image synthesis tasks, molecular docking, and toy data, finding that it consistently outperforms baseline models in terms of FID scores.

Low Difficulty Summary (original content by GrooveSquid.com)
This research paper is about a new way to improve generative learning using something called diffusion models. The current method for encoding data into a simple distribution can be complicated when dealing with data that has multiple distinct types or modes. To make things easier, the researchers propose a new approach that adds extra pieces of information to the data. These additions are like codes that help the model understand what's important and what's not. The new approach is simpler and more flexible than the old method and can work on different types of data, including images and molecules. The results show that this new approach performs better than the old one.

Keywords

» Artificial intelligence  » Diffusion  » Encoder  » Image synthesis