
Summary of Taming Mode Collapse in Score Distillation for Text-to-3D Generation, by Peihao Wang et al.


Taming Mode Collapse in Score Distillation for Text-to-3D Generation

by Peihao Wang, Dejia Xu, Zhiwen Fan, Dilin Wang, Sreyas Mohan, Forrest Iandola, Rakesh Ranjan, Yilei Li, Qiang Liu, Zhangyang Wang, Vikas Chandra

First submitted to arXiv on: 31 Dec 2023

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper targets text-to-3D generation with score distillation, which suffers from a view inconsistency problem known as the “Janus” artifact: the generated object repeats its front face in other views. The authors show that existing score distillation methods degenerate into maximal likelihood seeking on each view independently, which causes mode collapse and, in practice, Janus artifacts. To address this, they introduce Entropic Score Distillation (ESD), an update rule derived from an objective whose entropy term is maximized to encourage diversity among different views. ESD can be implemented by applying the classifier-free guidance trick on top of variational score distillation. The authors demonstrate through extensive experiments that ESD mitigates Janus artifacts.

Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper helps solve a problem with making 3D pictures from text. The current process is not perfect and often shows the same front face from different viewpoints. Researchers have tried to fix this by changing how the generated images are scored, but it was not well understood why the problem happens. This paper shows that the problem comes from the way current methods try to make each view look as good as possible on its own. To solve this, the authors develop a new method called Entropic Score Distillation (ESD). ESD helps create more diverse and realistic 3D images.
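
To make the "classifier-free guidance trick" mentioned in the medium difficulty summary more concrete, here is a minimal sketch of the general idea: two predictions of the same quantity, one made with a condition and one without, are blended with a single weight, and the blend is then subtracted from the pretrained model's prediction. This is an illustration only, not the authors' implementation. The function names, the use of plain Python lists, the choice of camera view as the condition, and the way the weight enters are all assumptions made for this example; the summary does not give these details.

```python
from typing import List


def cfg_mix(uncond: List[float], cond: List[float], weight: float) -> List[float]:
    """Classifier-free-guidance-style blend of two predictions.

    weight = 0 returns the unconditional prediction,
    weight = 1 returns the conditional prediction.
    """
    return [u + weight * (c - u) for u, c in zip(uncond, cond)]


def update_direction(
    pretrained: List[float],
    view_cond: List[float],
    view_uncond: List[float],
    weight: float,
) -> List[float]:
    """Hypothetical update direction for one rendered image.

    Assumption: the pretrained model's prediction is compared against a blend
    of a view-conditioned and a view-unconditioned prediction for the rendered
    images. The exact form used in the paper may differ.
    """
    mixed = cfg_mix(view_uncond, view_cond, weight)
    return [p - m for p, m in zip(pretrained, mixed)]


# Toy numbers, chosen only to show the arithmetic.
pretrained = [2.0, 2.0]
view_cond = [1.0, 2.0]
view_uncond = [0.0, 0.0]

only_conditional = update_direction(pretrained, view_cond, view_uncond, weight=1.0)
only_unconditional = update_direction(pretrained, view_cond, view_uncond, weight=0.0)
```

In this sketch, a weight of 1 uses only the view-conditioned prediction and a weight of 0 uses only the view-unconditioned one; intermediate weights trade off between the two.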

Keywords

  • Artificial intelligence
  • Distillation
  • Likelihood
  • Objective function