
Summary of Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane, by Han Yan et al.


Frankenstein: Generating Semantic-Compositional 3D Scenes in One Tri-Plane

by Han Yan, Yang Li, Zhennan Wu, Shenzhou Chen, Weixuan Sun, Taizhang Shang, Weizhe Liu, Tian Chen, Xiaqiang Dai, Chao Ma, Hongdong Li, Pan Ji

First submitted to arxiv on: 24 Mar 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Graphics (cs.GR)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High difficulty: written by the paper authors (the paper's original abstract, not reproduced here)

Medium difficulty: written by GrooveSquid.com (original content)
Frankenstein is a diffusion-based framework that generates semantic-compositional 3D scenes in a single pass. Unlike existing methods, which generate a single unified 3D shape, Frankenstein outputs multiple separated shapes, each corresponding to a semantically meaningful part. The framework encodes all of a scene's 3D information in one tri-plane tensor, from which multiple signed distance function (SDF) fields can be decoded, one per compositional shape. During training, an auto-encoder first compresses tri-planes into a latent space, and a denoising diffusion process is then used to approximate the distribution of compositional scenes in that space. Frankenstein shows promising results in generating room interiors as well as human avatars with automatically separated parts. Because the parts come out already separated, the generated scenes support many downstream applications, such as part-wise re-texturing, object rearrangement in a room, and cloth re-targeting for avatars. The model's performance is evaluated using a combination of metrics, including 3D IoU, chamfer distance, and PSNR, and the results show that Frankenstein outperforms existing methods in generating realistic and accurate 3D scenes.
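
The tri-plane decoding step described above can be sketched in a few lines of pure Python. This is an illustrative assumption, not the authors' implementation: the tiny plane size, summing the features sampled from the three planes, the random linear layer standing in for the learned SDF decoder, and combining parts with a per-point minimum (the standard SDF union) are all choices made here for clarity. The point is only that a single tri-plane query can yield several part SDFs at once.

```python
import random

# Hypothetical toy sizes for illustration; the paper's real resolution,
# channel count, and decoder network are not reproduced here.
RES = 8        # each plane is a RES x RES grid
CHANNELS = 4   # feature channels stored in each grid cell
NUM_PARTS = 3  # one SDF per semantic part (assumed fixed here)


def make_plane(rng):
    """Random RES x RES grid of CHANNELS-dim features (stand-in for learned features)."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(CHANNELS)]
             for _ in range(RES)] for _ in range(RES)]


def bilinear(plane, u, v):
    """Bilinearly interpolate a plane at continuous coordinates u, v in [-1, 1]."""
    u = max(-1.0, min(1.0, u))
    v = max(-1.0, min(1.0, v))
    # Map [-1, 1] onto grid coordinates [0, RES - 1].
    gu = (u + 1.0) / 2.0 * (RES - 1)
    gv = (v + 1.0) / 2.0 * (RES - 1)
    i0, j0 = int(gu), int(gv)
    i1, j1 = min(i0 + 1, RES - 1), min(j0 + 1, RES - 1)
    du, dv = gu - i0, gv - j0
    out = []
    for c in range(CHANNELS):
        near = plane[i0][j0][c] * (1 - dv) + plane[i0][j1][c] * dv
        far = plane[i1][j0][c] * (1 - dv) + plane[i1][j1][c] * dv
        out.append(near * (1 - du) + far * du)
    return out


def query_triplane(triplane, x, y, z):
    """Project (x, y, z) onto the XY, XZ and YZ planes and sum the three samples."""
    xy, xz, yz = triplane
    samples = (bilinear(xy, x, y), bilinear(xz, x, z), bilinear(yz, y, z))
    return [sum(vals) for vals in zip(*samples)]


def decode_sdfs(features, weights, biases):
    """Toy linear decoder mapping CHANNELS features to NUM_PARTS signed distances."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]


def compose(sdfs):
    """Union of parts: scene SDF is the minimum part SDF; its index is the part label."""
    label = min(range(len(sdfs)), key=lambda k: sdfs[k])
    return label, sdfs[label]


rng = random.Random(0)
triplane = [make_plane(rng) for _ in range(3)]
weights = [[rng.uniform(-1.0, 1.0) for _ in range(CHANNELS)] for _ in range(NUM_PARTS)]
biases = [0.0] * NUM_PARTS

features = query_triplane(triplane, 0.1, -0.3, 0.5)
sdfs = decode_sdfs(features, weights, biases)
label, scene_sdf = compose(sdfs)
print(f"part SDFs: {sdfs}, part label: {label}, scene SDF: {scene_sdf}")
```

Summing the three plane features keeps the feature size fixed regardless of how many planes are used; concatenating them is an equally common alternative in tri-plane models.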

Low difficulty: written by GrooveSquid.com (original content)
Frankenstein is a new way to create 3D scenes that are made up of separate, meaningful parts, such as the pieces of furniture in a room or the body and clothes of a person. It uses a type of machine learning called diffusion modeling, and it stores everything about a scene in one compact 3D representation called a tri-plane. From that single representation it can pull out several shapes at once, each one matching a specific part of the scene. The researchers tested Frankenstein on two types of scenes, rooms and human avatars, and found that it generated realistic and accurate 3D scenes for both. Because the parts come out already separated, this could be useful for tasks such as changing the texture of one part, rearranging objects in a room, or moving clothes from one avatar to another. Overall, Frankenstein is an exciting new technology that has the potential to make creating 3D scenes easier and more accurate.
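
Both summaries say that generation happens by diffusion in the auto-encoder's latent space. The sketch below shows the generic DDPM forward (noising) step and its noise-prediction loss on a flat latent vector. It is the textbook recipe rather than Frankenstein's confirmed setup: the 1,000 steps, the linear beta schedule, the 16-value random vector standing in for an encoded tri-plane, and the all-zeros placeholder for the denoising network are all assumptions.

```python
import math
import random

NUM_STEPS = 1000  # assumed number of diffusion steps (a common DDPM default)


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule as in the original DDPM paper (assumed, not from Frankenstein)."""
    return [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
            for t in range(num_steps)]


def alpha_bars(betas):
    """abar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(z0, t, abar, rng):
    """Draw z_t ~ N(sqrt(abar_t) * z_0, (1 - abar_t) * I) in one shot; return (z_t, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    a, s = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [a * z + s * e for z, e in zip(z0, eps)], eps


def predict_z0(zt, eps, t, abar):
    """Invert the forward step given the noise: z_0 = (z_t - sqrt(1 - abar_t) * eps) / sqrt(abar_t)."""
    a, s = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    return [(z - s * e) / a for z, e in zip(zt, eps)]


def noise_mse(eps_pred, eps):
    """Epsilon-prediction loss: mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


rng = random.Random(0)
abar = alpha_bars(linear_betas(NUM_STEPS))
z0 = [rng.uniform(-1.0, 1.0) for _ in range(16)]  # stand-in for an encoded tri-plane latent
t = rng.randrange(NUM_STEPS)                      # random timestep, as in DDPM training
zt, eps = add_noise(z0, t, abar, rng)
eps_pred = [0.0] * len(eps)  # placeholder; a trained denoiser would predict eps from (zt, t)
print(f"t={t}, loss={noise_mse(eps_pred, eps):.4f}")
```

In a full latent-diffusion pipeline, a trained denoiser would replace the zeros placeholder, and at generation time the reverse process would turn pure noise into a latent that the auto-encoder's decoder maps back to a tri-plane.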

Keywords

» Artificial intelligence  » Diffusion  » Encoder  » Latent space  » Machine learning