
Summary of Generative Lifting of Multiview to 3D from Unknown Pose: Wrapping NeRF inside Diffusion, by Xin Yuan et al.


Generative Lifting of Multiview to 3D from Unknown Pose: Wrapping NeRF inside Diffusion

by Xin Yuan, Rana Hanocka, Michael Maire

First submitted to arXiv on: 11 Jun 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty Written by Summary
High Paper authors High Difficulty Summary
Read the original abstract here
Medium GrooveSquid.com (original content) Medium Difficulty Summary
Our paper presents a generative modeling approach to multiview reconstruction from unknown pose. We simultaneously learn a network that predicts camera pose from a 2D input image and the parameters of a Neural Radiance Field (NeRF) for the 3D scene. To train our system, we wrap both networks inside a Denoising Diffusion Probabilistic Model (DDPM) and optimize the denoising objective. Our framework requires the system to denoise an input image by predicting its camera pose and rendering the NeRF from that pose, which forces it to learn both the underlying 3D representation and a mapping from images to camera extrinsic parameters. We design a custom network architecture that represents pose as a distribution, allowing our system to discover view correspondences when trained end-to-end for denoising alone. Our approach successfully builds NeRFs without pose knowledge for challenging scenes where competing methods fail. The learned NeRF can be extracted and used as a 3D scene model, and our full system can sample novel camera poses and generate novel-view images.
Low GrooveSquid.com (original content) Low Difficulty Summary
Our paper uses a special kind of computer program to work out where a camera was positioned when each picture was taken. This is useful because it allows us to create detailed 3D models of the world around us. We developed a new way for computers to learn about 3D spaces by looking at many 2D pictures taken from different angles. Our method uses a special type of computer model called a Neural Radiance Field (NeRF) to create these 3D models. This helps us solve a problem where the camera positions are not known in advance, which normally makes it hard for computers to build a 3D model from pictures.
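
The training idea in the medium difficulty summary (predict a pose distribution from a noisy image, render the scene from it, and score the result with a denoising loss) can be illustrated with a small toy sketch. This is not the authors' architecture: the linear pose head (`pose_weights`), the table of per-pose images standing in for NeRF renders (`scene_views`), the linear noise schedule, the toy sizes, and the choice to predict the clean image rather than the noise are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 100                                   # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)       # cumulative fraction of signal kept

def add_noise(x0, t, eps):
    """DDPM forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

def softmax(z):
    z = z - z.max()                       # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

K, D = 4, 16                              # candidate poses, pixels per image (toy sizes)
pose_weights = rng.normal(size=(K, D)) * 0.1   # hypothetical linear pose head
scene_views = rng.normal(size=(K, D))          # stand-in for NeRF renders, one per pose

def predict_pose_distribution(x_t):
    """Map a noisy image to a probability distribution over candidate poses."""
    return softmax(pose_weights @ x_t)

def denoise(x_t):
    """Denoised estimate: pose-probability-weighted mix of the rendered views."""
    probs = predict_pose_distribution(x_t)
    return probs @ scene_views, probs

def denoising_loss(x0, t):
    """Mean squared error between the rendered estimate and the clean image."""
    eps = rng.normal(size=x0.shape)
    x_t = add_noise(x0, t, eps)
    x0_hat, probs = denoise(x_t)
    return float(np.mean((x0_hat - x0) ** 2)), probs

x0 = scene_views[2]                       # a "photo" taken from candidate pose 2
loss, probs = denoising_loss(x0, t=10)
print("pose probabilities:", probs)
print("denoising loss:", loss)
```

In a real system the pose head and the scene representation would both be trained by gradient descent on this loss. The sketch only evaluates the loss once, to show how the pose prediction and the rendering both sit inside the denoising step.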

Keywords

» Artificial intelligence  » Diffusion  » Probabilistic model