
Summary of QUEEN: Quantized Efficient Encoding of Dynamic Gaussians for Streaming Free-viewpoint Videos, by Sharath Girish et al.


QUEEN: QUantized Efficient ENcoding of Dynamic Gaussians for Streaming Free-viewpoint Videos

by Sharath Girish, Tianye Li, Amrita Mazumdar, Abhinav Shrivastava, David Luebke, Shalini De Mello

First submitted to arXiv on: 5 Dec 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper proposes a novel framework for streaming free-viewpoint video (FVV) using 3D Gaussian Splatting (3D-GS). The framework, called QUantized and Efficient ENcoding (QUEEN), directly learns Gaussian attribute residuals between consecutive frames at each time step without imposing any structural constraints on them, which allows for high-quality reconstruction and generalizability. To store the residuals efficiently, QUEEN uses a quantization-sparsity framework with two parts: a learned latent decoder that quantizes the residuals of all attributes other than Gaussian positions, and a learned gating module that sparsifies the position residuals. The paper also proposes using the Gaussian viewspace gradient difference vector as a signal to separate static from dynamic content in the scene; this signal guides sparsity learning and speeds up training. QUEEN outperforms state-of-the-art online FVV methods on all metrics, reducing model size to 0.7 MB per frame while training in under 5 seconds and rendering at 350 FPS.
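To make the quantization-sparsity idea more concrete, here is a minimal sketch of how one frame's residuals might be decoded and applied to a single Gaussian. It is not QUEEN's implementation: the function names (quantize_latent, decode_residual, position_gate, is_dynamic, apply_residuals), the uniform rounding step, the linear decoder, the sigmoid-threshold gate standing in for the learned gating module, and the thresholded gradient-difference norm standing in for the paper's static/dynamic signal are all hypothetical choices made for illustration. In the actual method the latents, decoder, and gates are learned during training; here they are fixed numbers.

```python
import math

# Hypothetical sketch: names, shapes, and thresholds are illustrative
# assumptions, not QUEEN's actual implementation.


def quantize_latent(latent, step=0.1):
    """Round each latent entry to an integer multiple of `step` (stored as ints)."""
    return [round(z / step) for z in latent]


def decode_residual(q_latent, weights, bias, step=0.1):
    """Assumed linear decoder: residual = W @ (q * step) + b."""
    z = [q * step for q in q_latent]
    return [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(weights, bias)]


def position_gate(logit, threshold=0.5):
    """Hard 0/1 gate: keep the position residual only if sigmoid(logit) > threshold."""
    return 1.0 if 1.0 / (1.0 + math.exp(-logit)) > threshold else 0.0


def is_dynamic(grad_prev, grad_curr, tau=1e-3):
    """Assumed static/dynamic test: did the 2D viewspace gradient change by more than tau?"""
    return math.dist(grad_prev, grad_curr) > tau


def apply_residuals(prev_pos, prev_attr, dpos, gate_logit, latent,
                    weights, bias, grad_prev, grad_curr):
    """Update one Gaussian from frame t-1 to frame t."""
    # Assumption: Gaussians flagged as static have their position gate forced off,
    # so no position residual needs to be stored for them.
    gate = position_gate(gate_logit) if is_dynamic(grad_prev, grad_curr) else 0.0
    new_pos = [p + gate * d for p, d in zip(prev_pos, dpos)]
    # Non-position attributes: quantize the latent, decode it, add the residual.
    residual = decode_residual(quantize_latent(latent), weights, bias)
    new_attr = [a + r for a, r in zip(prev_attr, residual)]
    return new_pos, new_attr


# Example: a moving Gaussian (its viewspace gradient changed) with a confident gate.
pos, attr = apply_residuals(
    prev_pos=[0.0, 0.0, 0.0], prev_attr=[1.0, 0.5],
    dpos=[0.2, 0.0, -0.1], gate_logit=3.0,
    latent=[0.34, -0.21],
    weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0],
    grad_prev=[0.0, 0.0], grad_curr=[0.01, 0.0],
)
print(pos, attr)  # pos is [0.2, 0.0, -0.1]; attr is approximately [1.3, 0.3]
```

The sketch follows the summary's split: positions bypass the decoder and are only kept or dropped by the gate, while every other attribute residual goes through quantization and decoding.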

Low Difficulty Summary (written by GrooveSquid.com, original content)
Streaming free-viewpoint video (FVV) is a challenging problem: it requires incremental updates to a volumetric representation, fast training and rendering to meet real-time constraints, and a small memory footprint for efficient transmission. This paper proposes a new approach called QUEEN that uses 3D Gaussian Splatting (3D-GS). The goal is to enable new applications such as 3D video conferencing and live volumetric video broadcast. QUEEN learns the changes (residuals) between consecutive frames without imposing any structural constraints on them, which allows for high-quality reconstruction and generalizability.
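The idea of incremental updates can be illustrated with a hypothetical receiver loop: the first frame arrives as full Gaussian attributes, and each later frame arrives only as residuals that are added to the previous state. The Frame and ResidualPacket classes and their layout (sparse position residuals keyed by Gaussian index, plus one dense attribute residual per Gaussian) are assumptions for illustration, not QUEEN's bitstream format. The sketch only shows why omitting static Gaussians from the position residuals keeps each per-frame payload small.

```python
from dataclasses import dataclass

# Hypothetical data layout for illustration only; not QUEEN's actual format.


@dataclass
class Frame:
    positions: list  # one [x, y, z] list per Gaussian
    attrs: list      # one attribute vector (e.g. color, opacity) per Gaussian


@dataclass
class ResidualPacket:
    dpos: dict   # sparse: Gaussian index -> [dx, dy, dz]; static Gaussians are omitted
    dattr: list  # dense: one decoded attribute residual per Gaussian


def apply_packet(frame, packet):
    """Return frame t by adding packet t's residuals to frame t-1 (input is not mutated)."""
    if len(packet.dattr) != len(frame.attrs):
        raise ValueError("attribute residual count does not match Gaussian count")
    positions = [list(p) for p in frame.positions]
    for i, d in packet.dpos.items():
        positions[i] = [p + dp for p, dp in zip(positions[i], d)]
    attrs = [[a + r for a, r in zip(av, rv)] for av, rv in zip(frame.attrs, packet.dattr)]
    return Frame(positions, attrs)


def stream(first_frame, packets):
    """Yield every reconstructed frame, starting with the fully transmitted first one."""
    frame = first_frame
    yield frame
    for packet in packets:
        frame = apply_packet(frame, packet)
        yield frame


first = Frame(positions=[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], attrs=[[0.5], [0.75]])
packets = [
    ResidualPacket(dpos={1: [0.0, 0.5, 0.0]}, dattr=[[0.0], [0.125]]),  # only Gaussian 1 moves
    ResidualPacket(dpos={}, dattr=[[0.25], [0.0]]),                     # no Gaussian moves
]
frames = list(stream(first, packets))
for t, f in enumerate(frames):
    print(t, f.positions, f.attrs)
```

Because each step depends only on the previous frame and the current packet, the receiver never has to hold more than one reconstructed frame plus one packet in memory.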

Keywords

Artificial intelligence, Decoder, Quantization