Factorized-Dreamer: Training A High-Quality Video Generator with Limited and Low-Quality Data

by Tao Yang, Yangming Shi, Yunwen Huang, Feng Chen, Yin Zheng, Lei Zhang

First submitted to arXiv on: 19 Aug 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
A new approach to text-to-video (T2V) generation is presented, showing that limited, publicly available, low-quality data can suffice to train a high-quality video generator. The proposed method, Factorized-Dreamer, factorizes the T2V process into two steps: generating an image conditioned on a caption, then synthesizing the video from that image and the motion details in the text. It incorporates an adapter to combine text and image embeddings, pixel-aware cross-attention modules to capture pixel-level image information, and a PredictNet that supervises training with optical flow. The model can be trained directly on limited datasets with noisy captions, alleviating the need for large-scale, high-quality video-text pairs.
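
To make the factorized design concrete, here is a minimal PyTorch sketch of the three components named above. The class names, tensor shapes, and fusion choices are illustrative assumptions for exposition, not the authors' released implementation.

import torch
import torch.nn as nn

# NOTE: a minimal sketch of the factorized design described above; names,
# shapes, and fusion choices are illustrative assumptions, not the paper's code.

class TextImageAdapter(nn.Module):
    # Combines text and image embeddings into one conditioning sequence.
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, text_emb, image_emb):
        # Both inputs: (batch, tokens, dim) -> fused: (batch, tokens, dim)
        return self.proj(torch.cat([text_emb, image_emb], dim=-1))

class PixelAwareCrossAttention(nn.Module):
    # Lets video tokens attend to pixel-level tokens of the generated image.
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, video_tokens, pixel_tokens):
        out, _ = self.attn(video_tokens, pixel_tokens, pixel_tokens)
        return video_tokens + out  # residual connection

class PredictNet(nn.Module):
    # Auxiliary head predicting per-token optical flow (dx, dy) for supervision.
    def __init__(self, dim):
        super().__init__()
        self.head = nn.Linear(dim, 2)

    def forward(self, video_tokens):
        return self.head(video_tokens)

# Toy forward pass with random features standing in for real encoder outputs.
dim, batch = 64, 2
text_emb = torch.randn(batch, 16, dim)       # caption tokens
image_emb = torch.randn(batch, 16, dim)      # embedding of the generated image
pixel_tokens = torch.randn(batch, 256, dim)  # pixel-level image features
video_tokens = torch.randn(batch, 128, dim)  # spatio-temporal video latents

cond = TextImageAdapter(dim)(text_emb, image_emb)
video_tokens = PixelAwareCrossAttention(dim)(video_tokens, pixel_tokens)
flow_pred = PredictNet(dim)(video_tokens)  # compared to reference flow in training
print(cond.shape, video_tokens.shape, flow_pred.shape)

In the full two-step pipeline, a text-to-image model would first produce the keyframe that supplies image_emb and pixel_tokens here, and the video model would be trained with both its usual generation loss and the optical-flow supervision from PredictNet.
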
Low Difficulty Summary (written by GrooveSquid.com, original content)
This research focuses on creating high-quality videos from text descriptions. It’s like a magic trick where you input words and get a video that matches what you wrote! To make this happen, scientists broke the process into two parts: first they generate an image from the text, then they create a video that follows the movements the text describes. They designed a special machine learning model called Factorized-Dreamer to handle this task. The model can even work with limited data and noisy descriptions, making it more accessible for people who want to try it out.

Keywords

  • Artificial intelligence
  • Cross attention
  • Machine learning
  • Optical flow