
Summary of SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation, by Yining Hong et al.


SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

by Yining Hong, Beide Liu, Maxine Wu, Yuanhao Zhai, Kai-Wei Chang, Linjie Li, Kevin Lin, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, Yingnian Wu, Lijuan Wang

First submitted to arXiv on: 30 Oct 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty Written by Summary
High Paper authors High Difficulty Summary
Read the original abstract here
Medium GrooveSquid.com (original content) Medium Difficulty Summary
The paper introduces SlowFast-VGen, a model for action-driven long video generation that combines slow learning of general world dynamics with fast storage of episodic memory from new experiences. Slow learning is handled by a masked conditional video diffusion model, while fast learning is an inference-time strategy based on temporal LoRA modules. Together these address the inconsistencies that appear when generating longer videos. The approach also lets the model recall prior multi-episode experiences for context-aware skill learning, which improves both action-driven video generation and long-horizon planning tasks.
Low GrooveSquid.com (original content) Low Difficulty Summary
SlowFast-VGen is a new way for computers to make long videos that follow a series of actions. It combines two kinds of learning: slow learning about how the world generally works, and fast storing of memories from new experiences. This makes the generated videos more consistent and accurate, especially longer ones.
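
To make the "fast learning with LoRA" idea in the medium summary more concrete, here is a toy sketch in plain Python. It is not the paper's implementation: the class name `LoRALinear`, the method `fast_update`, and all sizes and learning rates are hypothetical, and a single linear layer stands in for the video diffusion model. It only illustrates the general LoRA pattern of keeping the slowly learned weights `W` frozen while a small low-rank pair `A`, `B` is updated at inference time from one new input/output example.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update scale * B @ A.

    Hypothetical toy class for illustration only.
    """

    def __init__(self, d_in, d_out, rank=2, alpha=2.0, seed=0):
        rng = random.Random(seed)
        # "Slow" weights: never modified after construction in this sketch.
        self.W = [[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(d_out)]
        # "Fast" low-rank adapter; B starts at zero so the adapter is a no-op.
        self.A = [[rng.gauss(0, 0.5) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    def forward(self, x):
        base = matvec(self.W, x)
        low = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * l for b, l in zip(base, low)]

    def fast_update(self, x, target, lr=0.05):
        """One gradient step on 0.5 * ||forward(x) - target||^2, updating only A and B."""
        h = matvec(self.A, x)
        err = [y - t for y, t in zip(self.forward(x), target)]
        # Gradient flowing back into h, computed with B before it is updated.
        g_h = [sum(err[i] * self.B[i][k] for i in range(len(err)))
               for k in range(len(h))]
        for i in range(len(self.B)):
            for k in range(len(h)):
                self.B[i][k] -= lr * self.scale * err[i] * h[k]
        for k in range(len(self.A)):
            for j in range(len(x)):
                self.A[k][j] -= lr * self.scale * g_h[k] * x[j]
        return 0.5 * sum(e * e for e in err)


# Usage: store one new "experience" (x -> target) in the adapter only.
layer = LoRALinear(d_in=3, d_out=2)
x, target = [1.0, 0.5, -1.0], [0.3, -0.2]
losses = [layer.fast_update(x, target) for _ in range(200)]
print(losses[0], losses[-1])
```

In this sketch the new example ends up stored entirely in `A` and `B`, so discarding the adapter recovers the original layer. That is the property that makes LoRA-style updates a natural fit for episode-specific memory.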

Keywords

» Artificial intelligence  » Diffusion model  » Inference  » LoRA  » Recall