Summary of SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation, by Yining Hong et al.
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
by Yining Hong, Beide Liu, Maxine Wu, Yuanhao Zhai, Kai-Wei Chang, Linjie Li, Kevin Lin, Chung-Ching Lin, Jianfeng Wang, Zhengyuan Yang, Yingnian Wu, Lijuan Wang
First submitted to arXiv on: 30 Oct 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary SlowFast-VGen is a new model that combines slow learning of general world dynamics with fast storage of episodic memory from new experiences. It addresses inconsistencies in longer video generation by incorporating a masked conditional video diffusion model and an inference-time fast learning strategy based on temporal LoRA modules. The approach lets the model recall prior multi-episode experiences for context-aware skill learning, which improves action-driven video generation and long-horizon planning. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary SlowFast-VGen is a new way to make videos that looks at both big pictures and small details. It helps computers learn how to do things by combining two kinds of learning: slow learning about the world and fast storing memories from new experiences. This makes the generated videos more consistent and accurate, especially for longer videos. |
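To make the "slow weights plus fast LoRA memory" idea from the medium summary more concrete, here is a minimal toy sketch in Python with NumPy. It is not the paper's implementation: the paper applies temporal LoRA modules inside a video diffusion model, while this sketch uses a single linear layer. The class name `LoRALinear`, the method `fast_update`, and all sizes and learning rates are hypothetical choices made for illustration. The frozen matrix stands in for slowly learned knowledge, and only the small low-rank adapter is updated at inference time.

```python
import numpy as np


class LoRALinear:
    """Toy linear layer: frozen base weight plus a trainable low-rank adapter."""

    def __init__(self, weight, rank=2, alpha=2.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # frozen "slow" weights, never updated here
        # Standard LoRA-style init: A random, B zero, so the adapter starts as a no-op.
        self.A = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(rank, in_dim))
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # x has shape (batch, in_dim); output is base path plus low-rank path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def fast_update(self, x, target, lr=0.1):
        """One gradient step on mean squared error, touching only A and B."""
        err = self.forward(x) - target
        grad_out = 2.0 * err / err.size          # d(loss) / d(prediction)
        hidden = x @ self.A.T                    # (batch, rank)
        grad_B = self.scale * grad_out.T @ hidden
        grad_A = self.scale * (grad_out @ self.B).T @ x
        self.B -= lr * grad_B
        self.A -= lr * grad_A
        return float(np.mean(err ** 2))          # loss measured before the step


# Hypothetical "new episode": targets come from the base weights plus a small
# rank-1 change that the frozen weights alone cannot reproduce.
rng = np.random.default_rng(1)
base = rng.normal(size=(8, 8))
base_copy = base.copy()
shift = 0.5 * np.outer(rng.normal(size=8), rng.normal(size=8)) / np.sqrt(8)
x = rng.normal(size=(16, 8))
target = x @ (base + shift).T

layer = LoRALinear(base, rank=2)
losses = [layer.fast_update(x, target, lr=0.1) for _ in range(100)]
```

In this sketch, `losses` records the error before each step, so comparing its first and last entries shows whether the adapter absorbed the new episode. The base matrix is left untouched throughout, which mirrors the summary's split between slow, general learning and fast, episode-specific storage.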
Keywords
» Artificial intelligence » Diffusion model » Inference » LoRA » Recall