Summary of Open-Sora Plan: Open-Source Large Video Generation Model, by Bin Lin et al.


Open-Sora Plan: Open-Source Large Video Generation Model

by Bin Lin, Yunyang Ge, Xinhua Cheng, Zongjian Li, Bin Zhu, Shaodong Wang, Xianyi He, Yang Ye, Shenghai Yuan, Liuhan Chen, Tanghui Jia, Junwu Zhang, Zhenyu Tang, Yatian Pang, Bin She, Cen Yan, Zhiheng Hu, Xiaoyi Dong, Lin Chen, Zhang Pan, Xing Zhou, Shaoling Dong, Yonghong Tian, Li Yuan

First submitted to arXiv on: 28 Nov 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
The Open-Sora Plan is an open-source project that aims to generate high-resolution, long-duration videos from user inputs. The project consists of multiple components, including a Wavelet-Flow Variational Autoencoder, a Joint Image-Video Skiparse Denoiser, and condition controllers. To make training and inference efficient, the team designed various assistant strategies and proposed a multi-dimensional data curation pipeline for obtaining high-quality data. The Open-Sora Plan achieves impressive video generation results in both qualitative and quantitative evaluations, making it a valuable contribution to the video generation research community.

Low Difficulty Summary (written by GrooveSquid.com, original content)
The Open-Sora Plan is a new way to make videos that are really good quality and can run for a long time. It’s like having a super smart movie maker! The team used some fancy computer algorithms to make it work. They also had to figure out how to train the system so it could generate videos quickly and efficiently. Because the project is open source, people can use this technology to make their own amazing videos.

Keywords

» Artificial intelligence  » Inference  » Variational autoencoder