
Summary of VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation, by Xuan He et al.


VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

by Xuan He, Dongfu Jiang, Ge Zhang, Max Ku, Achint Soni, Sherman Siu, Haonan Chen, Abhranil Chandra, Ziyan Jiang, Aaran Arulraj, Kai Wang, Quy Duc Do, Yuansheng Ni, Bohan Lyu, Yaswanth Narsupalli, Rongqi Fan, Zhiheng Lyu, Yuchen Lin, Wenhu Chen

First submitted to arXiv on: 21 Jun 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors; this is the paper's original abstract)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
This paper presents a significant advance in automatic video metrics, addressing the long-standing problem that existing metrics correlate poorly with human judgments of generated videos. The main hurdle is the lack of large-scale human-annotated datasets. To overcome this, the authors release VideoFeedback, a large dataset containing multi-aspect human scores for 37.6K synthesized videos produced by 11 existing video generative models. Based on this dataset, they train VideoScore (initialized from Mantis) to enable automatic video quality assessment. The proposed metric outperforms previous methods, achieving a Spearman correlation of 77.1 with human ratings on the VideoFeedback-test set. Experiments on the held-out datasets EvalCrafter, GenAI-Bench, and VBench further show that VideoScore consistently correlates better with human judges than other metrics. The authors therefore propose VideoScore as a reliable proxy for human raters, both to evaluate video generation models and to simulate fine-grained feedback for Reinforcement Learning from Human Feedback (RLHF) to improve current video generation models.
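As a rough, hedged illustration of the agreement measure reported above, the sketch below computes a Spearman correlation between a metric's automatic scores and human ratings using SciPy. The score arrays are hypothetical placeholders, not data from VideoFeedback, and the snippet is not the authors' evaluation code.

    # Minimal sketch of the agreement measure used to evaluate VideoScore:
    # Spearman rank correlation between automatic scores and human ratings.
    # The arrays below are hypothetical placeholders, not values from the paper.
    from scipy.stats import spearmanr

    human_ratings = [4.0, 2.5, 3.0, 1.5, 4.5]   # hypothetical per-video human scores
    model_scores = [3.8, 2.9, 3.1, 1.2, 4.6]    # hypothetical automatic metric outputs

    rho, p_value = spearmanr(human_ratings, model_scores)
    print(f"Spearman correlation: {rho:.3f} (p = {p_value:.3f})")

A higher correlation means the automatic metric ranks videos more like a human rater would, which is the basis for the paper's claim that VideoScore can stand in for human feedback.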
Low Difficulty Summary (written by GrooveSquid.com, original content)
This paper helps us understand how computers can be taught to evaluate videos. Right now, there is no good way to measure how well a computer program creates videos, because we do not have enough data labeled by humans. To fix this, the authors created a large dataset called VideoFeedback containing ratings for 37,600 synthesized videos from 11 different video generation models. They used this data to train a model called VideoScore, which can automatically evaluate video quality. This new metric performs better than existing methods, agreeing more closely with human raters on several tests. The authors think that VideoScore could be a useful tool for evaluating and improving computer-generated videos.

Keywords

» Artificial intelligence  » Reinforcement learning  » RLHF