Summary of Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality, by Ge Ya Luo et al.

Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality

by Ge Ya Luo, Gian Mario Favero, Zhi Hao Luo, Alexia Jolicoeur-Martineau, Christopher Pal

First submitted to arXiv on: 7 Oct 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	High Difficulty Summary Read the original abstract here
Medium	GrooveSquid.com (original content)	Medium Difficulty Summary Fréchet Video Distance (FVD), a recently adopted metric for evaluating video generation quality, relies on several critical assumptions. The authors identify three significant limitations: the non-Gaussianity of the I3D feature space, insensitivity to temporal distortions, and the impractically large sample sizes required for reliable estimation. These findings undermine FVD’s reliability and suggest it falls short as a standalone metric. To address this, the authors propose JEDi, a metric based on features derived from a Joint Embedding Predictive Architecture (JEPA) and measured using Maximum Mean Discrepancy (MMD) with a polynomial kernel. Experiments on multiple open-source datasets show that JEDi is a superior alternative to FVD, requiring fewer samples and increasing alignment with human evaluation by 34%.
Low	GrooveSquid.com (original content)	Low Difficulty Summary Video generation quality has a new metric in town! The Fréchet Video Distance (FVD) was thought to be the way to go, but researchers discovered it has some major flaws. They found that the I3D features it relies on don’t follow the bell-curve shape FVD assumes, and that FVD barely notices when a video is distorted over time. To make matters worse, you need a huge number of videos to get reliable results! So, what’s the solution? The authors propose JEDi, which uses a different approach to measure how similar generated videos are to real ones. And guess what? It needs fewer samples and agrees better with human judgment than FVD!
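
To make the "Maximum Mean Discrepancy with a polynomial kernel" idea from the medium summary more concrete, here is a minimal Python sketch of an unbiased squared-MMD estimate between two sets of feature vectors. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the kernel settings (degree 2, gamma = 1/dimension, coef0 = 1) are placeholder choices rather than values taken from the paper, and random vectors stand in for the JEPA video features that JEDi would actually use.

```python
import numpy as np


def polynomial_kernel(x, y, degree=2, gamma=None, coef0=1.0):
    """Polynomial kernel k(a, b) = (gamma * <a, b> + coef0) ** degree.

    The default degree, gamma, and coef0 are illustrative assumptions.
    """
    if gamma is None:
        gamma = 1.0 / x.shape[1]  # scale by feature dimension
    return (gamma * (x @ y.T) + coef0) ** degree


def mmd2_unbiased(real, fake, **kernel_kwargs):
    """Unbiased estimate of squared MMD between two feature sets.

    real: array of shape (m, d), fake: array of shape (n, d).
    Diagonal terms are excluded from the within-set averages.
    """
    m, n = real.shape[0], fake.shape[0]
    k_rr = polynomial_kernel(real, real, **kernel_kwargs)
    k_ff = polynomial_kernel(fake, fake, **kernel_kwargs)
    k_rf = polynomial_kernel(real, fake, **kernel_kwargs)
    mean_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    mean_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return mean_rr + mean_ff - 2.0 * k_rf.mean()


# Stand-in "features": random vectors, NOT real video embeddings.
rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(200, 8))
similar_feats = rng.normal(0.0, 1.0, size=(200, 8))  # same distribution
shifted_feats = rng.normal(1.0, 1.0, size=(200, 8))  # shifted mean

same = mmd2_unbiased(real_feats, similar_feats)
shifted = mmd2_unbiased(real_feats, shifted_feats)
print(f"same distribution:    {same:.4f}")
print(f"shifted distribution: {shifted:.4f}")
```

The estimate should stay close to zero when both sets come from the same distribution and grow when they differ. Unlike a Fréchet distance, this estimate does not fit a Gaussian to the features, which is relevant to the non-Gaussianity limitation the summary attributes to FVD.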

Keywords

» Artificial intelligence » Alignment » Embedding
