
Summary of Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality, by Ge Ya Luo et al.


Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality

by Ge Ya Luo, Gian Mario Favero, Zhi Hao Luo, Alexia Jolicoeur-Martineau, Christopher Pal

First submitted to arXiv on: 7 Oct 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)

See the paper's original abstract on arXiv.

Medium Difficulty Summary (written by GrooveSquid.com, original content)

Fréchet Video Distance (FVD), a widely adopted metric for evaluating video generation quality, relies on several critical assumptions. The authors identify three significant limitations: the non-Gaussianity of the I3D feature space, the insensitivity of I3D features to temporal distortions, and the impractically large sample sizes required for reliable estimation. These findings undermine FVD's reliability and suggest it falls short as a standalone metric. To address this, the authors propose JEDi (the JEPA Embedding Distance), which is based on features derived from a Joint Embedding Predictive Architecture and measured using Maximum Mean Discrepancy with a polynomial kernel. Experiments on multiple open-source datasets show that JEDi is a superior alternative to FVD: it requires fewer samples and increases alignment with human evaluation by 34%.

Low Difficulty Summary (written by GrooveSquid.com, original content)

Video generation quality has a new metric in town! The Fréchet Video Distance (FVD) was thought to be the way to go, but researchers discovered it has some major flaws. They found that the I3D features it relies on don't follow the bell-curve shape FVD assumes, and that FVD often fails to notice when a video is distorted over time. To make matters worse, you need a huge amount of data to get reliable results! So, what's the solution? The authors propose JEDi, which uses a different approach to measure video similarity. And guess what? It needs less data and agrees better with human judgment than FVD!
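
To make the two kinds of distance in the medium difficulty summary more concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes the video features (I3D for FVD, JEPA for JEDi) have already been extracted into arrays of shape (num_videos, feature_dim). The function names, the random stand-in features, and the kernel parameters (degree 3, scale 1/feature_dim, offset 1, a convention borrowed from Kernel Inception Distance) are illustrative assumptions, since the summary only says "polynomial kernel".

```python
import numpy as np


def frechet_distance(x, y):
    """Frechet distance between Gaussians fitted to two feature sets.

    This is the formula behind FVD-style metrics; it is only meaningful
    if the features really are close to Gaussian.
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr((cov_x cov_y)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_x @ cov_y, which are real and non-negative in
    # exact arithmetic. Clip to guard against round-off.
    eigvals = np.linalg.eigvals(cov_x @ cov_y).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def polynomial_kernel(x, y, degree=3, coef0=1.0):
    """k(a, b) = (a.b / d + coef0) ** degree (assumed parameters)."""
    gamma = 1.0 / x.shape[1]
    return (gamma * (x @ y.T) + coef0) ** degree


def mmd2_polynomial(x, y, degree=3, coef0=1.0):
    """Unbiased estimate of squared Maximum Mean Discrepancy.

    No Gaussian assumption is made about the features.
    """
    m, n = len(x), len(y)
    k_xx = polynomial_kernel(x, x, degree, coef0)
    k_yy = polynomial_kernel(y, y, degree, coef0)
    k_xy = polynomial_kernel(x, y, degree, coef0)
    # Exclude the diagonal (each sample compared with itself) so the
    # within-set terms are unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2.0 * k_xy.mean())


if __name__ == "__main__":
    # Random stand-ins for real and generated video features (hypothetical).
    rng = np.random.default_rng(0)
    real = rng.normal(size=(200, 16))
    generated = rng.normal(loc=0.5, size=(200, 16))
    print("Frechet distance:", frechet_distance(real, generated))
    print("MMD^2 (polynomial):", mmd2_polynomial(real, generated))
```

Both functions return a value near zero when the two feature sets come from the same distribution and a larger value as they drift apart. The Fréchet distance compares only the fitted means and covariances, which is where the Gaussian assumption enters, whereas the MMD estimate compares the samples directly through the kernel.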

Keywords

» Artificial intelligence  » Alignment  » Embedding