Summary of Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality, by Ge Ya Luo et al.
Beyond FVD: Enhanced Evaluation Metrics for Video Generation Quality
by Ge Ya Luo, Gian Mario Favero, Zhi Hao Luo, Alexia Jolicoeur-Martineau, Christopher Pal
First submitted to arXiv on: 7 Oct 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Original abstract (see the paper’s arXiv page) |
Medium | GrooveSquid.com (original content) | The Fréchet Video Distance (FVD) is a widely adopted metric for evaluating video generation quality, but it relies on several critical assumptions. The authors identify three significant limitations: the non-Gaussianity of the Inflated 3D ConvNet (I3D) feature space, the insensitivity of I3D features to temporal distortions, and the impractically large sample sizes required for reliable estimation. These findings undermine FVD’s reliability and suggest it falls short as a standalone metric. To address this, the authors propose JEDi (JEPA Embedding Distance), which compares features derived from a Joint Embedding Predictive Architecture (JEPA) using Maximum Mean Discrepancy (MMD) with a polynomial kernel. Experiments on multiple open-source datasets show that JEDi is a superior alternative to FVD: it requires fewer samples to reach a stable value and increases alignment with human evaluation by 34%. |
Low | GrooveSquid.com (original content) | Video generation has a new quality metric in town! The Fréchet Video Distance (FVD) was thought to be the way to go, but researchers discovered it has some major flaws. They found that the I3D features it relies on don’t behave as expected, and that FVD barely notices when a video is distorted over time. To make matters worse, you need a huge number of videos to get reliable results! So, what’s the solution? The authors propose JEDi, which measures video similarity using features from a different kind of neural network and a different way of comparing them. And guess what? It works much better than FVD, needing fewer videos and agreeing more closely with human judgments! |
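
To make the contrast between FVD and JEDi more concrete, here is a minimal NumPy sketch of the two distances computed on precomputed feature vectors. It assumes features have already been extracted (I3D for FVD, a JEPA video encoder for JEDi); the random Gaussian arrays merely stand in for real features. The function names are hypothetical, and the kernel settings (degree 3, inner product scaled by 1/d, offset 1, the form popularized by the Kernel Inception Distance) are an assumption, not necessarily the paper’s exact configuration.

```python
import numpy as np


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """FVD-style Fréchet distance between feature sets of shape (n, d).

    Fits a Gaussian (mean, covariance) to each set, so it inherits the
    Gaussian assumption the summary says fails for I3D features.
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)
    # Tr(sqrt(cov_x @ cov_y)) equals the sum of square roots of the eigenvalues
    # of cov_x @ cov_y (real and non-negative in exact arithmetic); clip tiny
    # negative round-off before taking the root.
    eigvals = np.linalg.eigvals(cov_x @ cov_y).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def polynomial_kernel(a: np.ndarray, b: np.ndarray,
                      degree: int = 3, coef0: float = 1.0) -> np.ndarray:
    """Assumed polynomial kernel k(a, b) = (a.b / d + coef0) ** degree."""
    d = a.shape[1]
    return (a @ b.T / d + coef0) ** degree


def mmd2_unbiased(x: np.ndarray, y: np.ndarray,
                  degree: int = 3, coef0: float = 1.0) -> float:
    """Unbiased estimate of squared MMD between feature sets x and y."""
    m, n = len(x), len(y)
    k_xx = polynomial_kernel(x, x, degree, coef0)
    k_yy = polynomial_kernel(y, y, degree, coef0)
    k_xy = polynomial_kernel(x, y, degree, coef0)
    # Drop diagonal (self-similarity) terms so within-set averages are unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    term_xy = k_xy.mean()
    return float(term_xx + term_yy - 2.0 * term_xy)


# Usage with stand-in "features": one set drawn from the same distribution as
# the reference, one shifted by 0.5 in every dimension.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 16))
fake_close = rng.normal(size=(500, 16))
fake_far = rng.normal(loc=0.5, size=(500, 16))
print(frechet_distance(real, fake_close), frechet_distance(real, fake_far))
print(mmd2_unbiased(real, fake_close), mmd2_unbiased(real, fake_far))
```

The design difference mirrors the summary’s argument: the Fréchet distance compares only the mean and covariance of a fitted Gaussian, whereas a degree-3 polynomial-kernel MMD compares the two sample sets directly through kernel averages (capturing moments up to third order) and makes no Gaussian assumption.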
Keywords
» Artificial intelligence » Alignment » Embedding