Summary of TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation, by Weixi Feng et al.
TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation
by Weixi Feng, Jiachen Li, Michael Saxon, Tsu-jui Fu, Wenhu Chen, William Yang Wang
First submitted to arXiv on: 12 Jun 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper at a different level of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This study proposes a new benchmark, TC-Bench, to evaluate the temporal compositionality of video generation models. The benchmark consists of text prompts, ground truth videos, and evaluation metrics that assess how new concepts emerge and how they transition in generated videos. Unlike existing benchmarks that focus on simple actions, TC-Bench addresses the temporal dimension: its prompts state the initial and final states of a scene, which reduces ambiguity about how the frames should develop. The study also introduces new metrics that measure the completeness of component transitions in generated videos, and these metrics correlate more strongly with human judgments than existing metrics do. Experimental results reveal that most video generators struggle to interpret descriptions of compositional changes and to synthesize different components across time steps, highlighting a significant gap for future improvement. |
Low | GrooveSquid.com (original content) | This research paper is about creating more realistic videos using artificial intelligence. Right now, AI-generated videos are limited because the models that make them don’t understand how things change over time. The researchers propose a new way to test video generation models that takes into account the complexity of real-world videos. They created a special set of prompts and examples to help evaluate these models. The results show that current video generators aren’t very good at understanding what’s happening in different parts of a video and how those parts should be connected. |