Summary of TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation, by Weixi Feng et al.

TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

by Weixi Feng, Jiachen Li, Michael Saxon, Tsu-jui Fu, Wenhu Chen, William Yang Wang

First submitted to arXiv on: 12 Jun 2024

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty	Written by	Summary
High	Paper authors	Read the original abstract here
Medium	GrooveSquid.com (original content)	This study proposes a new benchmark, TC-Bench, to evaluate the temporal compositionality of video generation models. The benchmark consists of text prompts, ground-truth videos, and evaluation metrics that assess how new concepts emerge and transition in generated videos. Unlike existing benchmarks, which focus on simple actions, TC-Bench addresses the temporal dimension by specifying the initial and final states of each scene, which reduces ambiguity about how the frames should develop. The study also introduces new metrics that measure how completely components transition in generated videos; these metrics correlate more strongly with human judgments than existing metrics do. Experimental results reveal that most video generators struggle to interpret descriptions of compositional changes and to synthesize different components across time steps, leaving a significant gap for future improvement.
Low	GrooveSquid.com (original content)	This research paper is about testing how well artificial intelligence can create videos in which things change over time. Right now, AI-generated videos are limited because the models that make them don't understand how things change over time. The researchers propose a new way to test video generation models that takes into account the complexity of real-world videos. They created a special set of prompts and example videos to help evaluate these models. The results show that current video generators aren't very good at understanding what happens at different points in a video and how those moments should be connected.

Keywords

» Artificial intelligence
