Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies

by Hung-Ting Su, Chun-Tong Chao, Ya-Ching Hsu, Xudong Lin, Yulei Niu, Hung-Yi Lee, Winston H. Hsu

First submitted to arXiv on: 16 Jun 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Computation and Language (cs.CL); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper introduces a novel dataset, Tropes in Movies (TiM), designed to test the video reasoning capabilities of Large Language Models (LLMs). The authors identify two critical skills: Abstract Perception and Long-range Compositional Reasoning. They show that current methods only marginally outperform a random baseline on these challenges and propose two new approaches, Face-Enhanced Viper of Role Interactions (FEVoRI) and Context Query Reduction (ConQueR), which improve performance by 15 F1 points. The authors also introduce a protocol to evaluate how necessary each of these skills is for solving the task. The dataset and code are available at the provided link.

Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about creating a new way to test how well computers can understand movies. It’s like trying to figure out what’s happening in a movie by looking at individual frames, but on a much bigger scale. The authors want to see whether current computer models can do this as well as humans. They found that current models don’t do very well and propose some new ideas to help them understand movies better. The dataset they created is available online.

Keywords

  • Artificial intelligence