Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies

by Hung-Ting Su, Chun-Tong Chao, Ya-Ching Hsu, Xudong Lin, Yulei Niu, Hung-Yi Lee, Winston H. Hsu

First submitted to arXiv on: 16 Jun 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Computation and Language (cs.CL); Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
This paper introduces a novel dataset, Tropes in Movies (TiM), designed to test the video reasoning capabilities of Large Language Models (LLMs). The authors identify two critical skills: Abstract Perception and Long-range Compositional Reasoning. They show that current methods only marginally outperform a random baseline on these challenges and propose two new approaches, Face-Enhanced Viper of Role Interactions (FEVoRI) and Context Query Reduction (ConQueR), which improve performance by 15 F1 points. The authors also introduce a protocol to evaluate how necessary each of these skills is for solving the task. The dataset and code are available at the provided link.

Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about creating a new way to test how well computers can understand movies. It’s like trying to figure out what’s happening in a movie by looking at individual frames, but on a much bigger scale. The authors want to see whether current computer models can do this as well as humans. They found that current models don’t do very well and propose some new ideas to help them understand movies better. The dataset they created is available online.

Keywords

  • Artificial intelligence