SCBench: A Sports Commentary Benchmark for Video LLMs

by Kuangzhi Ge, Lingjun Chen, Kevin Zhang, Yulin Luo, Tianyu Shi, Liaoyuan Fan, Xiang Li, Guanqun Wang, Shanghang Zhang

First submitted to arXiv on: 23 Dec 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Recently, advancements have been made in Video Large Language Models (Video LLMs), but evaluating and benchmarking their performance remains limited. Current benchmarks use simple videos in which the model can understand the entire clip by processing only a few frames, and existing datasets lack diversity in task format, comprising only QA or multiple-choice QA, which overlooks models' capacity for generating precise text. Sports videos present a critical challenge for video understanding, making sports commentary an ideal benchmarking task. This paper proposes a novel task, sports video commentary generation, and develops SCBench, a benchmark for Video LLMs. The authors introduce SCORES, a six-dimensional metric designed specifically for this task, and CommentarySet, a dataset of 5,775 annotated video clips with ground-truth labels. They conduct comprehensive evaluations of multiple Video LLMs (e.g., VILA, Video-LLaVA) and chain-of-thought baseline methods. Results show that InternVL-Chat-2 achieves the best performance with a score of 5.44, surpassing the second-best method by 1.04. This work provides a fresh perspective for future research, aiming to enhance models' overall capabilities in complex visual understanding tasks.
Low Difficulty Summary (written by GrooveSquid.com, original content)
Recently, big advances have been made in artificial intelligence systems that can understand videos. But we don't really know how well these AI systems perform: currently, simple videos are used to test them, which doesn't show their full potential. To fix this, the authors propose a new task: generating commentary for sports videos. They created a special metric and dataset to evaluate these AI systems better. They tested several AI models and found that one called InternVL-Chat-2 did the best job, with a score of 5.44. This work will help improve AI systems so they can understand complex visual tasks better.
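The SCORES metric described above rates commentary along six dimensions, though this summary does not list the dimensions or how they are combined. As a purely illustrative sketch, here is how per-dimension ratings might be aggregated into a single score; the dimension names and the uniform averaging are assumptions, not the paper's actual design:

```python
# Hypothetical sketch of aggregating a six-dimensional metric into one score.
# Dimension names and uniform averaging are assumptions, not taken from the paper.
DIMENSIONS = [
    "accuracy", "completeness", "fluency",
    "timing", "terminology", "excitement",
]

def aggregate_score(ratings: dict) -> float:
    """Average six per-dimension ratings (each assumed on a 0-10 scale)."""
    missing = set(DIMENSIONS) - set(ratings)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return sum(ratings[d] for d in DIMENSIONS) / len(DIMENSIONS)

example = {d: 5.0 for d in DIMENSIONS}
example["fluency"] = 8.0
print(round(aggregate_score(example), 2))  # → 5.5
```

A weighted combination or a learned aggregator would work just as well here; the point is only that a multi-dimensional rubric ultimately reduces to a comparable scalar, like the 5.44 reported for InternVL-Chat-2.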

Keywords

» Artificial intelligence