

Exploring Scalability of Self-Training for Open-Vocabulary Temporal Action Localization

by Jeongseok Hyun, Su Ho Han, Hyolim Kang, Joon-Young Lee, Seon Joo Kim

First submitted to arXiv on: 9 Jul 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper explores the scalability of self-training with unlabeled YouTube videos for open-vocabulary temporal action localization (OV-TAL). Existing OV-TAL methods rely on human-labeled datasets, which are limited in size and generalizability. The approach first trains a class-agnostic action localizer on a human-labeled dataset and uses it to generate pseudo-labels for unlabeled videos; the localizer is then retrained on the resulting large-scale pseudo-labeled dataset. This self-training improves the localizer's generalizability. The authors also identify limitations in existing OV-TAL evaluation schemes, propose a new benchmark, and report the TAL performance of the Gemini-1.5 model on that benchmark.
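To make the loop concrete, below is a minimal Python sketch of generic self-training for class-agnostic action proposals. It is an illustration only: the function names, the confidence threshold, and the dummy video IDs are hypothetical stand-ins, not the authors' models, datasets, or filtering rules.

    # Hypothetical sketch of the self-training loop described above.
    # Names, threshold, and data are placeholders, not the paper's code.
    from dataclasses import dataclass

    @dataclass
    class Segment:
        start: float   # action start time (seconds)
        end: float     # action end time (seconds)
        score: float   # localizer confidence in [0, 1]

    def train_localizer(videos, annotations):
        """Stand-in for training a class-agnostic action localizer.

        Returns a 'model' that proposes one dummy segment per video; a
        real implementation would fit a temporal detection network here.
        """
        def model(video):
            return [Segment(start=0.0, end=1.0, score=0.9)]
        return model

    def pseudo_label(model, unlabeled_videos, min_score=0.7):
        """Keep only confident class-agnostic proposals as pseudo-labels."""
        pseudo = {}
        for video in unlabeled_videos:
            segments = [s for s in model(video) if s.score >= min_score]
            if segments:
                pseudo[video] = segments
        return pseudo

    # Step 1: train on the small human-labeled dataset.
    labeled_videos = ["labeled_vid_001", "labeled_vid_002"]
    annotations = {v: [Segment(0.0, 1.0, 1.0)] for v in labeled_videos}
    model = train_localizer(labeled_videos, annotations)

    # Step 2: pseudo-label a large pool of unlabeled YouTube videos.
    unlabeled_videos = [f"youtube_vid_{i:06d}" for i in range(5)]
    pseudo = pseudo_label(model, unlabeled_videos)

    # Step 3: retrain on labeled + pseudo-labeled data combined.
    combined = {**annotations, **pseudo}
    model = train_localizer(list(combined), combined)
    print(f"retrained on {len(combined)} videos ({len(pseudo)} pseudo-labeled)")

The confidence filter in step 2 is the design choice self-training generally hinges on: keeping only high-scoring proposals limits how much pseudo-label noise is fed back into retraining.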
Low Difficulty Summary (original content by GrooveSquid.com)
The paper helps us understand how to improve temporal action localization by using many YouTube videos that have no labels. Right now, we have to use small datasets labeled by humans, which isn't enough. The idea is to train a model on some labeled data and then use it to label many unlabeled videos. This way, we get a much bigger dataset that helps the model learn better. The paper also discusses how we evaluate these models and proposes a new way to do it.

Keywords

» Artificial intelligence  » Gemini  » Self-training