Open-Vocabulary Object Detection via Language Hierarchy

by Jiaxing Huang, Jingyi Zhang, Kai Jiang, Shijian Lu

First submitted to arXiv on: 27 Oct 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper presents a new approach to weakly-supervised object detection that addresses the image-to-box label mismatch in existing methods. The authors introduce Language Hierarchical Self-training (LHST), which incorporates language hierarchy into detector training to learn more generalizable detectors. LHST expands image-level labels with a hierarchical structure and enables co-regularization between the expanded labels and self-training: the expanded labels provide richer supervision and mitigate the image-to-box label mismatch, while self-training selects reliable labels based on predicted reliability. The authors also design a prompt generation method that uses language hierarchy to bridge vocabulary gaps between training and testing. Experimental results show that LHST achieves superior generalization performance across 14 object detection datasets.

Low Difficulty Summary (original content by GrooveSquid.com)
The paper is about finding a better way to teach machines to detect objects in pictures, even when the training data isn't perfect. Many methods use weak supervision, meaning they rely on broad image-level labels like "this picture contains a dog" rather than precise boxes marking where each object is. This mismatch can make a detector good at finding some kinds of objects but not others. The authors propose a new approach called Language Hierarchical Self-training (LHST) that addresses this issue by expanding the labels with related, more general terms and keeping only the most reliable ones. They also develop a way to generate prompts that helps the detector handle new vocabularies at test time. Overall, the paper shows that LHST improves object detection performance across many datasets.
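The two ideas the summaries describe, expanding an image-level label with more general terms from a language hierarchy and then keeping only pseudo-labels whose predicted reliability is high enough, can be sketched in a few lines. This is an illustrative toy, not the authors' code: the tiny hierarchy, the function names, and the reliability threshold below are all hypothetical.

```python
# Toy hierarchy: each label maps to its ancestors (more general terms).
# In the paper this role is played by a large language hierarchy (e.g. WordNet-style);
# here it is hand-written for illustration.
HIERARCHY = {
    "corgi": ["dog", "animal"],
    "tabby": ["cat", "animal"],
}

def expand_labels(image_labels):
    """Expand image-level labels with their ancestors in the hierarchy."""
    expanded = set(image_labels)
    for label in image_labels:
        expanded.update(HIERARCHY.get(label, []))
    return expanded

def select_reliable(pseudo_boxes, expanded_labels, threshold=0.5):
    """Keep self-training pseudo-boxes that (a) carry one of the expanded
    labels and (b) have a predicted reliability score above the threshold."""
    return [
        box for box in pseudo_boxes
        if box["label"] in expanded_labels and box["score"] >= threshold
    ]

# Example: an image tagged only "corgi", with three candidate pseudo-boxes.
image_labels = ["corgi"]
pseudo_boxes = [
    {"label": "dog", "score": 0.9},    # matches an ancestor label -> kept
    {"label": "corgi", "score": 0.3},  # right label, low reliability -> dropped
    {"label": "cat", "score": 0.8},    # not in the expanded label set -> dropped
]

expanded = expand_labels(image_labels)
kept = select_reliable(pseudo_boxes, expanded)
print(sorted(expanded))  # ['animal', 'corgi', 'dog']
print(kept)              # [{'label': 'dog', 'score': 0.9}]
```

The point of the expansion step is visible in the example: a box the detector confidently labels "dog" would be discarded under the raw image label "corgi", but survives once the label set includes its ancestors.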

Keywords

» Artificial intelligence  » Generalization  » Object detection  » Prompt  » Regularization  » Self training  » Supervised