Summary of Open-Vocabulary Object Detection via Language Hierarchy, by Jiaxing Huang et al.
Open-Vocabulary Object Detection via Language Hierarchy
by Jiaxing Huang, Jingyi Zhang, Kai Jiang, Shijian Lu
First submitted to arXiv on: 27 Oct 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | The paper presents a new approach to weakly-supervised object detection, addressing the image-to-box label mismatch in existing methods. The authors introduce Language Hierarchical Self-training (LHST), which incorporates language hierarchy into detector training to learn more generalizable detectors. LHST expands image-level labels with a hierarchical structure and enables co-regularization between the expanded labels and self-training. This provides richer supervision, mitigates the image-to-box label mismatch, and selects labels based on their predicted reliability. The authors also design a prompt generation method that introduces language hierarchy to bridge vocabulary gaps between training and testing. Experimental results show that LHST achieves superior generalization performance across 14 object detection datasets. |
| Low | GrooveSquid.com (original content) | The paper is about finding a better way to teach machines to detect objects in pictures, even when the training data isn't perfect. Right now, many methods use weak supervision, meaning they rely on broad image-level labels like "dog" or "car" rather than precise annotations such as a box drawn around each object. This can lead to problems where the machine is good at detecting certain types of objects but not others. The authors propose a new approach called Language Hierarchical Self-training (LHST) that addresses this issue by providing richer labels and selecting the most reliable ones. They also develop a way to generate prompts that help machines learn from different types of data. Overall, the paper shows that LHST can improve object detection performance across many datasets. |
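The core idea described above can be illustrated with a minimal sketch: image-level labels are expanded along a language hierarchy (so "dog" also supervises "animal"), and self-training keeps only pseudo-labels judged reliable. The hierarchy, function names, and the use of confidence as a reliability proxy below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of language-hierarchy label expansion and
# reliability-based pseudo-label selection. The hierarchy, names, and
# threshold are illustrative; the paper derives its hierarchy from
# language structure and learns reliability during training.

# Toy label hierarchy mapping each class to its ancestors.
HIERARCHY = {
    "dog": ["animal", "mammal"],
    "car": ["vehicle"],
    "sparrow": ["bird", "animal"],
}

def expand_labels(image_labels):
    """Expand image-level labels with their ancestors in the hierarchy."""
    expanded = set(image_labels)
    for label in image_labels:
        expanded.update(HIERARCHY.get(label, []))
    return expanded

def select_reliable(predictions, expanded_labels, threshold=0.6):
    """Keep predicted boxes whose class is consistent with the expanded
    label set and whose confidence (a stand-in for predicted
    reliability) clears the threshold."""
    return [
        (cls, score)
        for cls, score in predictions
        if cls in expanded_labels and score >= threshold
    ]

# Example: an image tagged "dog" and "car" at the image level.
labels = expand_labels(["dog", "car"])
preds = [("dog", 0.9), ("animal", 0.7), ("cat", 0.8), ("vehicle", 0.3)]
kept = select_reliable(preds, labels)
# "cat" is dropped (not in the expanded label set) and the low-confidence
# "vehicle" box is dropped, while "dog" and the expanded "animal" survive.
```

The expansion step is what supplies the richer supervision the summary mentions, while the reliability filter is what lets self-training and the expanded labels co-regularize each other.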
Keywords
» Artificial intelligence » Generalization » Object detection » Prompt » Regularization » Self training » Supervised