Summary of Multimodal Generalized Category Discovery, by Yuchang Su et al.
Multimodal Generalized Category Discovery
by Yuchang Su, Renping Zhou, Siyu Huang, Xingjian Li, Tianyang Wang, Ziyue Wang, Min Xu
First submitted to arxiv on: 18 Sep 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Original abstract (available on arXiv) |
Medium | GrooveSquid.com (original content) | The Generalized Category Discovery (GCD) task aims to classify inputs into both known and novel categories, a capability crucial to open-world scientific discovery. Current GCD methods are limited to unimodal data and neglect the inherently multimodal nature of most real-world data. This work extends GCD to a multimodal setting, where inputs from different modalities provide richer, complementary information. The key challenge is aligning heterogeneous information across modalities. A novel framework called MM-GCD addresses this challenge by using contrastive learning and distillation to align both the feature spaces and the output spaces of the different modalities. MM-GCD achieves new state-of-the-art performance on the UPMC-Food101 and N24News datasets, surpassing previous methods by 11.5% and 4.7%, respectively. |
Low | GrooveSquid.com (original content) | GCD is a way for computers to sort data into categories they already know while also discovering new categories they have not seen before. This helps scientists make new discoveries. Right now, GCD methods can only work with one type of data, such as pictures alone or sounds alone. But real-world data often combines several types of information. This paper shows how to make GCD work with multiple types of data at once. It uses special techniques to line up the information from each type so that they agree. The new method does much better than previous methods on two large datasets. |
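The medium summary says MM-GCD aligns both the feature space and the output space of different modalities using contrastive learning and distillation. The plain-Python sketch below illustrates those two generic ingredients for a paired image/text batch: a symmetric cross-modal InfoNCE loss that pulls each sample's image and text embeddings together (feature-space alignment), and a KL-divergence distillation loss that pushes one modality's class predictions toward the other's (output-space alignment). It is not the authors' implementation. The function names, temperature values, the choice of InfoNCE and KL divergence specifically, and the teacher/student direction are all assumptions made for illustration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def l2_normalize(v: Vector) -> Vector:
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def softmax(logits: Vector, temperature: float = 1.0) -> Vector:
    # Numerically stable softmax: subtract the max before exponentiating.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _cross_entropy_diagonal(sim: Matrix) -> float:
    # Row i's correct "class" is column i (the matching pair in the batch).
    loss = 0.0
    for i, row in enumerate(sim):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        loss += log_sum_exp - row[i]
    return loss / len(sim)


def cross_modal_infonce(img_feats: Matrix, txt_feats: Matrix,
                        temperature: float = 0.1) -> float:
    # Hypothetical helper: symmetric InfoNCE between paired embeddings.
    # Sample i's image and text are positives; other samples are negatives.
    img = [l2_normalize(v) for v in img_feats]
    txt = [l2_normalize(v) for v in txt_feats]
    sim = [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in txt]
           for vi in img]
    sim_t = [list(col) for col in zip(*sim)]  # text-to-image direction
    return 0.5 * (_cross_entropy_diagonal(sim) + _cross_entropy_diagonal(sim_t))


def kl_distillation(teacher_logits: Matrix, student_logits: Matrix,
                    temperature: float = 1.0) -> float:
    # Hypothetical helper: mean KL(p_teacher || p_student) over the batch,
    # nudging one modality's predicted class distribution toward the other's.
    total = 0.0
    for t, s in zip(teacher_logits, student_logits):
        p = softmax(t, temperature)
        q = softmax(s, temperature)
        total += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_logits)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print(cross_modal_infonce(images, aligned_texts))  # small: pairs match
    print(cross_modal_infonce(images, swapped_texts))  # large: pairs mismatched
    print(kl_distillation([[2.0, 0.5]], [[0.1, 1.5]]))  # > 0: outputs disagree
```

In a full training loop, the two terms would typically be added to a classification objective with some weighting; that weighting, like everything above, is an assumption rather than a detail reported in the summary.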
Keywords
» Artificial intelligence » Distillation