Summary of "Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap" by Christopher Liao et al.
Multimodal Unsupervised Domain Generalization by Retrieving Across the Modality Gap
by Christopher Liao, Christian So, Theodoros Tsiligkaridis, Brian Kulis
First submitted to arXiv on: 6 Feb 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
High Difficulty Summary (written by the paper authors)
Read the original abstract here.
Medium Difficulty Summary (written by GrooveSquid.com, original content)
The proposed framework for multimodal unsupervised domain generalization (MUDG) tackles the challenge of learning a model that generalizes to unseen test domains without relying on abundant source data in the target label space. The approach uses a large task-agnostic unlabeled source dataset during fine-tuning and does not assume any relationship between the source dataset and the target task. Instead, it relies on the premise that the source dataset can be accurately searched in a joint vision-language space. The framework makes three contributions: paired k-means, an adaptive text augmentation scheme for target labels, and two simple but effective components that improve downstream target accuracy. The method is compared against state-of-the-art name-only transfer, source-free domain generalization (DG), and zero-shot (ZS) methods on 20 diverse datasets and shows consistent accuracy improvements. The MUDG framework has the potential to address a significant limitation of existing domain generalization methods, which often require access to abundant source data in the target label space. By leveraging a large task-agnostic unlabeled source dataset, the proposed approach can be applied to a wider range of real-world applications where acquiring labeled data in the target label space is prohibitively expensive.
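The retrieval idea at the heart of this summary can be pictured as cross-modal nearest-neighbor search: encode the target label names as text, encode the unlabeled source images, and pull the closest images for each label. Below is a minimal sketch of that step; the embedding dimension, label names, and random vectors standing in for a real CLIP-like encoder pair are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical stand-ins for a CLIP-like joint embedding space: in practice
# these vectors would come from a pretrained vision-language model, not from
# a random number generator.
rng = np.random.default_rng(0)
D = 512                                        # assumed embedding dimension
source_images = rng.normal(size=(10_000, D))   # unlabeled source image embeddings
target_labels = ["golden retriever", "tabby cat", "red fox"]  # illustrative labels
label_embeds = rng.normal(size=(len(target_labels), D))       # text embeddings of label prompts

def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

source_images = l2_normalize(source_images)
label_embeds = l2_normalize(label_embeds)

# Cross-modal retrieval: for each target label, take the k source images whose
# embeddings have the highest cosine similarity to the label's text embedding.
k = 16
scores = label_embeds @ source_images.T        # (num_labels, num_images)
topk = np.argsort(-scores, axis=1)[:, :k]      # indices of retrieved images

# The retrieved (image, label) pairs form a pseudo-labeled subset that could
# feed a standard fine-tuning loop for the target task.
for label, idx in zip(target_labels, topk):
    print(f"{label}: retrieved image indices {idx[:5]}...")
```

The paper's paired k-means and text augmentation contributions refine this basic recipe; the sketch only shows the plain retrieval step the summary describes.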
Low Difficulty Summary (written by GrooveSquid.com, original content)
The MUDG framework uses a large task-agnostic unlabeled source dataset during fine-tuning, without assuming any relationship between the source dataset and the target task. The approach improves upon existing domain generalization methods by searching a joint vision-language space for relevant information in the source dataset. This makes it possible to learn a model that generalizes to unseen test domains even when little or no labeled data is available. The framework has three main contributions: paired k-means, adaptive text augmentation, and two simple but effective components that improve downstream target accuracy. Together, these pieces improve the model's zero-shot accuracy on a variety of tasks. The MUDG framework has the potential to make significant contributions to the field of domain generalization, particularly in applications where labeled data is scarce or expensive.
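The summaries mention an adaptive text augmentation scheme for target labels without spelling it out. As a rough stand-in, the sketch below shows the common prompt-ensembling form of text augmentation, where each label is rendered through several templates and the resulting embeddings are averaged; the templates and the stub encoder are assumptions, and the paper's adaptive scheme differs from this fixed baseline.

```python
import numpy as np

# Illustrative prompt templates; the paper's scheme adapts augmentations to
# the target labels rather than using a fixed list like this one.
templates = [
    "a photo of a {}.",
    "a sketch of a {}.",
    "a painting of a {}.",
    "a low-resolution photo of a {}.",
]

D = 512  # assumed embedding dimension

def encode_text(prompt: str) -> np.ndarray:
    """Hypothetical text encoder: a real pipeline would call a pretrained
    vision-language model here instead of deriving a pseudo-random vector
    from the prompt string."""
    seed = abs(hash(prompt)) % (2**32)
    v = np.random.default_rng(seed).normal(size=D)
    return v / np.linalg.norm(v)

def label_embedding(label: str) -> np.ndarray:
    # Encode every augmented prompt and average, then re-normalize: the mean
    # of several prompt embeddings is a more robust class representation
    # than the embedding of any single prompt.
    embeds = np.stack([encode_text(t.format(label)) for t in templates])
    mean = embeds.mean(axis=0)
    return mean / np.linalg.norm(mean)

print(label_embedding("golden retriever")[:5])
```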
Keywords
- Artificial intelligence
- Domain generalization
- Fine-tuning
- k-means
- Unsupervised
- Zero-shot