Summary of Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model, by Hao Yan et al.
Lightweight Unsupervised Federated Learning with Pretrained Vision Language Model
by Hao Yan, Yuhong Guo
First submitted to arXiv on: 17 Apr 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper proposes a lightweight unsupervised federated learning approach that leverages the unlabeled data on each client for efficient model training and communication. The method uses a pretrained vision-language model, such as CLIP, to refine pseudo-labels of unlabeled instances through linear classifier training. To address data heterogeneity within each client, the paper also proposes a class-balanced text feature sampling strategy for generating synthetic instances in the feature space. Experimental results show that this approach greatly improves model performance over CLIP’s zero-shot predictions and even outperforms supervised federated learning benchmark methods, while incurring only limited computational and communication overhead. |
Low | GrooveSquid.com (original content) | This paper tackles a big problem called “isolated data islands,” where devices can’t share information because their data is private. Usually, this problem is solved by having each device label its own data, but that takes time and resources. Instead, the paper proposes a new way to train models on these devices using unlabeled data. It uses a special pretrained model called CLIP to make predictions, and then refines those predictions using just a little extra computation. The results show that this approach works much better than using the pretrained model alone or traditional methods. |
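The pseudo-label refinement described in the medium summary can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's actual implementation: random vectors stand in for CLIP image and text embeddings, zero-shot pseudo-labels come from cosine similarity to the class text features, and a small softmax linear classifier is then trained on the frozen features with those pseudo-labels as targets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for CLIP embeddings; in the paper's setting these
# would come from the frozen pretrained vision-language model.
num_classes, dim, n = 3, 16, 90
text_feats = rng.normal(size=(num_classes, dim))   # one text embedding per class prompt
labels_true = np.repeat(np.arange(num_classes), n // num_classes)
image_feats = text_feats[labels_true] + 0.5 * rng.normal(size=(n, dim))

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Step 1: zero-shot pseudo-labels from cosine similarity between each image
# feature and the class text features.
sims = normalize(image_feats) @ normalize(text_feats).T
pseudo = sims.argmax(axis=1)

# Step 2: refine by training a linear classifier (softmax regression via
# gradient descent) on the frozen image features, using pseudo-labels as targets.
W = np.zeros((dim, num_classes))
onehot = np.eye(num_classes)[pseudo]
for _ in range(200):
    logits = image_feats @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 0.1 * image_feats.T @ (p - onehot) / n

refined = (image_feats @ W).argmax(axis=1)
acc = (refined == labels_true).mean()
print(f"accuracy of refined classifier: {acc:.2f}")
```

In a federated setup, only the small linear classifier `W` would be communicated between clients and the server, which is what keeps the approach lightweight; the heavy CLIP backbone stays frozen on each client.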
Keywords
» Artificial intelligence » Federated learning » Supervised » Unsupervised » Zero shot