

Leveraging Foundation Models for Multi-modal Federated Learning with Incomplete Modality

by Liwei Che, Jiaqi Wang, Xinyue Liu, Fenglong Ma

First submitted to arXiv on: 16 Jun 2024

Categories

  • Main: Machine Learning (cs.LG)
  • Secondary: Distributed, Parallel, and Cluster Computing (cs.DC)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary

Written by the paper authors. The high difficulty version is the paper's original abstract; read the original abstract here.
Medium Difficulty Summary

Written by GrooveSquid.com (original content). The paper proposes Federated Multi-modal contrastiVe training with Pre-trained completion (FedMVP), a novel multi-modal federated learning (MFL) method that addresses the problem of missing modalities in MFL. The approach integrates large-scale pre-trained models to enhance federated training while keeping local training efficient: each client uses a pre-trained model for modality completion and representation knowledge transfer, and the server aggregates the client models according to importance scores estimated from generated data and a graph-based perspective. The method achieves superior performance on two real-world image-text classification datasets and remains robust when modalities are missing.
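To make the client-side idea concrete, here is a minimal sketch, not the authors' implementation: the two encoders stand in for frozen pre-trained models, and a hypothetical trainable completion head (all names here are illustrative, not from the paper) fills in a missing text embedding from the image embedding before the pair is aligned with a contrastive loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityCompletionClient(nn.Module):
    """Illustrative FedMVP-style client (names assumed, not from the paper).

    The encoders stand in for large frozen pre-trained models; the
    completion head is the trainable part that synthesizes a missing
    text embedding from the image embedding.
    """

    def __init__(self, dim=128):
        super().__init__()
        self.image_encoder = nn.Linear(512, dim)  # stand-in for a pre-trained image encoder
        self.text_encoder = nn.Linear(300, dim)   # stand-in for a pre-trained text encoder
        for p in self.image_encoder.parameters():
            p.requires_grad = False               # pre-trained weights stay frozen
        for p in self.text_encoder.parameters():
            p.requires_grad = False
        self.completion_head = nn.Sequential(     # trainable modality-completion head
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, image, text=None):
        z_img = self.image_encoder(image)
        # If the text modality is missing, synthesize its embedding.
        z_txt = self.text_encoder(text) if text is not None else self.completion_head(z_img)
        return z_img, z_txt


def contrastive_loss(z_img, z_txt, temperature=0.07):
    """Symmetric InfoNCE-style loss aligning paired image/text embeddings."""
    z_img = F.normalize(z_img, dim=-1)
    z_txt = F.normalize(z_txt, dim=-1)
    logits = z_img @ z_txt.t() / temperature
    labels = torch.arange(z_img.size(0))
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2


# Toy batch where the text modality is missing entirely.
client = ModalityCompletionClient()
z_img, z_txt = client(torch.randn(8, 512))  # text=None -> embedding is completed
contrastive_loss(z_img, z_txt).backward()   # gradients flow only into the completion head
```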
Low Difficulty Summary

Written by GrooveSquid.com (original content). Federated learning helps computers learn together without sharing sensitive data. This paper tackles a new problem in which some devices lack part of the information they need to train a model. To solve it, the authors create a new way to do multi-modal federated learning that uses pre-trained models to fill in the missing information. Each device uses its own pre-trained model to complete the missing data and shares the result with the other devices. On the server side, the combined results are used to figure out how well each device's model is doing and to weight it accordingly. This method performs better than previous methods on two real-world image-text classification tasks and handles missing data well.

Keywords

» Artificial intelligence  » Federated learning  » Multi modal  » Text classification