Summary of Examining Modality Incongruity in Multimodal Federated Learning For Medical Vision and Language-based Disease Detection, by Pramit Saha et al.
Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease Detection
by Pramit Saha, Divyanshu Mishra, Felix Wagner, Konstantinos Kamnitsas, J. Alison Noble
First submitted to arXiv on: 7 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | Multimodal Federated Learning (MMFL) leverages multiple modalities in each client to build a more robust FL model than its unimodal counterpart. However, the impact of a modality missing in some clients, known as modality incongruity, has been largely overlooked. This paper investigates the effects of modality incongruity, reveals its connection with data heterogeneity across clients, and examines whether incongruent MMFL, with a mix of unimodal and multimodal clients, is more beneficial than unimodal FL. To address the incongruity, three potential routes are explored: (1) self-attention mechanisms for information fusion in MMFL, (2) a modality imputation network (MIN) pre-trained in a multimodal client for modality translation in unimodal clients, and (3) regularization techniques at the client level and server level. Experiments are conducted on two publicly available datasets, MIMIC-CXR and Open-I, containing Chest X-Rays and radiology reports.
Low | GrooveSquid.com (original content) | This paper is about a new way to help machines learn from many different sources at once, called Multimodal Federated Learning (MMFL). MMFL combines different types of data to make better predictions. But what happens when some of that data is missing? This paper measures the impact of missing data and shows that it can hurt how well MMFL works. It then looks at three ways to address the problem: using special attention mechanisms, generating a stand-in version of the missing data, and applying techniques that keep training balanced across all the data.
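The first route above, attention-based fusion, can be illustrated with a minimal sketch. This is not the paper's implementation: the single-head attention, toy dimensions, and random projection weights are illustrative assumptions. The point it shows is that attention over a joint token sequence degrades gracefully when one modality is absent, which is why it suits incongruent MMFL with both multimodal and unimodal clients.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_fusion(img_tokens, txt_tokens, d_k=16, seed=0):
    """Fuse image and text token embeddings with single-head self-attention.

    img_tokens: (n_img, d) or None; txt_tokens: (n_txt, d) or None.
    When a modality is missing, attention simply runs over the remaining
    tokens, so the same fusion module serves unimodal and multimodal clients.
    (Hypothetical sketch; projection weights are random for illustration.)
    """
    rng = np.random.default_rng(seed)
    parts = [t for t in (img_tokens, txt_tokens) if t is not None]
    x = np.concatenate(parts, axis=0)           # joint token sequence
    d = x.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    attn = softmax(q @ k.T / np.sqrt(d_k))      # (n, n) attention weights
    fused = attn @ v                            # contextualized tokens
    return fused.mean(axis=0)                   # pooled joint representation

# A multimodal client fuses both modalities; a text-only client passes None.
img = np.random.default_rng(1).standard_normal((4, 32))
txt = np.random.default_rng(2).standard_normal((6, 32))
both = self_attention_fusion(img, txt)
text_only = self_attention_fusion(None, txt)
```

Both calls return a pooled representation of the same size, so downstream classifier and FL aggregation logic need not branch on which modalities a client holds.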
Keywords
* Artificial intelligence * Attention * Federated learning * Regularization * Self attention * Translation