Summary of Conditional Prompt Tuning For Multimodal Fusion, by Ruixiang Jiang et al.


Conditional Prompt Tuning for Multimodal Fusion

by Ruixiang Jiang, Lingbo Liu, Changwen Chen

First submitted to arxiv on: 28 Nov 2023

Categories

  • Main: Computation and Language (cs.CL)
  • Secondary: Artificial Intelligence (cs.AI)


GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper presents a method for parameter-efficient multimodal fusion that leverages the representation of one modality to guide prompting in another. One modality is encoded first, and its representation serves as a prior to conditionally prompt all frozen layers of the other modality, yielding adaptive prompts that capture both global-level and instance-level features. A mixture of prompt experts (MoPE) dynamically routes each instance to the most suitable prompt experts for encoding, and a regularization term is added to avoid degenerate expert routing. The method shows improved expressiveness and scalability compared to vanilla prompting, achieving state-of-the-art results on three multimodal datasets while requiring only 0.7% of trainable parameters.
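The routing idea in the summary above can be sketched in a few lines: a small router scores a pool of learnable prompt experts from the conditioning modality's representation, and the instance-level prompt is their weighted mixture. This is a minimal illustrative sketch, not the authors' code; all names, dimensions, and the linear router are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper).
d_cond = 16      # dim of the conditioning modality's representation
d_model = 32     # hidden dim of the frozen encoder being prompted
prompt_len = 4   # number of prompt tokens per expert
num_experts = 3  # size of the prompt-expert pool

# Learnable pieces: a pool of prompt experts and a linear router.
experts = rng.normal(size=(num_experts, prompt_len, d_model))
router_w = rng.normal(size=(d_cond, num_experts))

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def mope_prompt(cond_repr):
    """Mix prompt experts weighted by routing scores from the other modality."""
    scores = softmax(cond_repr @ router_w)            # (num_experts,)
    prompt = np.einsum("e,eld->ld", scores, experts)  # (prompt_len, d_model)
    return prompt, scores

cond = rng.normal(size=(d_cond,))
prompt, scores = mope_prompt(cond)
print(prompt.shape, scores.shape)  # (4, 32) (3,)
```

In a real model the mixed prompt would be prepended to the frozen layer's input tokens, and a regularizer on the batch-averaged routing scores (e.g. penalizing low-entropy, collapsed routing) would discourage all instances from selecting the same expert, in the spirit of the degeneracy-avoidance term the summary mentions.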
Low Difficulty Summary (original content by GrooveSquid.com)
The paper explores a new way to combine information from different kinds of input (like images and words) without training a whole new model for each combination. By using the information from one input type to guide how the other is encoded, the method gets better results while training only a tiny fraction of the model's parameters. This could be useful for things like image-to-text systems or machine translation.

Keywords

* Artificial intelligence  * Parameter efficient  * Prompt  * Prompting  * Regularization  * Translation