
Summary of CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering, by Yuliang Cai et al.


CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering

by Yuliang Cai, Mohammad Rostami

First submitted to arXiv on: 21 Aug 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

Summary difficulty: High
Written by: Paper authors
High Difficulty Summary: The paper’s original abstract.

Summary difficulty: Medium
Written by: GrooveSquid.com (original content)
Medium Difficulty Summary: Large vision-language models (VLMs) have shown significant performance gains across various application domains. However, adapting them to a sequence of tasks is challenging, because fine-tuning reduces their generalization power and causes catastrophic forgetting of previously learned tasks. To address these limitations, the authors propose a prompt-based multimodal continual learning (CL) method for VLMs, called CluMo. They design a Key-Key-Prompt pair, in which each prompt is linked to a visual prompt key and a textual prompt key. Training has two stages: first, K-means clustering is used for single-modal key selection; second, the VLM is trained with the selected prompt in the CL scenario. Experiments on two benchmarks show that the method achieves state-of-the-art performance.

Summary difficulty: Low
Written by: GrooveSquid.com (original content)
Low Difficulty Summary: Imagine you have a very capable computer program that can do lots of tasks, like recognizing pictures and understanding text. But what if you want it to learn new things without forgetting what it already knows? That’s the challenge this paper addresses. The authors create a new way for these programs to learn tasks one after another and still remember the earlier ones. It works by giving the program special “prompts” that help it focus on the right task. In experiments on two benchmarks, the method performed better than other methods.
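
To make the two-stage idea in the medium difficulty summary more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. It assumes the keys are K-means centroids of per-modality feature vectors, and that a prompt is looked up by the pair (nearest visual key, nearest textual key). The function names, the toy 2-D features, the farthest-first initialization, and the string placeholders standing in for learnable prompts are all hypothetical.

```python
# Illustrative sketch only: names, toy data, and the lookup rule are assumptions.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_index(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)),
               key=lambda i: squared_distance(point, centers[i]))


def kmeans(points, k, iterations=20):
    """Plain K-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next initial center: the point farthest from all chosen centers.
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(farthest))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest_index(p, centers)].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers


def select_prompt(visual_feature, text_feature, visual_keys, text_keys, prompt_pool):
    """Pick the prompt linked to the nearest visual key and nearest textual key."""
    i = nearest_index(visual_feature, visual_keys)
    j = nearest_index(text_feature, text_keys)
    return (i, j), prompt_pool[i][j]


# Stage 1 (assumed): cluster each modality separately to obtain its keys.
visual_features = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
text_features = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, 0.1]]
visual_keys = kmeans(visual_features, k=2)
text_keys = kmeans(text_features, k=2)

# One placeholder prompt per (visual key, textual key) pair. In a real model
# these would be learnable tensors trained in stage 2.
prompt_pool = [[f"prompt_{i}_{j}" for j in range(2)] for i in range(2)]

# Stage 2 would train the VLM with the selected prompt; here we only select it.
pair, prompt = select_prompt([0.2, 0.2], [0.8, 0.0],
                             visual_keys, text_keys, prompt_pool)
print(pair, prompt)
```

In this sketch, inputs whose image and question features fall into the same pair of clusters share a prompt, while inputs from different clusters get different prompts. That is one plausible reading of how key-based selection could keep tasks from overwriting each other.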

Keywords

» Artificial intelligence  » Clustering  » Continual learning  » Generalization  » K-means  » Prompt