Summary of MISS: A Generative Pretraining and Finetuning Approach for Med-VQA, by Jiawei Chen et al.


MISS: A Generative Pretraining and Finetuning Approach for Med-VQA

by Jiawei Chen, Dingkang Yang, Yue Jiang, Yuxuan Lei, Lihua Zhang

First submitted to arXiv on: 10 Jan 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (written by GrooveSquid.com, original content)
Medical visual question answering (VQA) is a challenging task that requires effective generalization from Vision-Language Pre-training (VLP) models. Existing methods treat VQA as an answer classification task, but this approach has limited practical applicability, and large-scale medical image-text pair datasets for pretraining are scarce. To address this challenge, we propose a MultI-task Self-Supervised learning based framework (MISS) that unifies the text and multimodal encoders through multi-task learning and aligns image-text features. Our method also introduces a Transfer-and-Caption approach that extends the feature space of single-modal image datasets using Large Language Models (LLMs), enabling data from traditional medical vision tasks to be applied to VLP. This yields excellent performance with fewer multimodal datasets, highlighting the advantages of generative VQA models.
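The Transfer-and-Caption idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration: it assumes the image-only dataset carries classification metadata (label and modality), and it substitutes a plain text template for the LLM captioner the paper actually uses. The function and field names below are inventions for the sketch, not the paper's API.

```python
# Sketch of Transfer-and-Caption: convert a single-modal (image-only)
# medical dataset into (image, caption) pairs usable for VLP pretraining.
# The paper generates captions with an LLM; a template stands in for it here.

def caption_from_label(label: str, modality: str) -> str:
    """Produce a pseudo-caption from classification metadata (template stand-in for the LLM)."""
    return f"A {modality} image showing findings consistent with {label}."

def build_image_text_pairs(image_records):
    """Extend an image-only dataset into (image_path, caption) pairs."""
    return [
        (rec["image_path"], caption_from_label(rec["label"], rec["modality"]))
        for rec in image_records
    ]

# Example: a tiny image-only dataset with labels but no accompanying text.
records = [
    {"image_path": "cxr_0001.png", "label": "pneumonia", "modality": "chest X-ray"},
    {"image_path": "mri_0042.png", "label": "glioma", "modality": "brain MRI"},
]

pairs = build_image_text_pairs(records)
for path, caption in pairs:
    print(path, "->", caption)
```

The resulting pairs could then feed the same image-text alignment objectives used for natively multimodal data, which is how existing vision-task datasets become usable for pretraining.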
Low Difficulty Summary (written by GrooveSquid.com, original content)
Medical VQA is a tough problem where computers try to understand what’s happening in medical images and answer questions about them. Right now, most methods treat this task as picking an answer from a fixed list, which doesn’t work well in real-life situations. To do better, these models need more training data, but collecting and labeling medical data is hard and expensive. Our solution is a new way of learning called MISS that combines different tasks to help computers understand text and images at the same time. It also lets us reuse older medical image datasets in new ways, which makes the approach more powerful.

Keywords

» Artificial intelligence  » Classification  » Generalization  » Multi task  » Pretraining  » Question answering  » Self supervised