Prompting Medical Large Vision-Language Models to Diagnose Pathologies by Visual Question Answering

by Danfeng Guo, Demetri Terzopoulos

First submitted to arXiv on: 31 Jul 2024

Categories

  • Main: Computer Vision and Pattern Recognition (cs.CV)
  • Secondary: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)

GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here

Medium Difficulty Summary (original content by GrooveSquid.com)
The paper proposes two prompting strategies for Large Vision-Language Models (LVLMs) extended to the medical domain, aiming to improve diagnostic performance on Visual Question Answering (VQA) tasks while reducing hallucination. The strategies either provide detailed explanations of the queried pathologies, or fine-tune weak learners and supply their judgments to the LVLM as text. Tested on the MIMIC-CXR-JPG and CheXpert datasets, the methods significantly improve the diagnostic F1 score, with a maximum increase of 0.27.

Low Difficulty Summary (original content by GrooveSquid.com)
Medical Large Vision-Language Models (MLVLMs) are good at diagnosing common medical issues, but they struggle with complex pathologies and rare cases because training data for those is limited. Two new strategies can help: giving the MLVLM detailed information about what doctors look for, or training a simple learner and having it tell the MLVLM its judgment in words. These methods perform better on both the MIMIC-CXR-JPG and CheXpert datasets, and the prompts even help other language models.

Keywords

» Artificial intelligence  » F1 score  » Hallucination  » Prompting  » Question answering