Summary of Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding, by Shenghuan Sun et al.
Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding
by Shenghuan Sun, Alexander Schubert, Gregory M. Goldgof, Zhiqing Sun, Thomas Hartvigsen, Atul J. Butte, Ahmed Alaa
First submitted to arXiv on: 29 May 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | This paper proposes a novel approach to improving the performance of Vision-Language Models (VLMs). A key challenge facing VLMs is their tendency to generate hallucinated textual outputs that are not grounded in the contextual multimodal information, which is particularly problematic in medical domains, where consistency with clinical reasoning and diagnostic pathways throughout multi-turn conversations is crucial. To address this issue, the authors develop a new alignment algorithm that uses symbolic representations of clinical reasoning to ground VLMs in medical knowledge. These symbolic representations allow GPT-4-guided visual instruction tuning data to be generated at scale, simulating clinician-VLM conversations with demonstrations of clinical reasoning, and provide an automatic reward function that evaluates the clinical validity of VLM generations throughout clinician-VLM interactions (a minimal sketch of this reward idea appears below the table). This eliminates the need for human involvement in training data generation or reward model construction, reducing costs compared to standard reinforcement learning from human feedback (RLHF). The algorithm is applied to develop Dr-LLaVA, a conversational VLM finetuned for analyzing bone marrow pathology slides, which demonstrates strong performance in multi-turn medical conversations. |
Low | GrooveSquid.com (original content) | A new approach helps computers understand medical images and communicate with doctors. Right now, these computer models can be tricky because they sometimes make up things that aren’t true about what they’re seeing. This is a big problem when we need them to work well with doctors who are trying to figure out what’s wrong with patients. To fix this issue, researchers created a new way to teach the computers using symbolic representations of how doctors think and reason. This allows the computers to learn from examples that demonstrate how doctors would analyze images and diagnose problems. The new approach also creates a way for the computers to get feedback on their answers without needing humans to correct them all the time. One example of this new approach is a computer model called Dr-LLaVA, which can look at pictures of bone marrow slides and give helpful information to doctors. |
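
To make the automatic reward function in the medium summary more concrete, below is a minimal, hypothetical sketch in Python. The stage names, keyword checks, `Turn` dataclass, and `symbolic_reward` function are illustrative assumptions, not the paper's actual code: the real method grounds the reward in symbolic representations of clinical reasoning and scores the clinical validity of every turn in a clinician-VLM conversation, rather than using simple keyword matching.

```python
# Hedged sketch: all names (REASONING_PATHWAY, STAGE_KEYWORDS, Turn,
# symbolic_reward) are illustrative assumptions, not the paper's API.

from dataclasses import dataclass

# A toy symbolic pathway for bone marrow slide analysis: each conversation
# turn is expected to address one stage, in order.
REASONING_PATHWAY = [
    "image_quality",        # is the slide adequate for analysis?
    "cell_identification",  # which cell populations are visible?
    "abnormality",          # are abnormal cells present?
    "diagnosis",            # final diagnostic conclusion
]

# Keyword lookup standing in for a real clinical-validity check of each answer.
STAGE_KEYWORDS = {
    "image_quality": ["adequate", "inadequate", "quality"],
    "cell_identification": ["myeloid", "erythroid", "blast", "lymphocyte"],
    "abnormality": ["abnormal", "normal", "atypical"],
    "diagnosis": ["leukemia", "myeloma", "healthy", "aml"],
}


@dataclass
class Turn:
    question: str
    answer: str


def symbolic_reward(conversation: list[Turn]) -> float:
    """Return 1.0 only if every turn is consistent with the expected stage of
    the symbolic reasoning pathway, else 0.0.

    This mirrors the idea of an automatic, rule-based reward that scores the
    clinical validity of a multi-turn VLM conversation without a learned
    reward model or human labels.
    """
    if len(conversation) != len(REASONING_PATHWAY):
        return 0.0
    for turn, stage in zip(conversation, REASONING_PATHWAY):
        if not any(k in turn.answer.lower() for k in STAGE_KEYWORDS[stage]):
            return 0.0  # answer does not address the expected clinical stage
    return 1.0


if __name__ == "__main__":
    demo = [
        Turn("Is the image adequate?", "The slide quality is adequate for review."),
        Turn("Which cells do you see?", "Numerous blast cells among myeloid precursors."),
        Turn("Any abnormality?", "Yes, the blast proportion is abnormal."),
        Turn("What is your diagnosis?", "Findings are consistent with AML."),
    ]
    print(symbolic_reward(demo))  # -> 1.0
```

Because a reward like this is computed automatically from the symbolic pathway, it could stand in for the learned reward model used in standard RLHF, which is the cost reduction the summaries describe.
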
Keywords
» Artificial intelligence » Alignment » GPT » Instruction tuning » Reinforcement learning » RLHF