Summary of Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?, by Daniel P. Jeong et al.
Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress?
by Daniel P. Jeong, Saurabh Garg, Zachary C. Lipton, Michael Oberst
First submitted to arXiv on: 6 Nov 2024
Categories
- Main: Computation and Language (cs.CL)
- Secondary: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here |
| Medium | GrooveSquid.com (original content) | This paper investigates whether foundation models specifically designed for medical applications can outperform their base models on question-answering tasks. Recent works have adapted general-purpose large language models (LLMs) and vision-language models (VLMs) via continued pretraining on biomedical corpora, claiming improved performance on downstream medical tasks. In contrast, this study compares seven public “medical” LLMs and two VLMs to their base models, finding that most medical models fail to consistently improve over their bases in the zero-/few-shot prompting regime. For instance, across various tasks and model pairs, medical LLMs only outperform their bases in 12.1% of cases, reach a tie in 49.8%, and are worse than their bases in 38.2%. The study emphasizes the importance of directly comparing each medical model to its base, optimizing prompts separately, and accounting for statistical uncertainty. |
| Low | GrooveSquid.com (original content) | Imagine you want to make sure that computers can answer medical questions correctly. Some people think that special computer models can do this better than regular ones. But is that really true? This study compared these “medical” models to their normal versions and found that most of them don’t actually do any better. In fact, some even did worse! The researchers made sure to compare the medical models directly with their normal versions, used special ways to help the computers understand the questions, and accounted for chance. They think this is important because many people have already developed these medical models without doing these things, which might be leading to incorrect conclusions. |
Keywords
» Artificial intelligence » Few-shot » Pretraining » Prompting » Question answering