Summary of SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks, by Kim-Celine Kahl et al.
SURE-VQA: Systematic Understanding of Robustness Evaluation in Medical VQA Tasks
by Kim-Celine Kahl, Selen Erkan, Jeremias Traub, Carsten T. Lüth, Klaus Maier-Hein, Lena Maier-Hein, Paul F. Jaeger
First submitted to arXiv on: 29 Nov 2024
Categories
- Main: Computer Vision and Pattern Recognition (cs.CV)
- Secondary: Machine Learning (cs.LG)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
| Summary difficulty | Written by | Summary |
|---|---|---|
| High | Paper authors | Read the original abstract here. |
| Medium | GrooveSquid.com (original content) | The paper introduces SURE-VQA, a novel framework for evaluating the robustness of Vision-Language Models (VLMs) on medical tasks such as Visual Question Answering (VQA). The framework addresses three key requirements: measuring robustness on real-world distribution shifts, using large language models for semantic evaluation, and reporting sanity baselines. The authors study several fine-tuning methods across three medical datasets with four types of distribution shifts, finding that LoRA is the best-performing fine-tuning method (a minimal code sketch follows this table) and that no method is consistently more robust to shifts than the others. |
| Low | GrooveSquid.com (original content) | The paper creates a framework called SURE-VQA to help us better understand how well Vision-Language Models work. These models can be used as helpers for patients and doctors. To make sure these models are reliable, we need to test them on different types of data. The authors show that current ways of testing are not good enough, so they created a new way to do it. They tested several methods on medical datasets and found some interesting things: sometimes the models work well even without using the images at all, LoRA is the best method for fine-tuning, and no single method stands out as the best at handling changes in the data. |
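For readers curious what LoRA fine-tuning looks like in practice, below is a minimal sketch using Hugging Face's `peft` library. This is not the paper's actual training setup: the base model name and the LoRA hyperparameters (rank, alpha, dropout, target modules) are illustrative assumptions only.

```python
# Minimal LoRA fine-tuning sketch (not the SURE-VQA training setup).
# Assumption: base model name and LoRA hyperparameters are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base_model_name = "facebook/opt-350m"  # placeholder; the paper fine-tunes vision-language models
model = AutoModelForCausalLM.from_pretrained(base_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# LoRA injects small trainable low-rank matrices into selected projection
# layers, so only a tiny fraction of parameters is updated during fine-tuning.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                       # rank of the low-rank update (assumed value)
    lora_alpha=16,             # scaling factor (assumed value)
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections; names are model-dependent
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
# From here, train as usual with transformers' Trainer or a custom training loop.
```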
Keywords
» Artificial intelligence » Fine-tuning » LoRA » Question answering