Summary of Prompt-aware Adapter: Towards Learning Adaptive Visual Tokens For Multimodal Large Language Models, by Yue Zhang et al.
Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models by Yue Zhang, Hehe…
LOVA3: Learning to Visual Question Answering, Asking and Assessment by Henry Hengyuan Zhao, Pan Zhou, Difei…
SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge by Chuanhao…
Efficient Medical Question Answering with Knowledge-Augmented Question Generation by Julien Khlaut, Corentin Dancette, Elodie Ferreres, Alaedine…
MetaReflection: Learning Instructions for Language Agents using Past Reflections by Priyanshu Gupta, Shashank Kirtania, Ananya Singha,…
Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning by Zishan…
Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue! by Dean Allemang, Juan Sequeda. First…
EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging by Danli Shi, Weiyi Zhang, Xiaolan Chen,…
Efficient Multimodal Large Language Models: A Survey by Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu,…
Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery by…