Summary of The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning, by Longju Bai et al.
The Power of Many: Multi-Agent Multimodal Models for Cultural Image Captioning by Longju Bai, Angana Borah,…
Learn from Downstream and Be Yourself in Multimodal Large Language Model Fine-Tuning by Wenke Huang, Jian…
RS-MoE: A Vision-Language Model with Mixture of Experts for Remote Sensing Image Captioning and Visual…
Nearest Neighbor Normalization Improves Multimodal Retrieval by Neil Chowdhury, Franklin Wang, Sumedh Shenoy, Douwe Kiela, Sarah…
Large Language Model Benchmarks in Medical Tasks by Lawrence K.Q. Yan, Qian Niu, Ming Li, Yichao…
A Unified Debiasing Approach for Vision-Language Models across Modalities and Tasks by Hoin Jung, Taeuk Jang,…
Core Tokensets for Data-efficient Sequential Training of Transformers by Subarnaduti Paul, Manuel Brack, Patrick Schramowski, Kristian…
CAPEEN: Image Captioning with Early Exits and Knowledge Distillation by Divya Jyoti Bajpai, Manjesh Kumar Hanawal. First…
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning by Kazuki Matsuda, Yuiga Wada, Komei Sugiura. First…
Attention Prompting on Image for Large Vision-Language Models by Runpeng Yu, Weihao Yu, Xinchao Wang. First submitted…