Summary of IIU: Independent Inference Units for Knowledge-based Visual Question Answering, by Yili Li et al.
IIU: Independent Inference Units for Knowledge-based Visual Question Answering, by Yili Li, Jing Yu, Keke Gai,…
Social Debiasing for Fair Multi-modal LLMs, by Harry Cheng, Yangyang Guo, Qingpei Guo, Ming Yang, Tian…
Revisiting Multi-Modal LLM Evaluation, by Jian Lu, Shikhar Srivastava, Junyu Chen, Robik Shrestha, Manoj Acharya, Kushal…
Disentangled Noisy Correspondence Learning, by Zhuohang Dang, Minnan Luo, Jihong Wang, Chengyou Jia, Haochen Han, Herun…
Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments, by Sangwoo Shin, Seunghyun Kim, Youngsoo Jang,…
WAS: Dataset and Methods for Artistic Text Segmentation, by Xudong Xie, Yuzhe Li, Yang Liu, Zhifei…
Multi-Modal Parameter-Efficient Fine-tuning via Graph Neural Network, by Bin Cheng, Jiaxuan Lu. First submitted to arXiv on:…
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues, by Sara Sarto, Marcella Cornia,…
Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets, by Muhammad Abdullah Jamal, Omid Mohareri. First submitted…
A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation, by Zixuan Yi, Iadh Ounis. First submitted…