Summary of IIU: Independent Inference Units for Knowledge-based Visual Question Answering, by Yili Li et al.
IIU: Independent Inference Units for Knowledge-based Visual Question Answering by Yili Li, Jing Yu, Keke Gai,…
Social Debiasing for Fair Multi-modal LLMs by Harry Cheng, Yangyang Guo, Qingpei Guo, Ming Yang, Tian…
Disentangled Noisy Correspondence Learning by Zhuohang Dang, Minnan Luo, Jihong Wang, Chengyou Jia, Haochen Han, Herun…
Revisiting Multi-Modal LLM Evaluation by Jian Lu, Shikhar Srivastava, Junyu Chen, Robik Shrestha, Manoj Acharya, Kushal…
Semantic Skill Grounding for Embodied Instruction-Following in Cross-Domain Environments by Sangwoo Shin, Seunghyun Kim, Youngsoo Jang,…
Multi-Modal Parameter-Efficient Fine-tuning via Graph Neural Network by Bin Cheng, Jiaxuan Lu. First submitted to arXiv on:…
WAS: Dataset and Methods for Artistic Text Segmentation by Xudong Xie, Yuzhe Li, Yang Liu, Zhifei…
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues by Sara Sarto, Marcella Cornia,…
A Unified Graph Transformer for Overcoming Isolations in Multi-modal Recommendation by Zixuan Yi, Iadh Ounis. First submitted…
Rethinking RGB-D Fusion for Semantic Segmentation in Surgical Datasets by Muhammad Abdullah Jamal, Omid Mohareri. First submitted…