Summary of Enhancing Perception Capabilities of Multimodal LLMs with Training-Free Fusion, by Zhuokun Chen et al.
Enhancing Perception Capabilities of Multimodal LLMs with Training-Free Fusion, by Zhuokun Chen, Jinwu Hu, Zeshuai Deng, …
Learn to Unlearn: Meta-Learning-Based Knowledge Graph Embedding Unlearning, by Naixing Xu, Qian Li, Xu Wang, Bingchen…
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness, by Ahmad…
TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension, by Zipeng Qiu, …
Cross-modal Information Flow in Multimodal Large Language Models, by Zhi Zhang, Srishti Yadav, Fengze Han, Ekaterina…
g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks, by Zihan Wang, Gim Hee Lee. First submitted to…
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis, by Bo Liu, …
freePruner: A Training-free Approach for Large Multimodal Model Acceleration, by Bingxin Xu, Yuzhang Shang, Yunhao Ge, …
ReWind: Understanding Long Videos with Instructed Learnable Memory, by Anxhelo Diko, Tinghuai Wang, Wassim Swaileh, Shiyan…
PPLqa: An Unsupervised Information-Theoretic Quality Metric for Comparing Generative Large Language Models, by Gerald Friedland, Xin…