Summary of SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model, by Chunlin Yu et al.
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model by Chunlin Yu, Hanqing Wang, Ye…
Evaluating Automated Radiology Report Quality through Fine-Grained Phrasal Grounding of Clinical Findings by Razi Mahmood, Pingkun…
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness by Ahmad…
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding by Yawen Shao, Wei Zhai, Yuhang…
ShowUI: One Vision-Language-Action Model for GUI Visual Agent by Kevin Qinghong Lin, Linjie Li, Difei Gao,…
LLM-Based Offline Learning for Embodied Agents via Consistency-Guided Reward Ensemble by Yujeong Lee, Sangwoo Shin, Wei-Jin…
DOGE: Towards Versatile Visual Document Grounding and Referring by Yinan Zhou, Yuxin Chen, Haokun Lin, Shuyu…
ReWind: Understanding Long Videos with Instructed Learnable Memory by Anxhelo Diko, Tinghuai Wang, Wassim Swaileh, Shiyan…
Improved GUI Grounding via Iterative Narrowing by Anthony Nguyen. First submitted to arXiv on: 18 Nov 2024. Categories: Main:…
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level by Andong Deng, Tongjia Chen, Shoubin…