Grounding – Page 4 – GrooveSquid.com

July 13, 2025

Grounding is All You Need? Dual Temporal Grounding for Video Dialog by You Qin, Wei Ji,…

July 13, 2025

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents by Boyu Gou,…

July 13, 2025

TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens by…

July 13, 2025

Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models by Haibo Wang, Zhiyang Xu, Yu…

July 13, 2025

Adaptive Masking Enhances Visual Grounding by Sen Jia, Lei Li. First submitted to arXiv on: 4 Oct…

July 13, 2025

From Concrete to Abstract: A Multimodal Generative Approach to Abstract Concept Learning by Haodong Xie, Rahul…

July 13, 2025

Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration by Yun Qu, Boyuan Wang,…

July 13, 2025

Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning by Weitai Kang, Haifeng Huang, Yuzhang…

July 13, 2025

Learning to Ground Existentially Quantified Goals by Martin Funkquist, Simon Ståhlberg, Hector Geffner. First submitted to arXiv…

July 13, 2025

World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering by Jiacong Wang, Bohong…