Grounding – Page 3 – GrooveSquid.com

July 13, 2025

A Simple and Effective Temporal Grounding Pipeline for Basketball Broadcast Footage by Levi Harris. First submitted to…

July 13, 2025

Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding by Jinlong He, Pengfei Li,…

July 13, 2025

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning by Xiangyu Zeng, Kunchang Li, Chenting…

July 13, 2025

LocateBench: Evaluating the Locating Ability of Vision Language Models by Ting-Rui Chiang, Joshua Robinson, Xinyan Velocity…

July 13, 2025

SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation by Jingxuan Chen, Derek Yuen, Bin Xie, Yuhao…

July 13, 2025

AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents by Ke Yang, Yao Liu, Sapana…

July 13, 2025

Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data? by Che Liu, Zhongwei Wan, Haozhe Wang,…

July 13, 2025

Large Language Models and the Rationalist Empiricist Debate by David King. First submitted to arXiv on: 16…

July 13, 2025

VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI by Sijie Cheng, Kechen Fang, Yangyang Yu,…

July 13, 2025

ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution by Corban Rivera,…