Summary of A Simple and Effective Temporal Grounding Pipeline for Basketball Broadcast Footage, by Levi Harris
A Simple and Effective Temporal Grounding Pipeline for Basketball Broadcast Footage by Levi Harris. First submitted to…
Parameter-Efficient Fine-Tuning Medical Multimodal Large Language Models for Medical Visual Grounding by Jinlong He, Pengfei Li,…
LocateBench: Evaluating the Locating Ability of Vision Language Models by Ting-Rui Chiang, Joshua Robinson, Xinyan Velocity…
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning by Xiangyu Zeng, Kunchang Li, Chenting…
SPA-Bench: A Comprehensive Benchmark for SmartPhone Agent Evaluation by Jingxuan Chen, Derek Yuen, Bin Xie, Yuhao…
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents by Ke Yang, Yao Liu, Sapana…
Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data? by Che Liu, Zhongwei Wan, Haozhe Wang,…
Large Language Models and the Rationalist Empiricist Debate by David King. First submitted to arXiv on: 16…
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI by Sijie Cheng, Kechen Fang, Yangyang Yu,…
ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution by Corban Rivera,…