Summary of Grounding Is All You Need? Dual Temporal Grounding For Video Dialog, by You Qin et al.
Grounding is All You Need? Dual Temporal Grounding for Video Dialog by You Qin, Wei Ji,…
TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens by…
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents by Boyu Gou,…
Adaptive Masking Enhances Visual Grounding by Sen Jia, Lei Li. First submitted to arXiv on: 4 Oct…
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models by Haibo Wang, Zhiyang Xu, Yu…
Choices are More Important than Efforts: LLM Enables Efficient Multi-Agent Exploration by Yun Qu, Boyuan Wang,…
From Concrete to Abstract: A Multimodal Generative Approach to Abstract Concept Learning by Haodong Xie, Rahul…
Learning to Ground Existentially Quantified Goals by Martin Funkquist, Simon Ståhlberg, Hector Geffner. First submitted to arXiv…
World to Code: Multi-modal Data Generation via Self-Instructed Compositional Captioning and Filtering by Jiacong Wang, Bohong…
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning by Weitai Kang, Haifeng Huang, Yuzhang…