Summary of Relational Programming with Foundation Models, by Ziyang Li et al.
Relational Programming with Foundation Models, by Ziyang Li, Jiani Huang, Jason Liu, Felix Zhu, Eric Zhao,…