Summary of Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension, by Yongdong Luo et al.
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension by Yongdong Luo, Xiawu Zheng, Xiao Yang, Guilin Li, Haojia…
WoodYOLO: A Novel Object Detector for Wood Species Detection in Microscopic Images by Lars Nieradzik, Henrike…
Real-Time AI-Driven People Tracking and Counting Using Overhead Cameras by Ishrath Ahamed, Chamith Dilshan Ranathunga, Dinuka…
Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration by Yifan Shao. First submitted to arXiv on:…
LEAP:D – A Novel Prompt-based Approach for Domain-Generalized Aerial Object Detection by Chanyeong Park, Heegwang Kim,…
Multimodal Object Detection using Depth and Image Data for Manufacturing Parts by Nazanin Mahjourian, Vinh Nguyen. First…
AI-Compass: A Comprehensive and Effective Multi-module Testing Tool for AI Systems by Zhiyu Zhu, Zhibo Jin,…
Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent by Linfeng He,…
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models by Fatemeh Shiri, Xiao-Yu Guo,…
SimpleBEV: Improved LiDAR-Camera Fusion Architecture for 3D Object Detection by Yun Zhao, Zhan Gong, Peiru Zheng,…