Summary of Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension, by Yongdong Luo et al.
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension by Yongdong Luo, Xiawu Zheng, Xiao Yang, Guilin Li, Haojia…
WoodYOLO: A Novel Object Detector for Wood Species Detection in Microscopic Images by Lars Nieradzik, Henrike…
Real-Time AI-Driven People Tracking and Counting Using Overhead Cameras by Ishrath Ahamed, Chamith Dilshan Ranathunga, Dinuka…
Local-Global Attention: An Adaptive Mechanism for Multi-Scale Feature Integration by Yifan Shao. First submitted to arXiv on:…
Multimodal Object Detection using Depth and Image Data for Manufacturing Parts by Nazanin Mahjourian, Vinh Nguyen. First…
LEAP:D – A Novel Prompt-based Approach for Domain-Generalized Aerial Object Detection by Chanyeong Park, Heegwang Kim,…
AI-Compass: A Comprehensive and Effective Multi-module Testing Tool for AI Systems by Zhiyu Zhu, Zhibo Jin,…
An Empirical Analysis on Spatial Reasoning Capabilities of Large Multimodal Models by Fatemeh Shiri, Xiao-Yu Guo,…
Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent by Linfeng He,…
SimpleBEV: Improved LiDAR-Camera Fusion Architecture for 3D Object Detection by Yun Zhao, Zhan Gong, Peiru Zheng,…