Summary of Multi-modal Interpretable Automatic Video Captioning, by Antoine Hanna-asaad et al.
Multi-Modal Interpretable Automatic Video Captioning, by Antoine Hanna-Asaad, Decky Aspandi, Titus Zaharia. First submitted to arXiv on: …