Summary of Long Story Short: Story-level Video Understanding from 20K Short Films, by Ridouane Ghermi et al.
Long Story Short: Story-level Video Understanding from 20K Short Films, by Ridouane Ghermi, Xi Wang, Vicky…
SHMamba: Structured Hyperbolic State Space Model for Audio-Visual Question Answering, by Zhe Yang, Wenrui Li, Guanghui…
Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps, by Jian Chen,…
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack, by Yuri Kuratov, Aydar Bulatov, Petr…
Advancing High Resolution Vision-Language Models in Biomedicine, by Zekai Chen, Arda Pekis, Kevin Brown. First submitted to…
Multi-Modal Retrieval For Large Language Model Based Speech Recognition, by Jari Kolehmainen, Aditya Gourav, Prashanth Gurunath…
A Survey of Video Datasets for Grounded Event Understanding, by Kate Sanders, Benjamin Van Durme. First submitted…
Research Trends for the Interplay between Large Language Models and Knowledge Graphs, by Hanieh Khorashadizadeh, Fatima…
SciRIFF: A Resource to Enhance Language Model Instruction-Following over Scientific Literature, by David Wadden, Kejian Shi,…
Dynamic Stochastic Decoding Strategy for Open-Domain Dialogue Generation, by Yiwei Li, Fei Mi, Yitong Li, Yasheng…