Summary of Vidctx: Context-aware Video Question Answering with Image Models, by Andreas Goulas et al.
VidCtx: Context-aware Video Question Answering with Image Modelsby Andreas Goulas, Vasileios Mezaris, Ioannis PatrasFirst submitted…
VidCtx: Context-aware Video Question Answering with Image Modelsby Andreas Goulas, Vasileios Mezaris, Ioannis PatrasFirst submitted…
Prompting Large Language Models with Rationale Heuristics for Knowledge-based Visual Question Answeringby Zhongjian Hu, Peng…
STAMPsy: Towards SpatioTemporal-Aware Mixed-Type Dialogues for Psychological Counselingby Jieyi Wang, Yue Huang, Zeming Liu, Dexuan…
AgentPS: Agentic Process Supervision for Multi-modal Content Quality Assurance through Multi-round QAby Gorden Liu, Yu…
On the Structural Memory of LLM Agentsby Ruihong Zeng, Jinyuan Fang, Siwei Liu, Zaiqiao MengFirst…
SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generationby Yuzheng Cai, Zhenyue Guo, Yiwen…
FiVL: A Framework for Improved Vision-Language Alignment through the Lens of Training, Evaluation and Explainabilityby…
Multi-OphthaLingua: A Multilingual Benchmark for Assessing and Debiasing LLM Ophthalmological QA in LMICsby David Restrepo,…
Consistency of Compositional Generalization across Multiple Levelsby Chuanhao Li, Zhen Li, Chenchen Jing, Xiaomeng Fan,…
What External Knowledge is Preferred by LLMs? Characterizing and Exploring Chain of Evidence in Imperfect…