Summary of TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment, by Wei Li et al.
TOPA: Extending Large Language Models for Video Understanding via Text-Only Pre-Alignment, by Wei Li, Hehe Fan,…
Evaluating and Modeling Social Intelligence: A Comparative Study of Human and AI Capabilities, by Junqi Wang,…
Unveiling and Manipulating Prompt Influence in Large Language Models, by Zijian Feng, Hanzhang Zhou, Zixiao Zhu,…
Revisiting the Robust Generalization of Adversarial Prompt Tuning, by Fan Yang, Mingxuan Xia, Sangzhou Xia, Chicheng…
EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging, by Danli Shi, Weiyi Zhang, Xiaolan Chen,…
HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition, by Kun Yuan, Vinkle Srivastav, Nassir Navab,…
GPT-3.5 for Grammatical Error Correction, by Anisia Katinskaia, Roman Yangarber. First submitted to arXiv on: 14 May…
FreeVA: Offline MLLM as Training-Free Video Assistant, by Wenhao Wu. First submitted to arXiv on: 13 May…
Investigating the Semantic Robustness of CLIP-based Zero-Shot Anomaly Segmentation, by Kevin Stangl, Marius Arvinte, Weilin Xu,…
Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA, by Marco Polignano, Pierpaolo Basile, Giovanni Semeraro. First submitted…