Summary of An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM, by Wonkyun Kim et al.
Few-Shot Recalibration of Language Models by Xiang Lisa Li, Urvashi Khandelwal, Kelvin Guu. First submitted to arxiv…
Residual-based Language Models are Free Boosters for Biomedical Imaging by Zhixin Lai, Jing Wu, Suiyao Chen,…
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions by Reza Esfandiarpoor,…
VidLA: Video-Language Alignment at Scale by Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin…
Integrating Wearable Sensor Data and Self-reported Diaries for Personalized Affect Forecasting by Zhongqi Yang, Yuning Wang,…
PARAMANU-AYN: Pretrain from scratch or Continual Pretraining of LLMs for Legal Domain Adaptation? by Mitodru Niyogi,…
Arcee's MergeKit: A Toolkit for Merging Large Language Models by Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi,…
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression by Zhuoshi Pan, Qianhui Wu, Huiqiang…
Cross-Domain Pre-training with Language Models for Transferable Time Series Representations by Mingyue Cheng, Xiaoyu Tao, Qi…