Summary of Exploring Efficient Foundational Multi-modal Models For Video Summarization, by Karan Samel et al.
Exploring Efficient Foundational Multi-modal Models for Video Summarization by Karan Samel, Apoorva Beedu, Nitish Sontakke, Irfan…
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment by Yifei Xing, Xiangyuan Lan, Ruiping Wang,…
On Instruction-Finetuning Neural Machine Translation Models by Vikas Raunak, Roman Grundkiewicz, Marcin Junczys-Dowmunt. First submitted to arXiv…
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality by Youngtaek Oh, Jae Won Cho,…
CalliffusionV2: Personalized Natural Calligraphy Generation with Flexible Multi-modal Control by Qisheng Liao, Liang Li, Yulang Fei,…
Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting…
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models by Zhipei Xu, Xuanyu…
Multimodal Auto Validation For Self-Refinement in Web Agents by Ruhana Azam, Tamer Abuelsaad, Aditya Vempaty, Ashish…
Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models by Yizhou Huang,…
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer by Zhen Han, Zeyinzi Jiang, Yulin…