Summary of Learning Content-aware Multi-modal Joint Input Pruning Via Bird’s-eye-view Representation, by Yuxin Li et al.
Learning Content-Aware Multi-Modal Joint Input Pruning via Bird’s-Eye-View Representation by Yuxin Li, Yiheng Li, Xulei Yang,…
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment by Yifei Xing, Xiangyuan Lan, Ruiping Wang,…
On Instruction-Finetuning Neural Machine Translation Models by Vikas Raunak, Roman Grundkiewicz, Marcin Junczys-Dowmunt. First submitted to arXiv…
Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality by Youngtaek Oh, Jae Won Cho,…
CalliffusionV2: Personalized Natural Calligraphy Generation with Flexible Multi-modal Control by Qisheng Liao, Liang Li, Yulang Fei,…
Image First or Text First? Optimising the Sequencing of Modalities in Large Language Model Prompting…
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models by Zhipei Xu, Xuanyu…
Multimodal Auto Validation For Self-Refinement in Web Agents by Ruhana Azam, Tamer Abuelsaad, Aditya Vempaty, Ashish…
ACE: All-round Creator and Editor Following Instructions via Diffusion Transformer by Zhen Han, Zeyinzi Jiang, Yulin…
Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models by Yizhou Huang,…