Summary of FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction, by Runze He et al.
FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction by Runze He, Kai Ma, Linjiang Huang, Shaofei…
HA-FGOVD: Highlighting Fine-grained Attributes via Explicit Linear Composition for Open-Vocabulary Object Detection by Yuqi Ma, Mengyin…
AsthmaBot: Multi-modal, Multi-Lingual Retrieval Augmented Generation For Asthma Patient Support by Adil Bahaj, Mounir Ghogho
Brotherhood at WMT 2024: Leveraging LLM-Generated Contextual Conversations for Cross-Lingual Image Captioning by Siddharth Betala, Ishan…
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond by Hong Chen, Xin Wang, Yuwei Zhou, Bin…
UniBEVFusion: Unified Radar-Vision BEVFusion for 3D Object Detection by Haocheng Zhao, Runwei Guan, Taoyu Wu, Ka…
Advancing Molecular Graph-Text Pre-training via Fine-grained Alignment by Yibo Li, Yuan Fang, Mengmei Zhang, Chuan Shi
Enhancing Advanced Visual Reasoning Ability of Large Language Models by Zhiyuan Li, Dongnan Liu, Chaoyi Zhang,…
MathGLM-Vision: Solving Mathematical Problems with Multi-Modal Large Language Model by Zhen Yang, Jinhao Chen, Zhengxiao Du,…
VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning by Zhihuan Jiang, Zhen Yang,…