Summary of Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models, by Shuhong Zheng et al.
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models by Shuhong Zheng, Zhipeng Bao, Ruoyu Zhao,…
Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning by Jiawei Yao, Qi Qian, Juhua Hu. First submitted…
Understanding Self-Supervised Learning via Gaussian Mixture Models by Parikshit Bansal, Ali Kavis, Sujay Sanghavi. First submitted to…
On the Comparison between Multi-modal and Single-modal Contrastive Learning by Wei Huang, Andi Han, Yongqiang Chen,…
Multi-modal biometric authentication: Leveraging shared layer architectures for enhanced security by Vatchala S, Yogesh C, Yeshwanth…
Exploring Multi-Modality Dynamics: Insights and Challenges in Multimodal Fusion for Biomedical Tasks by Laura Wenderoth. First submitted…
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models by Donghoon Kim, Gusang…
Aligning Audio-Visual Joint Representations with an Agentic Workflow by Shentong Mo, Yibing Song. First submitted to arXiv…
EMMA: End-to-End Multimodal Model for Autonomous Driving by Jyh-Jing Hwang, Runsheng Xu, Hubert Lin, Wei-Chih Hung,…
VisAidMath: Benchmarking Visual-Aided Mathematical Reasoning by Jingkun Ma, Runzhe Zhan, Derek F. Wong, Yang Li, Di…