Summary of Baichuan-Omni Technical Report, by Yadong Li et al.
Baichuan-Omni Technical Report by Yadong Li, Haoze Sun, Mingan Lin, Tianpeng Li, Guosheng Dong, Tao Zhang,…
Cross-Modal Bidirectional Interaction Model for Referring Remote Sensing Image Segmentation by Zhe Dong, Yuzhe Sun, Yanfeng…
GrabDAE: An Innovative Framework for Unsupervised Domain Adaptation Utilizing Grab-Mask and Denoise Auto-Encoder by Junzhou Chen,…
Exploring Efficient Foundational Multi-modal Models for Video Summarization by Karan Samel, Apoorva Beedu, Nitish Sontakke, Irfan…
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization by Yougang Lyu, Lingyong Yan, Zihan Wang, Dawei…
Uncovering Factor Level Preferences to Improve Human-Model Alignment by Juhyun Oh, Eunsu Kim, Jiseon Kim, Wenda…
Better Language Models Exhibit Higher Visual Alignment by Jona Ruthardt, Gertjan J. Burghouts, Serge Belongie, Yuki…
MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment by Amir Hossein Kargaran, Ali Modarressi, Nafiseh…
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment by Yifei Xing, Xiangyuan Lan, Ruiping Wang,…
T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design by Jiachen Li,…