Summary of LMFusion: Adapting Pretrained Language Models for Multimodal Generation, by Weijia Shi et al.
LMFusion: Adapting Pretrained Language Models for Multimodal Generation by Weijia Shi, Xiaochuang Han, Chunting Zhou, Weixin…
Preventing Local Pitfalls in Vector Quantization via Optimal Transport by Borui Zhang, Wenzhao Zheng, Jie Zhou,…
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation by Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin, Ivan…
LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation by Chenxu Zhou, Lvchang Fu, Sida Peng, Yunzhi…
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving by Shuo Xing, Hongyuan Hua,…
OpenEMMA: Open-Source Multimodal Model for End-to-End Autonomous Driving by Shuo Xing, Chengyuan Qian, Yuping Wang, Hongyuan…
PRIMA: Multi-Image Vision-Language Models for Reasoning Segmentation by Muntasir Wahed, Kiet A. Nguyen, Adheesh Sunil Juvekar,…
Dipper: Diversity in Prompts for Producing Large Language Model Ensembles in Reasoning tasks by Gregory Kang…
MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples by Shuo Xie, Fangzhi Zhu,…
LLMs for Literature Review: Are we there yet? by Shubham Agarwal, Gaurav Sahu, Abhay Puri, Issam…