Summary of VITA: Towards Open-Source Interactive Omni Multimodal LLM, by Chaoyou Fu et al.
VITA: Towards Open-Source Interactive Omni Multimodal LLM, by Chaoyou Fu, Haojia Lin, Zuwei Long, Yunhang Shen,…
KnowPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models, by Ruizhe Zhang, Yongxin…
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models, by Fushuo Huo, Wenchao Xu, Zhong Zhang, Haozhao…
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for…
Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning, by Xingchen Zeng,…
Neurosymbolic AI for Enhancing Instructability in Generative AI, by Amit Sheth, Vishal Pallagani, Kaushik Roy. First submitted…
Cost-effective Instruction Learning for Pathology Vision and Language Analysis, by Kaitao Chen, Mianxin Liu, Fang Yan,…
On Pre-training of Multimodal Language Models Customized for Chart Understanding, by Wan-Cyuan Fan, Yen-Chun Chen, Mengchen…
SwitchCIT: Switching for Continual Instruction Tuning, by Xinbo Wu, Max Hartman, Vidhata Arjun Jayaraman, Lav R.…
ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation, by Ethan Chern, Jiadi…