Summary of Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection, by Wei Ye et al.
Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection, by Wei Ye, Chaoya Jiang, Haiyang Xu, Chenhao…
VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark, by Han Huang, Haitian Zhong, Tao Yu, Qiang…
Noise-powered Multi-modal Knowledge Graph Representation Framework, by Zhuo Chen, Yin Fang, Yichi Zhang, Lingbing Guo, Jiaoyan…
MOAB: Multi-Modal Outer Arithmetic Block For Fusion Of Histopathological Images And Genetic Data For Brain…
How Well Do Multi-modal LLMs Interpret CT Scans? An Auto-Evaluation Framework for Analyses, by Qingqing Zhu,…
MMoE: Robust Spoiler Detection with Multi-modal Information and Domain-aware Mixture-of-Experts, by Zinan Zeng, Sen Ye, Zijian…
A Privacy-Preserving Framework with Multi-Modal Data for Cross-Domain Recommendation, by Li Wang, Lei Sang, Quangui Zhang,…
Data Augmentation using Large Language Models: Data Perspectives, Learning Paradigms and Challenges, by Bosheng Ding, Chengwei…
Abductive Ego-View Accident Video Understanding for Safe Driving Perception, by Jianwu Fang, Lei-lei Li, Junfei Zhou,…
GenAINet: Enabling Wireless Collective Intelligence via Knowledge Transfer and Reasoning, by Hang Zou, Qiyang Zhao, Lina…