Summary of VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization, by Yuliang Liu et al.
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts by Dengchun Li, Yingzi Ma,…
Grasper: A Generalist Pursuer for Pursuit-Evasion Problems by Pengdeng Li, Shuxin Li, Xinrun Wang, Jakub Cerny,…
CrossIn: An Efficient Instruction Tuning Approach for Cross-Lingual Knowledge Alignment by Geyu Lin, Bin Wang, Zhengyuan…
AdaIR: Exploiting Underlying Similarities of Image Restoration Tasks with Adapters by Hao-Wei Chen, Yu-Syuan Xu, Kelvin…
Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization by Yongdong…
Negation Triplet Extraction with Syntactic Dependency and Semantic Consistency by Yuchen Shi, Deqing Yang, Jingping Liu,…
Interplay of Machine Translation, Diacritics, and Diacritization by Wei-Rui Chen, Ife Adebara, Muhammad Abdul-Mageed. First submitted to…
Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation by Danpei…
WavLLM: Towards Robust and Adaptive Speech Large Language Model by Shujie Hu, Long Zhou, Shujie Liu,…