Summary of TextHawk2: A Large Vision-Language Model Excels in Bilingual OCR and Grounding with 16x Fewer Tokens, by Ya-Qi Yu et al.
Correlation-Aware Select and Merge Attention for Efficient Fine-Tuning and Context Length Extension, by Ning Wang, Zekun…
PAD: Personalized Alignment of LLMs at Decoding-Time, by Ruizhe Chen, Xiaotian Zhang, Meng Luo, Wenhao Chai,…
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding, by Doohyuk Jang, Sihwan Park, June Yong…
Unveiling Language Skills via Path-Level Circuit Discovery, by Hang Chen, Jiaying Zhu, Xinyu Yang, Wenya Wang. First…
Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation, by Huangyu Dai, Ben…
1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in…
See then Tell: Enhancing Key Information Extraction with Vision Grounding, by Shuhang Liu, Zhenrong Zhang, Pengfei…
Enhancing elusive clues in knowledge learning by contrasting attention of language models, by Jian Gao, Xiao…
Inference-Time Language Model Alignment via Integrated Value Guidance, by Zhixuan Liu, Zhanhui Zhou, Yuanfu Wang, Chao…