Summary of TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models, by Jonathan Fhima et al.
TAP-VL: Text Layout-Aware Pre-training for Enriched Vision-Language Models by Jonathan Fhima, Elad Ben Avraham, Oren Nuriel,…
CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation by Jie Liu, Pan Zhou, Yingjun Du,…
Solving Generalized Grouping Problems in Cellular Manufacturing Systems Using a Network Flow Model by Md. Kutub…
QUILL: Quotation Generation Enhancement of Large Language Models by Jin Xiao, Bowei Zhang, Qianyu He, Jiaqing…
Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation? by…
MetaSSC: Enhancing 3D Semantic Scene Completion for Autonomous Driving through Meta-Learning and Long-sequence Modeling by Yansong…
Relation Learning and Aggregate-attention for Multi-person Motion Prediction by Kehua Qu, Rui Ding, Jin Tang
Fine-Tuning Vision-Language Model for Automated Engineering Drawing Information Extraction by Muhammad Tayyab Khan, Lequn Chen, Ye…
Automating Exploratory Proteomics Research via Language Models by Ning Ding, Shang Qu, Linhai Xie, Yifei Li,…
GS2Pose: Two-stage 6D Object Pose Estimation Guided by Gaussian Splatting by Jilan Mei, Junbo Li, Cai…