Summary of TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models, by Ya-Qi Yu et al.
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models by Ya-Qi Yu, Minghui Liao, Jihao…
CATP: Cross-Attention Token Pruning for Accuracy Preserved Multimodal Model Inference by Ruqi Liao, Chuqing Zhao, Jin…
Automatic Quantification of Serial PET/CT Images for Pediatric Hodgkin Lymphoma Patients Using a Longitudinally-Aware Segmentation…
Playing to Vision Foundation Model’s Strengths in Stereo Matching by Chuang-Wei Liu, Qijun Chen, Rui Fan…
OW-VISCapTor: Abstractors for Open-World Video Instance Segmentation and Captioning by Anwesa Choudhuri, Girish Chowdhary, Alexander G.…
Cross-domain Fiber Cluster Shape Analysis for Language Performance Cognitive Score Prediction by Yui Lo, Yuqian Chen,…
MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration by Zhichao Wei, Qingkun Su, Long Qin, Weizhi…
Compress3D: a Compressed Latent Space for 3D Generation from a Single Image by Bowen Zhang, Tianyu…
Masked Generative Story Transformer with Character Guidance and Caption Augmentation by Christos Papadimitriou, Giorgos Filandrianos, Maria…
PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering by Yibin Wang, Weizhong Zhang,…