Image captioning – Page 9 – GrooveSquid.com

July 13, 2025

Downstream-Pretext Domain Knowledge Traceback for Active Learning by Beichen Zhang, Liang Li, Zheng-Jun Zha, Jiebo Luo,…

July 13, 2025

LookupViT: Compressing visual information to a limited number of tokens by Rajat Koner, Gagan Jain, Prateek…

July 13, 2025

CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation by Kalliopi Basioti, Mohamed A. Abdelsalam, Federico Fancellu,…

July 13, 2025

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment by Jihao Liu, Xin Huang, Jinliang Zheng,…

July 13, 2025

Low-Rank Similarity Mining for Multimodal Dataset Distillation by Yue Xu, Zhilin Lin, Yusong Qiu, Cewu Lu,…

July 13, 2025

How Culturally Aware are Vision-Language Models? by Olena Burda-Lassen, Aman Chadha, Shashank Goswami, Vinija Jain. First submitted…

July 13, 2025

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion by Zehan Wang, Ziang Zhang, Xize…

July 13, 2025

Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis by Prateek…

July 13, 2025

FLoRA: Enhancing Vision-Language Models with Parameter-Efficient Federated Learning by Duy Phuong Nguyen, J. Pablo Munoz, Ali…

July 13, 2025

Bridging Vision and Language Spaces with Assignment Prediction by Jungin Park, Jiyoung Lee, Kwanghoon Sohn. First submitted…