Summary of Frustratingly Easy Test-Time Adaptation of Vision-Language Models, by Matteo Farina et al.
Frustratingly Easy Test-Time Adaptation of Vision-Language Models by Matteo Farina, Gianni Franchi, Giovanni Iacca, Massimiliano Mancini,…
MM-Mixing: Multi-Modal Mixing Alignment for 3D Understanding by Jiaze Wang, Yi Wang, Ziyu Guo, Renrui Zhang,…
Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic Segmentation by JuneHyoung Kwon, Eunju Lee,…
Text-only Synthesis for Image Captioning by Qing Zhou, Junlin Huang, Qiang Li, Junyu Gao, Qi Wang. First…
Hybrid Preference Optimization: Augmenting Direct Preference Optimization with Auxiliary Objectives by Anirudhan Badrinath, Prabhat Agarwal, Jiajing…
TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability by Fengji Ma, Li…
Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks by Yunqi…
Position: Foundation Agents as the Paradigm Shift for Decision Making by Xiaoqian Liu, Xingzhou Lou, Jianbin…
Boosting Medical Image-based Cancer Detection via Text-guided Supervision from Reports by Guangyu Guo, Jiawen Yao, Yingda…
Surgical Feature-Space Decomposition of LLMs: Why, When and How? by Arnav Chavan, Nahush Lele, Deepak Gupta. First…