Summary of Lumos: Empowering Multimodal LLMs with Scene Text Recognition, by Ashish Shenoy et al.
Lumos: Empowering Multimodal LLMs with Scene Text Recognition by Ashish Shenoy, Yichao Lu, Srihari Jayakumar,…
Retrieval Augmented Thought Process for Private Data Handling in Healthcare by Thomas Pouplin, Hao Sun, Samuel…
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering by Xiaoxin He, Yijun Tian, Yifei…
Exploring Perceptual Limitation of Multimodal Large Language Models by Jiarui Zhang, Jinyi Hu, Mahyar Khayatkhoei, Filip…
Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy by Simon…
Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical…
SubGen: Token Generation in Sublinear Time and Memory by Amir Zandieh, Insu Han, Vahab Mirrokni, Amin…
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey by Zhuo Chen, Yichi Zhang, Yin Fang, Yuxia…
VerAs: Verify then Assess STEM Lab Reports by Berk Atil, Mahsa Sheikhi Karizaki, Rebecca J. Passonneau First…
Text-Guided Image Clustering by Andreas Stephan, Lukas Miklautz, Kevin Sidak, Jan Philip Wahle, Bela Gipp, Claudia…