Summary of VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models, by Chenyu Zhou et al.
VEGA: Learning Interleaved Image-Text Comprehension in Vision-Language Large Models by Chenyu Zhou, Mengdan Zhang, Peixian Chen,…
Object criticality for safer navigation by Andrea Ceccarelli, Leonardo Montecchi. First submitted to arXiv on: 25 Apr…
QCQA: Quality and Capacity-aware grouped Query Attention by Vinay Joshi, Prashant Laddha, Shambhavi Sinha, Om Ji…
Analyzing Gender Polarity in Short Social Media Texts with BERT: The Role of Emojis and…
Speech ReaLLM – Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of…
Multi-Modal Retrieval For Large Language Model Based Speech Recognition by Jari Kolehmainen, Aditya Gourav, Prashanth Gurunath…
DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer by Wei-Ting Chen, Gurunandan…
RobustSAM: Segment Anything Robustly on Degraded Images by Wei-Ting Chen, Yu-Jiet Vong, Sy-Yen Kuo, Sizhuo Ma,…
Learning Language Structures through Grounding by Freda Shi. First submitted to arXiv on: 14 Jun 2024. Categories: Main: Computation…
A Survey of Video Datasets for Grounded Event Understanding by Kate Sanders, Benjamin Van Durme. First submitted…