Summary of Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models, by Yanwei Li et al.
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models by Yanwei Li, Yuechen Zhang, Chengyao Wang,…
Evaluating the Efficacy of Prompt-Engineered Large Multimodal Models Versus Fine-Tuned Vision Transformers in Image-Based Security…
How Far Are We on the Decision-Making of LLMs? Evaluating LLMs’ Gaming Ability in Multi-Agent…
How Well Do Multi-modal LLMs Interpret CT Scans? An Auto-Evaluation Framework for Analyses by Qingqing Zhu,…
Can Large Language Models do Analytical Reasoning? by Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang,…
Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning by Deepanway Ghosal,…
LLMs in Political Science: Heralding a New Era of Visual Analysis by Yu Wang. First submitted to…
GAOKAO-MM: A Chinese Human-Level Benchmark for Multimodal Models Evaluation by Yi Zong, Xipeng Qiu. First submitted to…