Summary of MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding, by Fei Wang et al.
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding, by Fei Wang, Xingyu Fu, James Y. Huang,…
GPT-ology, Computational Models, Silicon Sampling: How should we think about LLMs in Cognitive Science? by Desmond…
A Sociotechnical Lens for Evaluating Computer Vision Models: A Case Study on Detecting and Reasoning…
Tailoring Generative AI Chatbots for Multiethnic Communities in Disaster Preparedness Communication: Extending the CASA Paradigm, by…
Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks, by Justin Zhao, Flor Miriam…
Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena, by Aidar Myrzakhan, Sondos…
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? by Xingyu Fu, Muyu He, Yujie Lu, William…
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models, by Tianle Gu, Zeyang Zhou,…
Are Large Language Models Good Statisticians? by Yizhang Zhu, Shiyin Du, Boyan Li, Yuyu Luo, Nan…
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text, by Aoxiong Yin, Haoyuan Li,…