Summary of VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models, by Lisa Dunlap et al.
VibeCheck: Discover and Quantify Qualitative Differences in Large Language Models by Lisa Dunlap, Krishna Mandal, Trevor…
ShapefileGPT: A Multi-Agent Large Language Model Framework for Automated Shapefile Processing by Qingming Lin, Rui Hu,…
OmnixR: Evaluating Omni-modality Language Models on Reasoning across Modalities by Lichang Chen, Hexiang Hu, Mingda Zhang,…
On A Scale From 1 to 5: Quantifying Hallucination in Faithfulness Evaluation by Xiaonan Jing, Srinivas…
Kallini et al. (2024) do not compare impossible languages with constituency-based ones by Tim Hunter. First submitted to…
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI by Sijie Cheng, Kechen Fang, Yangyang Yu,…
Magnifier Prompt: Tackling Multimodal Hallucination via Extremely Simple Instructions by Yuhan Fu, Ruobing Xie, Jiazhen Liu,…
Evidence of Cognitive Deficits and Developmental Advances in Generative AI: A Clock Drawing Test Analysis by Isaac…
Code-Mixer Ya Nahi: Novel Approaches to Measuring Multilingual LLMs’ Code-Mixing Capabilities by Ayushman Gupta, Akhil Bhogal,…
In-Context Learning for Long-Context Sentiment Analysis on Infrastructure Project Opinions by Alireza Shamshiri, Kyeong Rok Ryu,…