Summary of Failures in Perspective-taking of Multimodal AI Systems, by Bridget Leonard et al.
Failures in Perspective-taking of Multimodal AI Systems by Bridget Leonard, Kristin Woodard, Scott O. Murray. First submitted…
VisScience: An Extensive Benchmark for Evaluating K12 Educational Multi-modal Scientific Reasoning by Zhihuan Jiang, Zhen Yang,…
‘Since Lawyers are Males..’: Examining Implicit Gender Bias in Hindi Language Generation by LLMs by Ishika…
ShizishanGPT: An Agricultural Large Language Model Integrating Tools and Resources by Shuting Yang, Zehui Liu, Wolfgang…
Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering by Youngsun Lim, Hojun Choi, Hyunjung Shim. First submitted…
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines by Dongzhi Jiang, Renrui Zhang,…
LifeGPT: Topology-Agnostic Generative Pretrained Transformer Model for Cellular Automata by Jaime A. Berkovich, Markus J. Buehler. First…
Autoformalization of Game Descriptions using Large Language Models by Agnieszka Mensfelt, Kostas Stathis, Vince Trencsenyi. First submitted…
Small Language Models can Outperform Humans in Short Creative Writing: A Study Comparing SLMs with…
CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark by Zachary S.…