Summary of Grounding Large Language Models in Embodied Environment with Imperfect World Models, by Haolan Liu et al.
Grounding Large Language Models In Embodied Environment With Imperfect World Models by Haolan Liu, Jishen Zhao. First…
CulturalBench: a Robust, Diverse and Challenging Benchmark on Measuring the (Lack of) Cultural Knowledge of…
CodeJudge: Evaluating Code Generation with Large Language Models by Weixi Tong, Tianyi Zhang. First submitted to arxiv…
Can LLMs Reliably Simulate Human Learner Actions? A Simulation Authoring Framework for Open-Ended Learning Environments by…
MARPLE: A Benchmark for Long-Horizon Inference by Emily Jin, Zhuoyi Huang, Jan-Philipp Fränken, Weiyu Liu, Hannah…
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester by Maya Pavlova, Erik Brinkman, Krithika…
The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs by Hong Li, Nanxi Li,…
Sparse Attention Decomposition Applied to Circuit Tracing by Gabriel Franco, Mark Crovella. First submitted to arxiv on:…
Evaluating the fairness of task-adaptive pretraining on unlabeled test data before few-shot text classification by Kush…
Can Models Learn Skill Composition from Examples? by Haoyu Zhao, Simran Kaur, Dingli Yu, Anirudh Goyal,…