Summary of Evaluating and Optimizing Educational Content with Large Language Model Judgments, by Joy He-Yueya et al.
Evaluating and Optimizing Educational Content with Large Language Model Judgments by Joy He-Yueya, Noah D. Goodman,…