Summary of IDs for AI Systems, by Alan Chan et al.
IDs for AI Systems by Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de…
Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections…
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences by Yujie Lu, Dongfu Jiang, Wenhu…
Reactor Mk.1 performances: MMLU, HumanEval and BBH test results by TJ Dunham, Henry Syahputra. First submitted to…
ReadCtrl: Personalizing text generation with readability-controlled instruction learning by Hieu Tran, Zonghai Yao, Lingxi Li, Hong…
Open-LLM-Leaderboard: From Multi-choice to Open-style Questions for LLMs Evaluation, Benchmark, and Arena by Aidar Myrzakhan, Sondos…
Harnessing AI for efficient analysis of complex policy documents: a case study of Executive Order…
Evaluating the Efficacy of Large Language Models in Detecting Fake News: A Comparative Analysis by Sahas…
Tool-Planner: Task Planning with Clusters across Multiple Tools by Yanming Liu, Xinyue Peng, Jiannan Cao, Shi…
The Battle of LLMs: A Comparative Study in Conversational QA Tasks by Aryan Rangapur, Aman Rangapur. First…