Summary of Evaluating the Efficacy of Large Language Models in Detecting Fake News: A Comparative Analysis, by Sahas Koka et al.
GameBench: Evaluating Strategic Reasoning Abilities of LLM Agents, by Anthony Costarelli, Mat Allen, Roman Hauksson, Grace…
Can Language Models Serve as Text-Based World Simulators?, by Ruoyao Wang, Graham Todd, Ziang Xiao, Xingdi…
ThatiAR: Subjectivity Detection in Arabic News Sentences, by Reem Suwaileh, Maram Hasanain, Fatema Hubail, Wajdi Zaghouani,…
EmbSpatial-Bench: Benchmarking Spatial Understanding for Embodied Tasks with Large Vision-Language Models, by Mengfei Du, Binhao Wu,…
Toward Reliable Ad-hoc Scientific Information Extraction: A Case Study on Two Materials Datasets, by Satanu Ghosh,…
Multi-attribute Auction-based Resource Allocation for Twins Migration in Vehicular Metaverses: A GPT-based DRL Approach, by Yongju…
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning, by Huaixiu Steven Zheng, Swaroop Mishra, Hugh Zhang,…
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild, by Bill Yuchen Lin,…
Exploring the Latest LLMs for Leaderboard Extraction, by Salomon Kabongo, Jennifer D'Souza, Sören Auer. First submitted to…