Claude – Page 12 – GrooveSquid.com

July 13, 2025

Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation by Tu Vu, Kalpesh Krishna, Salaheddin…

July 13, 2025

MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs by Quang H. Nguyen, Duy C.…

July 13, 2025

Cutting Through the Clutter: The Potential of LLMs for Efficient Filtration in Systematic Literature Reviews by…

July 13, 2025

Unveiling Disparities in Maternity Care: A Topic Modelling Approach to Analysing Maternity Incident Investigation Reports by…

July 13, 2025

MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? by Zhaorun Chen,…

July 13, 2025

Jailbreaking LLMs with Arabic Transliteration and Arabizi by Mansour Al Ghanim, Saleh Almohaimeed, Mengxin Zheng, Yan…

July 13, 2025

APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets by Zuxin Liu, Thai Hoang, Jianguo…

July 13, 2025

Large Language Models Assume People are More Rational than We Really are by Ryan Liu, Jiayi…

July 13, 2025

AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models by Jiale Cheng,…

July 13, 2025

GraphEval36K: Benchmarking Coding and Reasoning Capabilities of Large Language Models on Graph Datasets by Qiming Wu,…