Summary of Foundational Autoraters: Taming Large Language Models For Better Automatic Evaluation, by Tu Vu et al.
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation by Tu Vu, Kalpesh Krishna, Salaheddin…
MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs by Quang H. Nguyen, Duy C.…
Cutting Through the Clutter: The Potential of LLMs for Efficient Filtration in Systematic Literature Reviews by…
Unveiling Disparities in Maternity Care: A Topic Modelling Approach to Analysing Maternity Incident Investigation Reports by…
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? by Zhaorun Chen,…
Jailbreaking LLMs with Arabic Transliteration and Arabizi by Mansour Al Ghanim, Saleh Almohaimeed, Mengxin Zheng, Yan…
APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets by Zuxin Liu, Thai Hoang, Jianguo…
Large Language Models Assume People are More Rational than We Really are by Ryan Liu, Jiayi…
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models by Jiale Cheng,…
GraphEval36K: Benchmarking Coding and Reasoning Capabilities of Large Language Models on Graph Datasets by Qiming Wu,…