Summary of EasyJudge: an Easy-to-use Tool for Comprehensive Response Evaluation of LLMs, by Yijie Li and Yuan Sun
EasyJudge: an Easy-to-use Tool for Comprehensive Response Evaluation of LLMs by Yijie Li, Yuan Sun. First submitted…
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection by Xi Jiang,…
Extended Japanese Commonsense Morality Dataset with Masked Token and Label Enhancement by Takumi Ohashi, Tsubasa Nakagawa,…
SimpleStrat: Diversifying Language Model Generation with Stratification by Justin Wong, Yury Orlovskiy, Michael Luo, Sanjit A.…
Humanity in AI: Detecting the Personality of Large Language Models by Baohua Zhan, Yongyi Huang, Wenyao…
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models by Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou, Mohsen Fayyaz,…
GameTraversalBenchmark: Evaluating Planning Abilities Of Large Language Models Through Traversing 2D Game Maps by Muhammad Umair…
Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models by Wenting…
COMMA: A Communicative Multimodal Multi-Agent Benchmark by Timothy Ossowski, Jixuan Chen, Danyal Maqbool, Zefan Cai, Tyler…
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders by Cheng…