Summary of Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges, by Aman Singh Thakur et al.
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges, by Aman Singh Thakur, Kartik Choudhary, Venkat…
Aquila-Med LLM: Pioneering Full-Process Open-Source Medical Language Models, by Lulu Zhao, Weihao Zeng, Xiaofeng Shi, Hua…
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model, by Yongting Zhang, Lu Chen,…
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization, by Wenkai Yang, Shiqi Shen, Guangyao…
How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment, by Heyan Huang, Yinghao…
Open-Vocabulary X-ray Prohibited Item Detection via Fine-tuning CLIP, by Shuyang Lin, Tong Jia, Hao Wang, Bowen…
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models, by Rui Ye,…
SememeLM: A Sememe Knowledge Enhanced Method for Long-tail Relation Representation, by Shuyi Li, Shaojuan Wu, Xiaowang…
Knowledge Editing in Language Models via Adapted Direct Preference Optimization, by Amit Rozner, Barak Battash, Lior…
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing, by Zhangchen Xu, Fengqing…