Summary of Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges, by Aman Singh Thakur et al.
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges by Aman Singh Thakur, Kartik Choudhary, Venkat…
An Investigation of Neuron Activation as a Unified Lens to Explain Chain-of-Thought Eliciting Arithmetic Reasoning…
PDSS: A Privacy-Preserving Framework for Step-by-Step Distillation of Large Language Models by Tao Fan, Yan Kang,…
Grade Score: Quantifying LLM Performance in Option Selection by Dmitri Iourovitski. First submitted to arXiv on: 17…
Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways by Shubham Atreja, Joshua…
How Far Can In-Context Alignment Go? Exploring the State of In-Context Alignment by Heyan Huang, Yinghao…
FamiCom: Further Demystifying Prompts for Language Models with Task-Agnostic Performance Estimation by Bangzheng Li, Ben Zhou,…
Demonstration Notebook: Finding the Most Suited In-Context Learning Example from Interactions by Yiming Tang, Bin Dong. First…
HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data…
Efficient Prompting for LLM-based Generative Internet of Things by Bin Xiao, Burak Kantarci, Jiawen Kang, Dusit…