Summary of Strong Preferences Affect the Robustness of Preference Models and Value Alignment, by Ziwei Xu et al.
Strong Preferences Affect the Robustness of Preference Models and Value Alignment, by Ziwei Xu, Mohan Kankanhalli. First…
An LLM-Powered Automatic Grading Framework with Human-Level Guidelines Optimization, by Yucheng Chu, Hang Li, Kaiqi Yang,…
Aligning with Logic: Measuring, Evaluating and Improving Logical Preference Consistency in Large Language Models, by Yinhong…
FactAlign: Long-form Factuality Alignment of Large Language Models, by Chao-Wei Huang, Yun-Nung Chen. First submitted to arxiv…
Agent-Driven Large Language Models for Mandarin Lyric Generation, by Hong-Hsiang Liu, Yi-Wen Liu. First submitted to arxiv…
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models, by Angela…
Truth or Deceit? A Bayesian Decoding Game Enhances Consistency and Reliability, by Weitong Zhang, Chengqi Zang,…
Towards Inference-time Category-wise Safety Steering for Large Language Models, by Amrita Bhattacharjee, Shaona Ghosh, Traian Rebedea,…
FlipGuard: Defending Preference Alignment against Update Regression with Constrained Optimization, by Mingye Zhu, Yi Liu, Quan…
Cross-lingual Back-Parsing: Utterance Synthesis from Meaning Representation for Zero-Resource Semantic Parsing, by Deokhyung Kang, Seonjeong Hwang,…