Summary of PaCE: Parsimonious Concept Engineering for Large Language Models, by Jinqi Luo et al.
PaCE: Parsimonious Concept Engineering for Large Language Models, by Jinqi Luo, Tianjiao Ding, Kwan Ho Ryan…
Spread Preference Annotation: Direct Preference Judgment for Efficient LLM Alignment, by Dongyoung Kim, Kimin Lee, Jinwoo…
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models, by Xiang Ji, Sanjeev…
Representational Alignment Supports Effective Machine Teaching, by Ilia Sucholutsky, Katherine M. Collins, Maya Malaviya, Nori Jacoby,…
HackAtari: Atari Learning Environments for Robust and Continual Reinforcement Learning, by Quentin Delfosse, Jannis Blüml, Bjarne…
Is Free Self-Alignment Possible?, by Dyah Adila, Changho Shin, Yijing Zhang, Frederic Sala. First submitted to arXiv…
Alignment Calibration: Machine Unlearning for Contrastive Learning under Auditing, by Yihan Wang, Yiwei Lu, Guojun Zhang,…
Bayesian WeakS-to-Strong from Text Classification to Generation, by Ziyun Cui, Ziyang Zhang, Guangzhi Sun, Wen Wu,…
Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation, by Tingjia Shen, Hao…
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms, by Rafael Rafailov, Yaswanth Chittepu, Ryan…