Summary of Foundational Challenges in Assuring Alignment and Safety of Large Language Models, by Usman Anwar et al.
Foundational Challenges in Assuring Alignment and Safety of Large Language Models by Usman Anwar, Abulhair Saparov,…
Learn Your Reference Model for Real Good Alignment by Alexey Gorbatovski, Boris Shaposhnikov, Alexey Malakhov, Nikita…
MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild by Kateryna Chumachenko, Alexandros…
MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes by Bor-Shiun Wang, Chien-Yi Wang, Wei-Chen Chiu. First submitted…
Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch by Malek Mechergui, Sarath Sreedharan. First…
Hyperbolic Delaunay Geometric Alignment by Aniss Aiman Medbouhi, Giovanni Luca Marchetti, Vladislav Polianskii, Alexander Kravberg, Petra…
Persistent Classification: A New Approach to Stability of Data and Adversarial Examples by Brian Bell, Michael…
Simultaneous linear connectivity of neural networks modulo permutation by Ekansh Sharma, Devin Kwok, Tom Denton, Daniel…
Less is More for Improving Automatic Evaluation of Factual Consistency by Tong Wang, Ninad Kulkarni, Yanjun…
AEGIS: Online Adaptive AI Content Safety Moderation with Ensemble of LLM Experts by Shaona Ghosh, Prasoon…