Summary of B-cosification: Transforming Deep Neural Networks to Be Inherently Interpretable, by Shreyash Arya et al.
B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable by Shreyash Arya, Sukrut Rao, Moritz Böhle,…
Active Preference-based Learning for Multi-dimensional Personalization by Minhyeon Oh, Seungjoon Lee, Jungseul Ok. First submitted to arxiv…
Adapting Language Models via Token Translation by Zhili Feng, Tanya Marwah, Nicolo Fusi, David Alvarez-Melis, Lester…
ResiDual Transformer Alignment with Spectral Decomposition by Lorenzo Basile, Valentino Maiorca, Luca Bortolussi, Emanuele Rodolà, Francesco…
Enhancing Diversity in Bayesian Deep Learning via Hyperspherical Energy Minimization of CKA by David Smerkous, Qinxun…
SelfCodeAlign: Self-Alignment for Code Generation by Yuxiang Wei, Federico Cassano, Jiawei Liu, Yifeng Ding, Naman Jain,…
Dynamical similarity analysis can identify compositional dynamics developing in RNNs by Quentin Guilhot, Michał Wójcik, Jascha…
Representative Social Choice: From Learning Theory to AI Alignment by Tianyi Qiu. First submitted to arxiv on:…
Adaptive Alignment: Dynamic Preference Adjustments via Multi-Objective Reinforcement Learning for Pluralistic AI by Hadassah Harland, Richard…
Rethinking Inverse Reinforcement Learning: from Data Alignment to Task Alignment by Weichao Zhou, Wenchao Li. First submitted…