Summary of Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer, by Zhihan Liu et al.
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer by Zhihan Liu,…
Diffusion-based Reinforcement Learning via Q-weighted Variational Policy Optimization by Shutong Ding, Ke Hu, Zhenhao Zhang, Kan…
Differentiable Cluster Graph Neural Network by Yanfei Dong, Mohammed Haroon Dupty, Lambert Deng, Zhuanghua Liu, Yong…
Evolutionary Large Language Model for Automated Feature Transformation by Nanxu Gong, Chandan K. Reddy, Wangyang Ying, Haifeng…
Negative as Positive: Enhancing Out-of-distribution Generalization for Graph Contrastive Learning by Zixu Wang, Bingbing Xu, Yige…
GeoAdaLer: Geometric Insights into Adaptive Stochastic Gradient Descent Algorithms by Chinedu Eleh, Masuzyo Mwanza, Ekene Aguegboh,…
Continuous Temporal Domain Generalization by Zekun Cai, Guangji Bai, Renhe Jiang, Xuan Song, Liang Zhao. First submitted…
Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization by Zhe Li, Bicheng Ying, Zidong Liu,…
A Systematic Bias of Machine Learning Regression Models and Its Correction: an Application to Imaging-based…
Wasserstein Distances, Neuronal Entanglement, and Sparsity by Shashata Sawmya, Linghao Kong, Ilia Markov, Dan Alistarh, Nir…