Summary of Group Robust Preference Optimization in Reward-free RLHF, by Shyam Sundhar Ramesh et al.
Group Robust Preference Optimization in Reward-free RLHF by Shyam Sundhar Ramesh, Yifan Hu, Iason Chaimalas, Viraj…
Estimating before Debiasing: A Bayesian Approach to Detaching Prior Bias in Federated Semi-Supervised Learning by Guogang…
Reconciling Model Multiplicity for Downstream Decision Making by Ally Yalei Du, Dung Daniel Ngo, Zhiwei Steven…
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models by Zhanhui Zhou, Zhixuan…
Generalized Neyman Allocation for Locally Minimax Optimal Best-Arm Identification by Masahiro Kato. First submitted to arXiv on:…
Inference-Time Alignment of Diffusion Models with Direct Noise Optimization by Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang,…
Quantitative Certification of Bias in Large Language Models by Isha Chaudhary, Qian Hu, Manoj Kumar, Morteza…
Fast Explanations via Policy Gradient-Optimized Explainer by Deng Pan, Nuno Moniz, Nitesh Chawla. First submitted to arXiv…
Probabilistically Plausible Counterfactual Explanations with Normalizing Flows by Patryk Wielopolski, Oleksii Furman, Jerzy Stefanowski, Maciej Zięba. First…
A Closer Look at Time Steps is Worthy of Triple Speed-Up for Diffusion Model Training by…