Summary of Fairer Preferences Elicit Improved Human-aligned Large Language Model Judgments, by Han Zhou et al.
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments, by Han Zhou, Xingchen Wan, Yinhong Liu,…
P-TA: Using Proximal Policy Optimization to Enhance Tabular Data Augmentation via Large Language Models, by Shuo…
Constrained Reinforcement Learning with Average Reward Objective: Model-Based and Model-Free Algorithms, by Vaneet Aggarwal, Washim Uddin…
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD, by Pierfrancesco Beneventano,…
Active search for Bifurcations, by Yorgos M. Psarellis, Themistoklis P. Sapsis, Ioannis G. Kevrekidis. First submitted to…
Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework, by Siyuan Yu,…
Learning Iterative Reasoning through Energy Diffusion, by Yilun Du, Jiayuan Mao, Joshua B. Tenenbaum. First submitted to…
DIPPER: Direct Preference Optimization to Accelerate Primitive-Enabled Hierarchical Reinforcement Learning, by Utsav Singh, Souradip Chakraborty, Wesley…
Bayesian Intervention Optimization for Causal Discovery, by Yuxuan Wang, Mingzhou Liu, Xinwei Sun, Wei Wang, Yizhou…
UniZero: Generalized and Efficient Planning with Scalable Latent World Models, by Yuan Pu, Yazhe Niu, Zhenjie…