Summary of Enhancing LLM Safety via Constrained Direct Preference Optimization, by Zixuan Liu et al.
Enhancing LLM Safety via Constrained Direct Preference Optimization, by Zixuan Liu, Xiaolin Sun, Zizhan Zheng. First submitted…
Koopman-Assisted Reinforcement Learning, by Preston Rozwood, Edward Mehrez, Ludger Paehler, Wen Sun, Steven L. Brunton. First submitted…
Towards Provable Log Density Policy Gradient, by Pulkit Katdare, Anant Joshi, Katherine Driggs-Campbell. First submitted to arXiv…
Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks, by Ziping Xu, Zifan Xu,…
On the Role of Information Structure in Reinforcement Learning for Partially-Observable Sequential Teams and Games, by…
Continuous Mean-Zero Disagreement-Regularized Imitation Learning (CMZ-DRIL), by Noah Ford, Ryan W. Gardner, Austin Juhl, Nathan Larson. First…
Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning, by Hyungho Na, Yunkyeong Seo, Il-chul Moon. First…
Direct Alignment of Draft Model for Speculative Decoding with Chat-Fine-Tuned LLMs, by Raghavv Goel, Mukul Gagrani,…
EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data, by Shengjie Wang, Shaohuai Liu, Weirui…
Reusing Historical Trajectories in Natural Policy Gradient via Importance Sampling: Convergence and Convergence Rate, by Yifan…