Summary of Spectral-risk Safe Reinforcement Learning with Convergence Guarantees, by Dohyeong Kim et al.
Spectral-Risk Safe Reinforcement Learning with Convergence Guarantees by Dohyeong Kim, Taehyun Cho, Seungyub Han, Hojun Chung,…
Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning by Tianle Zhang, Jiayi Guan, Lin Zhao, Yihang Li,…
Kernel Metric Learning for In-Sample Off-Policy Evaluation of Deterministic RL Policies by Haanvid Lee, Tri Wahyu…
Policy Zooming: Adaptive Discretization-based Infinite-Horizon Average-Reward Reinforcement Learning by Avik Kar, Rahul Singh. First submitted to arXiv…
Federated Q-Learning with Reference-Advantage Decomposition: Almost Optimal Regret and Logarithmic Communication Cost by Zhong Zheng, Haochen…
DTR-Bench: An in silico Environment and Benchmark Platform for Reinforcement Learning Based Dynamic Treatment Regime by…
Efficient Preference-based Reinforcement Learning via Aligned Experience Estimation by Fengshuo Bai, Rui Zhao, Hongming Zhang, Sijia…
Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL by Yu Luo, Tianying Ji,…
Learning diverse attacks on large language models for robust red-teaming and safety tuning by Seanie Lee,…
Reinforcement Learning in Dynamic Treatment Regimes Needs Critical Reexamination by Zhiyao Luo, Yangchen Pan, Peter Watkinson,…