Summary of Policy Mirror Descent with Lookahead, by Kimon Protopapas et al.
Policy Mirror Descent with Lookahead by Kimon Protopapas, Anas Barakat. First submitted to arXiv on: 21 Mar…
Carbon Footprint Reduction for Sustainable Data Centers in Real-Time by Soumyendu Sarkar, Avisek Naug, Ricardo Luna,…
DouRN: Improving DouZero by Residual Neural Networks by Yiquan Chen, Yingchao Lyu, Di Zhang. First submitted to…
Heuristic Algorithm-based Action Masking Reinforcement Learning (HAAM-RL) with Ensemble Inference Method by Kyuwon Choi, Cheolkyun Rho,…
RewardBench: Evaluating Reward Models for Language Modeling by Nathan Lambert, Valentina Pyatkin, Jacob Morrison, LJ Miranda,…
Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation by Do June Min, Veronica…
Towards Principled Representation Learning from Videos for Reinforcement Learning by Dipendra Misra, Akanksha Saran, Tengyang Xie,…
Fast Value Tracking for Deep Reinforcement Learning by Frank Shih, Faming Liang. First submitted to arXiv on:…
Simple Ingredients for Offline Reinforcement Learning by Edoardo Cetin, Andrea Tirinzoni, Matteo Pirotta, Alessandro Lazaric, Yann…
Equivariant Ensembles and Regularization for Reinforcement Learning in Map-based Path Planning by Mirco Theile, Hongpeng Cao,…