Summary of Corruption Robust Offline Reinforcement Learning with Human Feedback, by Debmalya Mandal et al.
Corruption Robust Offline Reinforcement Learning with Human Feedback by Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish…
Monitored Markov Decision Processes by Simone Parisi, Montaser Mohammedalamen, Alireza Kazemipour, Matthew E. Taylor, Michael Bowling. First…
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF by Han Shen, Zhuoran Yang, Tianyi Chen. First…
Predictive representations: building blocks of intelligence by Wilka Carvalho, Momchil S. Tomov, William de Cothi, Caswell…
Scaling Intelligent Agents in Combat Simulations for Wargaming by Scotty Black, Christian Darken. First submitted to arXiv…
Entropy-Regularized Token-Level Policy Optimization for Language Agent Reinforcement by Muning Wen, Junwei Liao, Cheng Deng, Jun…
ACTER: Diverse and Actionable Counterfactual Sequences for Explaining and Diagnosing RL Policies by Jasmina Gajcin, Ivana…
Deceptive Path Planning via Reinforcement Learning with Graph Neural Networks by Michael Y. Fatemi, Wesley A.…
High-Precision Geosteering via Reinforcement Learning and Particle Filters by Ressi Bonti Muhammad, Apoorv Srivastava, Sergey Alyaev,…
Hierarchical Transformers are Efficient Meta-Reinforcement Learners by Gresa Shala, André Biedenkapp, Josif Grabocka. First submitted to arXiv…