Reinforcement learning – Page 213 – GrooveSquid.com

July 13, 2025

Summary of Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning, by Hongjoon Ahn et al.

Reset & Distill: A Recipe for Overcoming Negative Transfer in Continual Reinforcement Learning by Hongjoon Ahn,…

July 13, 2025

Summary of Simulating Battery-Powered TinyML Systems Optimised using Reinforcement Learning in Image-Based Anomaly Detection, by Jared M. Ping and Ken J. Nixon

Simulating Battery-Powered TinyML Systems Optimised using Reinforcement Learning in Image-Based Anomaly Detection by Jared M. Ping,…

July 13, 2025

Summary of Provable Multi-Party Reinforcement Learning with Diverse Human Feedback, by Huiying Zhong et al.

Provable Multi-Party Reinforcement Learning with Diverse Human Feedback by Huiying Zhong, Zhun Deng, Weijie J. Su,…

July 13, 2025

Summary of Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition, by Long-Fei Li et al.

Improved Algorithm for Adversarial Linear Mixture MDPs with Bandit Feedback and Unknown Transition by Long-Fei Li,…

July 13, 2025

Summary of Teaching Large Language Models to Reason with Reinforcement Learning, by Alex Havrilla et al.

Teaching Large Language Models to Reason with Reinforcement Learning by Alex Havrilla, Yuqing Du, Sharath Chandra…

July 13, 2025

Summary of Mastering Memory Tasks with World Models, by Mohammad Reza Samsami, Artem Zholus, Janarthanan Rajendran, and Sarath Chandar

Mastering Memory Tasks with World Models by Mohammad Reza Samsami, Artem Zholus, Janarthanan Rajendran, Sarath Chandar. First…

July 13, 2025

Summary of Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy, by Yu Zhu et al.

Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy by Yu Zhu, Chuxiong Sun,…

July 13, 2025

Summary of Efficient Off-Policy Learning for High-Dimensional Action Spaces, by Fabian Otto et al.

Efficient Off-Policy Learning for High-Dimensional Action Spaces by Fabian Otto, Philipp Becker, Ngo Anh Vien, Gerhard…

July 13, 2025

Summary of Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations, by Xiaolin Sun et al.

Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations by Xiaolin Sun, Zizhan Zheng. First submitted to arXiv on:…

July 13, 2025

Summary of Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process, by Xiangxin Zhou et al.

Stabilizing Policy Gradients for Stochastic Differential Equations via Consistency with Perturbation Process by Xiangxin Zhou, Liang…