Summary of Offline Reinforcement Learning for LLM Multi-Step Reasoning, by Huaijie Wang et al.
Offline Reinforcement Learning for LLM Multi-Step Reasoning, by Huaijie Wang, Shibo Hao, Hanze Dong, Shenao Zhang,…
What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning, by Yiran Ma, Zui…
AIR: Unifying Individual and Collective Exploration in Cooperative Multi-Agent Reinforcement Learning, by Guangchong Zhou, Zeren Zhang,…
Novelty-Guided Data Reuse for Efficient and Diversified Multi-Agent Reinforcement Learning, by Yangkun Chen, Kai Yang, Jian…
Generalized Back-Stepping Experience Replay in Sparse-Reward Environments, by Guwen Lyu, Masahiro Sato. First submitted to arXiv on:…
SORREL: Suboptimal-Demonstration-Guided Reinforcement Learning for Learning to Branch, by Shengyu Feng, Yiming Yang. First submitted to arXiv…
FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF, by Flint Xiaofeng Fan, Cheston Tan,…
PCA-Featured Transformer for Jamming Detection in 5G UAV Networks, by Joseanne Viana, Hamed Farkhari, Pedro Sebastiao,…
Investigating Relational State Abstraction in Collaborative MARL, by Sharlin Utke, Jeremie Houssineau, Giovanni Montana. First submitted to…
AdaCred: Adaptive Causal Decision Transformers with Feature Crediting, by Hemant Kumawat, Saibal Mukhopadhyay. First submitted to arXiv…