Summary of QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning, by Yilun Kong et al.
QPO: Query-dependent Prompt Optimization via Multi-Loop Offline Reinforcement Learning by Yilun Kong, Hangyu Mao, Qi Zhao,…
Hologram Reasoning for Solving Algebra Problems with Geometry Diagrams by Litian Huang, Xinguo Yu, Feng Xiong,…
Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search by Jonathan Light, Min Cai, Weiqin…
MalLight: Influence-Aware Coordinated Traffic Signal Control for Traffic Signal Malfunctions by Qinchen Yang, Zejun Xie, Hua…
Reset-free Reinforcement Learning with World Models by Zhao Yang, Thomas M. Moerland, Mike Preuss, Aske Plaat,…
Minor DPO reject penalty to increase training robustness by Shiming Xie, Hong Chen, Fred Yu, Zeye…
Demystifying Reinforcement Learning in Production Scheduling via Explainable AI by Daniel Fischer, Hannah M. Hüsener, Felix…
REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning by Rameez Qureshi, Naïm Es-Sebbani, Luis Galárraga, Yvette…
SynTraC: A Synthetic Dataset for Traffic Signal Control from Traffic Monitoring Cameras by Tiejin Chen, Prithvi…
Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey by Ruiqi Zhang, Jing Hou, Florian Walter, Shangding…