Summary of Feedback Efficient Online Fine-Tuning of Diffusion Models, by Masatoshi Uehara et al.
Feedback Efficient Online Fine-Tuning of Diffusion Models by Masatoshi Uehara, Yulai Zhao, Kevin Black, Ehsan Hajiramezanali,…
How Can LLM Guide RL? A Value-Based Approach by Shenao Zhang, Sirui Zheng, Shuqi Ke, Zhihan…
Graph Diffusion Policy Optimization by Yijing Liu, Chao Du, Tianyu Pang, Chongxuan Li, Min Lin, Wei…
Achieving Instance-dependent Sample Complexity for Constrained Markov Decision Process by Jiashuo Jiang, Yinyu Ye. First submitted to…
How Likely Do LLMs with CoT Mimic Human Reasoning? by Guangsheng Bao, Hongbo Zhang, Cunxiang Wang,…
Scalable Volt-VAR Optimization using RLlib-IMPALA Framework: A Reinforcement Learning Approach by Alaa Selim, Yanzhu Ye, Junbo…
DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning by Anthony Liang, Guy Tennenholtz, Chih-wei Hsu,…
Reward Design for Justifiable Sequential Decision-Making by Aleksa Sukovic, Goran Radanovic. First submitted to arXiv on: 24…
Fair Resource Allocation in Multi-Task Learning by Hao Ban, Kaiyi Ji. First submitted to arXiv on: 23…
Uniformly Safe RL with Objective Suppression for Multi-Constraint Safety-Critical Applications by Zihan Zhou, Jonathan Booher, Khashayar…