Summary of Reward Difference Optimization For Sample Reweighting In Offline RLHF, by Shiqi Wang et al.
Reward Difference Optimization For Sample Reweighting In Offline RLHF by Shiqi Wang, Zhengze Zhang, Rui Zhao,…
Fine-tuning LLMs for Autonomous Spacecraft Control: A Case Study Using Kerbal Space Program by Alejandro Carrasco,…
Solving a Rubik’s Cube Using its Local Graph Structure by Shunyu Yao, Mitchy Lee. First submitted to…
Large Language Models Prompting With Episodic Memory by Dai Do, Quan Tran, Svetha Venkatesh, Hung Le. First…
Multi-Agent Continuous Control with Generative Flow Networks by Shuang Luo, Yinchuan Li, Shunyu Liu, Xu Zhang,…
Online Optimization of Curriculum Learning Schedules using Evolutionary Optimization by Mohit Jiwatode, Leon Schlecht, Alexander Dockhorn. First…
In-Context Exploiter for Extensive-Form Games by Shuxin Li, Chang Yang, Youzhi Zhang, Pengdeng Li, Xinrun Wang,…
Strong and weak alignment of large language models with human values by Mehdi Khamassi, Marceau Nahon,…
KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination by Yin Gu, Qi Liu, Zhi Li, Kai…
PLANRL: A Motion Planning and Imitation Learning Framework to Bootstrap Reinforcement Learning by Amisha Bhaskar, Zahiruddin…