Summary of Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing, by Fangkai Jiao et al.
Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing, by Fangkai Jiao, Chengwei Qin, Zhengyuan…