Summary of Improving Reward-conditioned Policies For Multi-armed Bandits Using Normalized Weight Functions, by Kai Xu et al.
Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions by Kai Xu, Farid Tajaddodianfar, Ben…