Summary of Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback, by Asaf Cassel, Haipeng Luo, Aviv Rosenberg, and Dmitry Sotnikov
Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback by Asaf Cassel, Haipeng Luo, Aviv Rosenberg,…