

Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes

by Andrew Bennett, Nathan Kallus, Miruna Oprescu, Wen Sun, Kaiwen Wang

First submitted to arXiv on: 29 Mar 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Machine Learning (stat.ML)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here.

Medium Difficulty Summary (original content by GrooveSquid.com)
We examine the problem of evaluating a policy in a Markov decision process (MDP) under best- and worst-case perturbations of its transition kernel, using transition observations from the original MDP. This matters when the environment may shift between the historical data and future deployment, for example due to unmeasured confounding or distributional shift. We propose a perturbation model that allows transition kernel densities to change up to a given multiplicative factor or its reciprocal, extending the classic marginal sensitivity model (MSM) for single-time-step decision-making to infinite-horizon RL. We derive sharp bounds on policy value under this model and study how to estimate these bounds from transition observations. We develop an estimator with semiparametric efficiency guarantees, which enables valid statistical inference using Wald confidence intervals. Our results are validated in numerical simulations.
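To make the perturbation model concrete, here is a minimal sketch of the single-time-step version of the idea: the worst-case mean of an outcome when the perturbed distribution's density ratio relative to the observed one is constrained to lie in [1/Λ, Λ], as in the marginal sensitivity model. This is an illustrative toy, not the paper's infinite-horizon estimator; the function name `msm_worst_case_mean` and the parameter `lam` (Λ) are our own choices. The sharp solution up-weights small outcomes (by Λ) and down-weights large ones (by 1/Λ), so we scan all split points of the sorted sample:

```python
import numpy as np

def msm_worst_case_mean(values, lam):
    """Sharp lower bound on E[V] over distributions whose density
    ratio to the empirical distribution lies in [1/lam, lam].
    The optimal reweighting assigns weight lam to the smallest
    outcomes and 1/lam to the rest, so scanning split points of
    the sorted sample finds the minimizer exactly."""
    v = np.sort(np.asarray(values, dtype=float))
    n = len(v)
    best = v.mean()  # lam = 1 recovers the unperturbed mean
    for k in range(n + 1):  # first k samples get weight lam
        w = np.r_[np.full(k, lam), np.full(n - k, 1.0 / lam)]
        w /= w.sum()  # renormalize to a proper distribution
        best = min(best, float(w @ v))
    return best
```

With `lam = 1` the bound equals the ordinary sample mean, and as `lam` grows it tightens toward the minimum observed value; the corresponding best-case (upper) bound follows by negating the values. The paper's contribution is the far harder extension of such sharp bounds, and their efficient estimation, to transitions in infinite-horizon MDPs.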
Low Difficulty Summary (original content by GrooveSquid.com)
Imagine trying to evaluate a plan for making decisions in uncertain situations: you need to consider both the best- and worst-case scenarios that might occur. This paper studies that question in a framework called Markov decision processes (MDPs). The goal is to measure how well a plan works under changing circumstances. The authors propose a method that accounts for possible changes in the environment and handles situations where conditions shift between training and deployment. The results show that this approach yields reliable, trustworthy evaluations.

Keywords

  • Artificial intelligence
  • Inference