
Summary of “Tree Search-Based Policy Optimization under Stochastic Execution Delay” by David Valensi et al.


Tree Search-Based Policy Optimization under Stochastic Execution Delay

by David Valensi, Esther Derman, Shie Mannor, Gal Dalal

First submitted to arXiv on: 8 Apr 2024

Categories

  • Main: Artificial Intelligence (cs.AI)
  • Secondary: Machine Learning (cs.LG)



GrooveSquid.com Paper Summaries

GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!

High Difficulty Summary (written by the paper authors)
Read the original abstract here
Medium Difficulty Summary (original content by GrooveSquid.com)
This research paper proposes a new formalism for Markov decision processes (MDPs) that accounts for the random execution delays arising in realistic applications such as robotics or healthcare. The standard MDP formulation assumes actions are executed immediately; the authors instead introduce stochastic delayed execution MDPs, in which execution delays vary randomly. They show that observed delay values can be exploited to optimize policy search within the class of Markov policies, extending earlier results for the deterministic, fixed-delay case. Their proposed algorithm, DEZ, combines Monte Carlo tree search with a model-based approach to handle delayed execution while preserving sample efficiency. Experiments on the Atari suite show that DEZ outperforms baselines in both constant-delay and stochastic-delay scenarios.
Low Difficulty Summary (original content by GrooveSquid.com)
This paper is about making decisions when there’s a wait time between choosing an action and that action actually happening. Imagine you’re a robot trying to pick up objects, but it takes some time to move from one spot to another. Right now, we have ways of making decisions that ignore this delay, but that doesn’t always work well in real-life situations. The researchers introduce a new way of thinking about these kinds of decisions, called stochastic delayed execution Markov decision processes (MDPs). They show how to make better decisions by taking the random wait times between actions into account. This can help with things like robots picking up objects or medical devices making diagnoses.
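To make the delayed-execution idea above concrete, here is a minimal toy sketch (not the paper’s DEZ algorithm): the agent keeps a queue of actions it has committed to but that have not yet executed, and uses a known model to roll the observed state forward through that queue before choosing its next action. All names (`model_step`, `delayed_policy_action`, `run_episode`) and the 1-D chain environment are illustrative assumptions.

```python
import random
from collections import deque

# Toy model: states are integers on a line; actions move -1, 0, or +1.
# (Illustrative stand-in for a known or learned dynamics model.)
def model_step(state, action):
    return state + action

def delayed_policy_action(state, pending_actions, goal):
    """Pick the next action for a delay-aware agent.

    Instead of reacting to the observed (stale) state, the agent first
    rolls the model forward through the queue of already-committed but
    not-yet-executed actions, then acts greedily from that predicted
    future state -- the core idea behind planning in delayed MDPs.
    """
    predicted = state
    for a in pending_actions:
        predicted = model_step(predicted, a)
    if predicted < goal:
        return 1
    if predicted > goal:
        return -1
    return 0  # predicted to already be at the goal: stay

def run_episode(goal=5, max_delay=3, horizon=40, seed=0):
    """Run one episode where each chosen action executes after a random delay."""
    rng = random.Random(seed)
    state = 0
    pending = deque()  # actions committed but not yet executed
    for _ in range(horizon):
        pending.append(delayed_policy_action(state, pending, goal))
        # A random number of queued actions (possibly zero) fires this
        # step, emulating stochastic execution delay.
        for _ in range(rng.randint(0, min(max_delay, len(pending)))):
            state = model_step(state, pending.popleft())
    return state
```

Because the policy conditions on the predicted post-queue state rather than the observed one, it commits exactly enough moves to reach the goal and then stays, even though it never knows in advance when each queued action will fire.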

Keywords

» Artificial intelligence