Summary of Discrete Probabilistic Inference as Control in Multi-path Environments, by Tristan Deleu et al.
Discrete Probabilistic Inference as Control in Multi-path Environments
by Tristan Deleu, Padideh Nouri, Nikolay Malkin, Doina Precup, Yoshua Bengio
First submitted to arXiv on: 15 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper and is written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper frames sampling from a structured, discrete distribution as a sequential decision-making problem: the goal is to find a policy that samples objects in proportion to their reward. In environments where multiple action sequences lead to the same object, the optimal policy of maximum entropy reinforcement learning (MaxEnt RL) is biased toward objects reachable by more paths. The paper extends recent methods that correct the reward, ensuring that the marginal distribution induced by the optimal policy is proportional to the original reward, and proves that some flow-matching objectives used by Generative Flow Networks (GFlowNets) are equivalent to well-established MaxEnt RL algorithms with this corrected reward. It also empirically compares multiple MaxEnt RL and GFlowNet algorithms on problems of sampling from discrete distributions (a toy sketch of the multi-path bias follows the table). |
Low | GrooveSquid.com (original content) | Imagine picking objects from a box, where you want each object to come up in proportion to how good it is. That’s the problem these computer scientists are working on, using tools called maximum entropy reinforcement learning (MaxEnt RL) and generative flow networks (GFlowNets). The tricky part is that some objects can be reached in more ways than others, which skews the picks. The paper shows how to fix this bias in MaxEnt RL so the picks match the objects’ value, and compares how well different methods do. |
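To make the "multi-path" bias concrete, below is a minimal, self-contained Python sketch. It is not code from the paper: the toy environment, the equal rewards, the uniform backward probability, and the names `paths_to`, `reward`, `naive`, and `corrected` are all illustrative assumptions. The sketch shows how weighting every trajectory by the reward of its final object oversamples objects that are reachable through more action sequences, and how reweighting each trajectory by a backward probability over the paths into its object restores a marginal proportional to the reward, which is the kind of reward correction the summaries above describe.

```python
# Toy multi-path environment (illustrative, not from the paper): two objects,
# each reachable by a different number of distinct trajectories. The reward
# depends only on the final object.
paths_to = {"x1": 2, "x2": 4}    # number of trajectories reaching each object
reward = {"x1": 1.0, "x2": 1.0}  # equal rewards: the target marginal is 50/50

# Naive weighting: every trajectory gets weight R(x), so the marginal over
# objects is proportional to (number of paths) * R(x) -- biased toward x2.
naive = {x: paths_to[x] * reward[x] for x in reward}
z = sum(naive.values())
print({x: round(w / z, 3) for x, w in naive.items()})
# {'x1': 0.333, 'x2': 0.667} -- x2 is oversampled despite equal rewards

# Corrected weighting: multiply each trajectory's weight by a backward
# probability P_B(trajectory | x) over the paths into x (uniform here,
# i.e. 1 / paths_to[x]). Summing over trajectories then recovers
# P(x) proportional to R(x).
corrected = {x: paths_to[x] * reward[x] * (1.0 / paths_to[x]) for x in reward}
z = sum(corrected.values())
print({x: round(w / z, 3) for x, w in corrected.items()})
# {'x1': 0.5, 'x2': 0.5} -- marginal now matches the rewards
```

As we understand the paper, the correction is applied there by adding log-probabilities of a backward policy to the reward optimized by MaxEnt RL, rather than by enumerating paths explicitly; the toy above only illustrates why the resulting marginal comes out unbiased.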
Keywords
* Artificial intelligence
* Machine learning
* Reinforcement learning