Summary of Discrete Probabilistic Inference as Control in Multi-path Environments, by Tristan Deleu et al.
Discrete Probabilistic Inference as Control in Multi-path Environments
by Tristan Deleu, Padideh Nouri, Nikolay Malkin, Doina Precup, Yoshua Bengio
First submitted to arXiv on: 15 Feb 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper and is written at a different level of difficulty. The medium and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | Read the original abstract here |
Medium | GrooveSquid.com (original content) | The paper frames sampling from a structured, discrete distribution as a sequential decision-making problem: the goal is to find a policy that samples objects in proportion to their reward. In environments where multiple action sequences lead to the same object, the optimal policy of maximum entropy reinforcement learning (MaxEnt RL) is biased toward objects reachable by more paths. The paper extends recent methods that correct the reward, ensuring that the marginal distribution induced by the optimal policy is proportional to the original reward, and proves that some flow-matching objectives used by Generative Flow Networks (GFlowNets) are equivalent to well-established MaxEnt RL algorithms with this corrected reward. It also empirically compares multiple MaxEnt RL and GFlowNet algorithms on problems of sampling from discrete distributions (a toy sketch of the multi-path bias follows the table). |
Low | GrooveSquid.com (original content) | Imagine picking objects from a box, where you want each object to come up in proportion to how good it is. That’s the problem these computer scientists are working on, using tools called maximum entropy reinforcement learning (MaxEnt RL) and generative flow networks (GFlowNets). The tricky part is that some objects can be reached in more ways than others, which skews the picks. The paper shows how to fix this bias in MaxEnt RL so the picks match the objects’ value, and compares how well different methods do. |
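To make the "multi-path" bias concrete, below is a minimal, self-contained Python sketch. It is not code from the paper: the toy environment, the equal rewards, the uniform backward probability, and the names `paths_to`, `reward`, `naive`, and `corrected` are all illustrative assumptions. The sketch shows how weighting every trajectory by the reward of its final object oversamples objects that are reachable through more action sequences, and how reweighting each trajectory by a backward probability over the paths into its object restores a marginal proportional to the reward, which is the kind of reward correction the summaries above describe.

```python
# Toy multi-path environment (illustrative, not from the paper): two objects,
# each reachable by a different number of distinct trajectories. The reward
# depends only on the final object.
paths_to = {"x1": 2, "x2": 4}    # number of trajectories reaching each object
reward = {"x1": 1.0, "x2": 1.0}  # equal rewards: the target marginal is 50/50

# Naive weighting: every trajectory gets weight R(x), so the marginal over
# objects is proportional to (number of paths) * R(x) -- biased toward x2.
naive = {x: paths_to[x] * reward[x] for x in reward}
z = sum(naive.values())
print({x: round(w / z, 3) for x, w in naive.items()})
# {'x1': 0.333, 'x2': 0.667} -- x2 is oversampled despite equal rewards

# Corrected weighting: multiply each trajectory's weight by a backward
# probability P_B(trajectory | x) over the paths into x (uniform here,
# i.e. 1 / paths_to[x]). Summing over trajectories then recovers
# P(x) proportional to R(x).
corrected = {x: paths_to[x] * reward[x] * (1.0 / paths_to[x]) for x in reward}
z = sum(corrected.values())
print({x: round(w / z, 3) for x, w in corrected.items()})
# {'x1': 0.5, 'x2': 0.5} -- marginal now matches the rewards
```

As we understand the paper, the correction is applied there by adding log-probabilities of a backward policy to the reward optimized by MaxEnt RL, rather than by enumerating paths explicitly; the toy above only illustrates why the resulting marginal comes out unbiased.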
Keywords
* Artificial intelligence
* Machine learning
* Reinforcement learning