Summary of Efficient Multi-agent Reinforcement Learning by Planning, by Qihan Liu et al.
Efficient Multi-agent Reinforcement Learning by Planning
by Qihan Liu, Jianing Ye, Xiaoteng Ma, Jun Yang, Bin Liang, Chongjie Zhang
First submitted to arXiv on: 20 May 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | High Difficulty Summary Read the original abstract here |
Medium | GrooveSquid.com (original content) | Medium Difficulty Summary The paper proposes MAZero, a model-based multi-agent reinforcement learning (MARL) algorithm that combines a centralized model with Monte Carlo Tree Search (MCTS) for policy search. The authors aim to improve the sample efficiency of MARL by leveraging the nearly-independent property of agents. A novel network structure facilitates distributed execution and parameter sharing. To make the search more efficient, the algorithm introduces two techniques: Optimistic Search Lambda (OS(λ)) and Advantage-Weighted Policy Optimization (AWPO). Experiments on the SMAC benchmark show that MAZero is more sample-efficient than model-free approaches and performs comparably to or better than existing model-based methods. |
Low | GrooveSquid.com (original content) | Low Difficulty Summary MAZero is a new way for a team of agents to learn how to make decisions together. It uses a model that predicts what will happen if the agents take certain actions, combined with a search algorithm that looks for the best action. This lets the agents learn from less experience. The authors also introduce two new techniques, Optimistic Search Lambda and Advantage-Weighted Policy Optimization, which help the search by giving priority to promising options. They tested MAZero on the SMAC benchmark and found that it needed less training data than methods that learn without a model. |
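
To make "giving priority to good options" concrete, here is a minimal Python sketch of generic advantage weighting applied to a search policy. It is not claimed to be the paper's AWPO formulation: the function name `advantage_weighted_target`, the temperature `alpha`, and the sample numbers are all assumptions made for illustration.

```python
import math

def advantage_weighted_target(visit_probs, q_values, alpha=1.0):
    """Reweight a search policy by exp(advantage / alpha), then renormalize.

    Hypothetical helper for illustration only; not the paper's exact loss.
    """
    # Baseline: the expected value of the actions under the search policy.
    value = sum(p * q for p, q in zip(visit_probs, q_values))
    advantages = [q - value for q in q_values]
    # Subtract the largest advantage before exponentiating for numerical stability.
    max_adv = max(advantages)
    weights = [
        p * math.exp((a - max_adv) / alpha)
        for p, a in zip(visit_probs, advantages)
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Three candidate actions: how often search visited them, and their estimated values.
target = advantage_weighted_target([0.5, 0.3, 0.2], [1.0, 2.0, 0.0])
```

In this toy example the second action has the highest estimated value, so its share of the target grows, while the third action's share shrinks. A smaller `alpha` would sharpen this preference; a larger one keeps the target closer to the raw search policy.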
Keywords
» Artificial intelligence » Optimization » Reinforcement learning