Summary of Knowpc: Knowledge-driven Programmatic Reinforcement Learning For Zero-shot Coordination, by Yin Gu et al.
KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination
by Yin Gu, Qi Liu, Zhi Li, Kai Zhang
First submitted to arXiv on: 8 Aug 2024
Categories
- Main: Artificial Intelligence (cs.AI)
- Secondary: None
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary
---|---|---
High | Paper authors | Read the original abstract here
Medium | GrooveSquid.com (original content) | Deep reinforcement learning (DRL) combined with advanced self-play or population-based methods has been used to tackle zero-shot coordination (ZSC), where an agent learns to cooperate with unseen partners. However, neural networks lack interpretability and explicit logic, making it hard for partners to understand the learned policies and limiting their generalization. To address this, the authors propose a programmatic approach in which the agent's policy is represented as an interpretable program with stable logic. Their framework, Knowledge-driven Programmatic reinforcement learning for zero-shot Coordination (KnowPC), integrates an extractor and a reasoner to search the vast program space efficiently. KnowPC first defines a Domain-Specific Language (DSL), including program structures, conditional primitives, and action primitives. The extractor discovers environmental transition knowledge from multi-agent interaction trajectories, while the reasoner deduces the preconditions of each action primitive from that knowledge.
Low | GrooveSquid.com (original content) | Zero-shot coordination is a big challenge in AI: it asks agents to work well with partners they have never seen before. Today this is usually tackled with deep learning, but those models are hard to understand and don't always work in new situations. To fix this, the researchers suggest using programs instead of neural networks to represent an agent's policy. These programs contain clear rules that humans or other agents can read and follow. The KnowPC framework makes this possible by combining two parts: an extractor that finds patterns in how agents interact with the environment, and a reasoner that figures out when each action can be taken based on those patterns.
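The paper itself gives no code, but the idea of a programmatic policy built from conditional and action primitives can be sketched as a list of condition-action rules, where each action primitive fires only when its precondition holds. Everything below (the predicates, actions, and example state) is a hypothetical illustration, not KnowPC's actual DSL:

```python
# Illustrative sketch of a programmatic policy as condition-action rules.
# All predicate names, action names, and the state format are hypothetical;
# in KnowPC the preconditions would be deduced by the reasoner from
# transition knowledge, not hand-written as they are here.

def holding_item(state):
    """Conditional primitive: is the agent carrying something?"""
    return state.get("holding") is not None

def at_station(state):
    """Conditional primitive: is the agent at its target station?"""
    return state.get("position") == state.get("station")

# Each rule pairs a precondition with an action primitive.
RULES = [
    (lambda s: holding_item(s) and at_station(s), "place_item"),
    (lambda s: holding_item(s), "move_to_station"),
    (lambda s: not holding_item(s), "pick_up_item"),
]

def programmatic_policy(state):
    """Return the first action whose precondition holds in the given state."""
    for precondition, action in RULES:
        if precondition(state):
            return action
    return "wait"  # default action primitive
```

Because the policy is an ordered list of readable rules rather than a neural network, a partner (or a human) can inspect exactly why an action was chosen, which is the interpretability benefit the summaries describe:

```python
programmatic_policy({"holding": "onion", "position": "A", "station": "A"})
# first rule's precondition holds, so the policy selects "place_item"
```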
Keywords
» Artificial intelligence » Deep learning » Generalization » Reinforcement learning » Zero shot