Summary of Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining, by Jie Cheng et al.
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
by Jie Cheng, Ruixi Qiao, Yingwei Ma, Binhua Li, Gang Xiong, Qinghai Miao, Yongbin Li, Yisheng Lv
First submitted to arxiv on: 1 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract, available on arXiv. |
Medium | GrooveSquid.com (original content) | The paper introduces JOWA, a jointly-optimized world-action model that leverages image observation-based world models to scale offline reinforcement learning (RL) and improve generalization to novel tasks. By pretraining on multiple Atari games with 6 billion tokens of data, JOWA learns general-purpose representation and decision-making ability. The world model and the action (Q-value) model are jointly optimized through a shared transformer backbone, which stabilizes temporal-difference learning during pretraining. A provably efficient and parallelizable planning algorithm compensates for Q-value estimation error, enabling better policy search. Experimental results show that the largest agent achieves 78.9% human-level performance on pretrained games using only 10% subsampled offline data, outperforming state-of-the-art baselines by 31.6% on average. JOWA also scales favorably with model capacity and transfers efficiently to novel games using only 5k offline fine-tuning samples. A minimal architectural sketch follows after this table. |
Low | GrooveSquid.com (original content) | This paper creates a new way for artificial intelligence (AI) agents to learn from lots of different video game datasets. The goal is to make the AI agent very good at playing many different games, not just one or two. To do this, the authors use an approach called “offline reinforcement learning” and combine it with another technique called “image observation-based world models.” They test their method on a bunch of Atari games and show that it can learn quickly and play the games almost as well as humans. This matters because it could help AI agents learn from many different sources, making them smarter and better able to adapt to new situations. |
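To make the jointly-optimized world-action idea from the medium-difficulty summary concrete, here is a minimal PyTorch-style sketch: a single transformer backbone whose features feed both a world-model head (next observation tokens and reward) and a Q-value head, trained under one combined loss. All class names, dimensions, and loss weights below are illustrative assumptions for this summary, not the authors' actual implementation.

```python
# Hypothetical sketch of a jointly-optimized world-action model: one shared
# transformer backbone feeds a world-model head and a Q-value head, so the
# world-model loss and the TD loss both shape the same representation.
# Module names, sizes, and the loss weight are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WorldActionModel(nn.Module):
    def __init__(self, vocab_size=512, n_actions=18, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)   # shared backbone
        self.world_head = nn.Linear(d_model, vocab_size)         # next observation tokens
        self.reward_head = nn.Linear(d_model, 1)                 # predicted reward
        self.q_head = nn.Linear(d_model, n_actions)              # Q-values per action

    def forward(self, tokens):
        h = self.backbone(self.embed(tokens))                    # (B, T, d_model)
        return self.world_head(h), self.reward_head(h), self.q_head(h[:, -1])

def joint_loss(model, tokens, next_tokens, rewards, td_targets, actions, lam=1.0):
    """Single objective: world-model loss + weighted TD loss through the shared backbone."""
    logits, r_pred, q = model(tokens)
    wm_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), next_tokens.reshape(-1)
    ) + F.mse_loss(r_pred[:, -1, 0], rewards)
    td_loss = F.mse_loss(q.gather(1, actions.unsqueeze(1)).squeeze(1), td_targets)
    return wm_loss + lam * td_loss
```

The point of sharing the backbone is that gradients from the world-model objective and from temporal-difference learning update the same representation, which is the stabilizing effect the medium-difficulty summary attributes to joint optimization during pretraining.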
Keywords
» Artificial intelligence » Fine tuning » Generalization » Pretraining » Reinforcement learning » Transformer