Summary of Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining, by Jie Cheng et al.
Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining
by Jie Cheng, Ruixi Qiao, Yingwei Ma, Binhua Li, Gang Xiong, Qinghai Miao, Yongbin Li, Yisheng Lv
First submitted to arxiv on: 1 Oct 2024
Categories
- Main: Machine Learning (cs.LG)
- Secondary: Artificial Intelligence (cs.AI)
GrooveSquid.com Paper Summaries
GrooveSquid.com’s goal is to make artificial intelligence research accessible by summarizing AI papers in simpler terms. Each summary below covers the same AI paper, written at different levels of difficulty. The medium difficulty and low difficulty versions are original summaries written by GrooveSquid.com, while the high difficulty version is the paper’s original abstract. Feel free to learn from the version that suits you best!
Summary difficulty | Written by | Summary |
---|---|---|
High | Paper authors | The paper's original abstract, available on arXiv. |
Medium | GrooveSquid.com (original content) | The paper introduces JOWA, a jointly-optimized world-action model that leverages image observation-based world models to scale offline reinforcement learning (RL) and improve generalization to novel tasks. By pretraining on multiple Atari games with 6 billion tokens of data, JOWA learns general-purpose representation and decision-making ability. The world model and the action (Q-value) model are jointly optimized through a shared transformer backbone, which stabilizes temporal-difference learning during pretraining. A provably efficient and parallelizable planning algorithm compensates for Q-value estimation error, enabling better policy search. Experimental results show that the largest agent achieves 78.9% human-level performance on pretrained games using only 10% subsampled offline data, outperforming state-of-the-art baselines by 31.6% on average. JOWA also scales favorably with model capacity and transfers efficiently to novel games using only 5k offline fine-tuning samples. A minimal architectural sketch follows after this table. |
Low | GrooveSquid.com (original content) | This paper creates a new way for artificial intelligence (AI) agents to learn from lots of different video game datasets. The goal is to make the AI agent very good at playing many different games, not just one or two. To do this, the authors use an approach called “offline reinforcement learning” and combine it with another technique called “image observation-based world models.” They test their method on a bunch of Atari games and show that it can learn quickly and play the games almost as well as humans. This matters because it could help AI agents learn from many different sources, making them smarter and better able to adapt to new situations. |
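To make the jointly-optimized world-action idea from the medium-difficulty summary concrete, here is a minimal PyTorch-style sketch: a single transformer backbone whose features feed both a world-model head (next observation tokens and reward) and a Q-value head, trained under one combined loss. All class names, dimensions, and loss weights below are illustrative assumptions for this summary, not the authors' actual implementation.

```python
# Hypothetical sketch of a jointly-optimized world-action model: one shared
# transformer backbone feeds a world-model head and a Q-value head, so the
# world-model loss and the TD loss both shape the same representation.
# Module names, sizes, and the loss weight are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WorldActionModel(nn.Module):
    def __init__(self, vocab_size=512, n_actions=18, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)   # shared backbone
        self.world_head = nn.Linear(d_model, vocab_size)         # next observation tokens
        self.reward_head = nn.Linear(d_model, 1)                 # predicted reward
        self.q_head = nn.Linear(d_model, n_actions)              # Q-values per action

    def forward(self, tokens):
        h = self.backbone(self.embed(tokens))                    # (B, T, d_model)
        return self.world_head(h), self.reward_head(h), self.q_head(h[:, -1])

def joint_loss(model, tokens, next_tokens, rewards, td_targets, actions, lam=1.0):
    """Single objective: world-model loss + weighted TD loss through the shared backbone."""
    logits, r_pred, q = model(tokens)
    wm_loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), next_tokens.reshape(-1)
    ) + F.mse_loss(r_pred[:, -1, 0], rewards)
    td_loss = F.mse_loss(q.gather(1, actions.unsqueeze(1)).squeeze(1), td_targets)
    return wm_loss + lam * td_loss
```

The point of sharing the backbone is that gradients from the world-model objective and from temporal-difference learning update the same representation, which is the stabilizing effect the medium-difficulty summary attributes to joint optimization during pretraining.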
Keywords
» Artificial intelligence » Fine tuning » Generalization » Pretraining » Reinforcement learning » Transformer