
On the Feasibility of Cross-Task Transfer
with Model-Based Reinforcement Learning

Yifan Xu*,  Nicklas Hansen*,  Zirui Wang,
Yung-Chieh Chan,  Hao Su,  Zhuowen Tu

UC San Diego
*Equal contribution

Overview. We propose a framework for Model-Based Cross-Task Transfer (XTRA). We pretrain a world model on offline multi-task data from Atari 2600 games, and finetune the model by online interaction on unseen games. To prevent catastrophic forgetting, we finetune the world model on data from the new task (game) in addition to data from all pretraining tasks.

Abstract

Reinforcement Learning (RL) algorithms can solve challenging control problems directly from image observations, but they often require millions of environment interactions to do so. Recently, model-based RL algorithms have greatly improved sample efficiency by concurrently learning an internal model of the world and supplementing real environment interactions with imagined rollouts for policy improvement. However, learning an effective model of the world from scratch is challenging, in stark contrast to humans, who rely heavily on world understanding and visual cues when learning new skills. In this work, we investigate whether internal models learned by modern model-based RL algorithms can be leveraged to solve new, distinctly different tasks faster. We propose Model-Based Cross-Task Transfer (XTRA), a framework for sample-efficient online RL with scalable pretraining and finetuning of learned world models. By offline multi-task pretraining and online cross-task finetuning, we achieve substantial improvements over a baseline trained from scratch: we improve the mean performance of the model-based algorithm EfficientZero by 23%, and by as much as 71% in some instances.

Summary (1 min)

Results

We evaluate our method (XTRA), built on top of EfficientZero, on a total of 14 Atari 2600 games spanning a diverse set of visuals and game mechanics. Despite the large task diversity, our method improves the sample efficiency of EfficientZero across a wide range of games.

Atari100k benchmark results. Our method improves the mean final performance of the model-based algorithm EfficientZero by 23% (and by up to 71% in some instances), with even larger gains early in training. We consider a total of 14 Atari 2600 games.

Method

We propose Model-Based Cross-Task Transfer (XTRA), a two-stage framework for offline multi-task pretraining and cross-task transfer of learned world models by finetuning with online RL. Specifically, we first pretrain a world model on offline data from a set of diverse pretraining tasks (stage 1), and then iteratively finetune the pretrained model on data from a target task collected by online interaction, as well as the offline multi-task dataset (stage 2).
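The following is a minimal, self-contained sketch of this two-stage procedure. It is illustrative only, not the paper's implementation: a toy network and random tensors stand in for the real world model and Atari trajectories, and the names (world_model, loss_fn, pretraining_tasks, target_data) are assumptions introduced for this example.

    # Illustrative two-stage sketch (toy model and random data; not the official code).
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    world_model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 64))
    loss_fn = nn.MSELoss()
    optimizer = torch.optim.Adam(world_model.parameters(), lr=1e-3)

    # Toy "offline datasets": one batch of (observation, target) pairs per pretraining game.
    pretraining_tasks = {f"game_{i}": (torch.randn(32, 64), torch.randn(32, 64))
                         for i in range(4)}

    # Stage 1: offline multi-task pretraining across all pretraining games.
    for step in range(100):
        optimizer.zero_grad()
        loss = sum(loss_fn(world_model(x), y) for x, y in pretraining_tasks.values())
        loss.backward()
        optimizer.step()

    # Stage 2: finetune on target-task data (here a fixed toy batch standing in for
    # data collected by online interaction), jointly with offline batches from the
    # pretraining games to mitigate forgetting.
    target_data = (torch.randn(32, 64), torch.randn(32, 64))
    for step in range(100):
        optimizer.zero_grad()
        x_t, y_t = target_data
        loss = loss_fn(world_model(x_t), y_t)
        loss = loss + sum(loss_fn(world_model(x), y) for x, y in pretraining_tasks.values())
        loss.backward()
        optimizer.step()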

Concurrent cross-task finetuning (stage 2). Overview of our approach to online finetuning. We iteratively collect data for a target task by online interaction, and jointly finetune the model on data from the target task as well as offline data from each of the pretraining tasks. While this mitigates catastrophic forgetting, not all pretraining tasks are beneficial. Therefore, we periodically re-weight the gradient contribution of each pretraining task based on the cosine similarity between its gradient and the target-task gradient. Please refer to our paper for further details.
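Below is a hedged sketch of one way to compute such task weights; the paper specifies the exact re-weighting rule, whereas here each pretraining task is simply weighted by its gradient's cosine similarity to the target-task gradient, clipped at zero so that conflicting tasks are down-weighted to zero. The helper names (flat_grad, pretraining_task_weights) are hypothetical.

    # Gradient-similarity task re-weighting (illustrative sketch, assumes PyTorch).
    import torch

    def flat_grad(model, loss):
        # Flattened gradient of `loss` w.r.t. all model parameters.
        grads = torch.autograd.grad(loss, model.parameters(), retain_graph=True,
                                    allow_unused=True)
        return torch.cat([g.reshape(-1) for g in grads if g is not None])

    def pretraining_task_weights(model, target_loss, pretrain_losses):
        # One weight in [0, 1] per pretraining task, re-computed periodically.
        g_target = flat_grad(model, target_loss)
        weights = []
        for loss in pretrain_losses:
            g_task = flat_grad(model, loss)
            cos = torch.nn.functional.cosine_similarity(g_target, g_task, dim=0)
            weights.append(torch.clamp(cos, min=0.0).item())
        return weights

In this sketch, the stage-2 update would then optimize target_loss plus the weighted sum of the pretraining-task losses, so tasks whose gradients conflict with the target task contribute little or nothing.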

Paper

Y. Xu, N. Hansen, Z. Wang, Y.-C. Chan, H. Su, Z. Tu
On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning

arXiv 2022



Citation

If you use our method or code in your research, please consider citing the paper as follows:

@article{xu2022xtra,
  title={On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning},
  author={Yifan Xu and Nicklas Hansen and Zirui Wang and Yung-Chieh Chan and Hao Su and Zhuowen Tu},
  journal={arXiv preprint arXiv:2210.10763},
  year={2022}
}
Correspondence to Nicklas Hansen. Website based on TD-MPC and Nerfies.