Official implementation of MARL-GPT, a transformer-based foundation model for multi-agent reinforcement learning (MARL). This repository provides code for offline training on expert trajectories, model evaluation, and online fine-tuning with PPO. MARL-GPT demonstrates that a single model can achieve competitive performance across diverse MARL environments, including the StarCraft Multi-Agent Challenge (SMACv2), Google Research Football (GRF), and POGEMA, without task-specific architectural modifications.
- Detailed configuration instructions for adding new environments, customizing training hyperparameters, and defining positional encodings are provided in SETTINGS.md.
- Pre-collected expert trajectories for offline training are available on Hugging Face: MARL-GPT Datasets
- The full text of the paper, including the Appendix, is available at the following link: arXiv.
- The video for the robotics experiment can be found at this link: video.
It's recommended to use uv to install dependencies, but pip or conda should work too:
```bash
uv pip install -r docker/requirements.txt
```
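With plain pip, the equivalent is:

```bash
pip install -r docker/requirements.txt
```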
For environments requiring StarCraft II (e.g., SMACv2), install the game binaries. On Ubuntu:

```bash
bash docker/install_sc2.sh
```

Set the `SC2PATH` environment variable to the installation directory:
```bash
export SC2PATH=/path/to/StarCraftII
```

**Important:** `SC2PATH` must be set before running any SMACv2 experiment. Without it, the environment will fail with a cryptic import or connection error. To make it permanent, add the line to your `~/.bashrc` or `~/.zshrc`.
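For example, with bash (adjust the path to your installation):

```bash
# Append the export to your shell profile so SC2PATH persists across sessions
echo 'export SC2PATH=/path/to/StarCraftII' >> ~/.bashrc
source ~/.bashrc
```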
For the server-side experiments, it's recommended to use Docker images. Navigate to the docker folder and run:
```bash
sh build.sh
```

This builds the Docker image and tags it with the appropriate name. Next, create a container from the image and mount the repository source code into it.
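A minimal sketch of that step; the image tag `marl-gpt` here is an assumption, so check `docker/build.sh` for the name it actually applies:

```bash
# Hypothetical tag "marl-gpt" -- replace with the tag printed by build.sh
docker run -it --gpus all \
    -v "$(pwd)":/workspace/marl-gpt \
    -w /workspace/marl-gpt \
    marl-gpt /bin/bash
```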
Train the MARL-GPT model on pre-collected expert trajectories:
```bash
python train_actor_critic.py config/config-train-critic.py
```

For multi-GPU training, use torchrun:
```bash
torchrun --standalone --nproc_per_node=4 train_actor_critic.py config/config-train-critic.py
```

Trained checkpoints are saved to the `out/` directory.
Fine-tune a pre-trained model using online rollouts:
- Set `history_len` in the `create_env.py` file for your target environment (look for `TODO FINE-TUNE` comments).
- Launch fine-tuning, passing the checkpoint path via `--gpt_model_path`:

```bash
python finetune_sf.py \
    --env=GRF \
    --experiment=academy_single_goal_versus_lazy \
    --save_best_metric=goal_diff \
    --num_workers=8 \
    --gpt_model_path=out/ckpt.pt
```

To train from scratch instead of loading a checkpoint, pass `--gpt_model_path=none`.
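For reference, the same run launched from scratch (all flags as above; only the checkpoint path changes):

```bash
# Starts from randomly initialized weights instead of a pre-trained checkpoint
python finetune_sf.py \
    --env=GRF \
    --experiment=academy_single_goal_versus_lazy \
    --save_best_metric=goal_diff \
    --num_workers=8 \
    --gpt_model_path=none
```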
Fine-tuned checkpoints are saved to the `train_dir/` directory.
Run evaluation for a specific benchmark environment (example: SMACv2):
```bash
python all_env_bench.py --env_name smacv2
```

`all_env_bench.py` runs the evaluation suite for supported environments (see the script for available options and defaults).
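A sketch for sweeping several benchmarks in one go; `grf` and `pogema` as `--env_name` values are assumptions, so verify the accepted identifiers in the script:

```bash
# Env names other than smacv2 are guesses -- check all_env_bench.py for the real ones
for env in smacv2 grf pogema; do
    python all_env_bench.py --env_name "$env"
done
```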
If you fine-tuned the model online (PPO/Sample Factory), convert the resulting checkpoints to the evaluation format:
```bash
python gpt_model_from_sf_to_bench.py \
    --gpt-path out/ckpt.pt \
    --ppo-path train_dir/checkpoint_p0/best.pth \
    --output-path out/ppo_model.pt
```
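A quick sanity check that the converted file loads, assuming it is a standard `torch.save` artifact:

```bash
# If the artifact is a full pickled module rather than a state dict, your torch
# version may need weights_only=False; this is a sketch, not a guarantee
python -c "import torch; obj = torch.load('out/ppo_model.pt', map_location='cpu'); print(type(obj))"
```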
```
├── config/                  # Training and dataset configuration files
├── docker/                  # Docker setup and environment installation scripts
├── envs/                    # Environment wrappers and adapters
│   ├── smac_env/            # StarCraft Multi-Agent Challenge
│   ├── grf_env/             # Google Research Football
│   └── pogema_env/          # POGEMA pathfinding
├── gpt/                     # Model architecture and training logic
│   └── finetune_vers/       # GPT model variants adapted for fine-tuning
├── utils/                   # Dataset loaders and utilities
├── train_actor_critic.py    # Offline training entry point
├── finetune_sf.py           # Online fine-tuning entry point
└── all_env_bench.py         # Multi-environment evaluation script
```
If you use this code in your research, please cite:
```bibtex
@article{nesterova2026marl,
  title={MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning},
  author={Nesterova, Maria and Kolosov, Mikhail and Andreychuk, Anton and Cherepanov, Egor and Bulichev, Oleg and Kovalev, Alexey and Yakovlev, Konstantin and Panov, Aleksandr and Skrynnik, Alexey},
  journal={arXiv preprint arXiv:2604.05943},
  year={2026}
}
```