MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning


Official implementation of MARL-GPT, a transformer-based foundation model for multi-agent reinforcement learning. This repository provides code for offline training on expert trajectories, model evaluation, and online fine-tuning with PPO. MARL-GPT demonstrates that a single model can achieve competitive performance across diverse MARL environments, including the StarCraft Multi-Agent Challenge (SMACv2), Google Research Football (GRF), and POGEMA, without task-specific architectural modifications.

Additional information

  1. Detailed configuration instructions for adding new environments, customizing training hyperparameters, and defining positional encodings are provided in SETTINGS.md.
  2. Pre-collected expert trajectories for offline training are available on Hugging Face: MARL-GPT Datasets.
  3. The full text of the paper, including the appendix, is available on arXiv.
  4. A video of the robotics experiment is available at this link: video.

Installation

Local Installation

It's recommended to use uv to install dependencies, but pip or conda should work too:

uv pip install -r docker/requirements.txt
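
The same requirements file works with plain pip:

pip install -r docker/requirements.txt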

For environments requiring StarCraft II (e.g., SMACv2), install the game binaries. On Ubuntu:

bash docker/install_sc2.sh

Set the SC2PATH environment variable to the installation directory:

export SC2PATH=/path/to/StarCraftII

Important: SC2PATH must be set before running any SMACv2 experiment. Without it, the environment will fail with a cryptic import or connection error. To make it permanent, add the line to your ~/.bashrc or ~/.zshrc.
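
For example:

echo 'export SC2PATH=/path/to/StarCraftII' >> ~/.bashrc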

Docker (Optional)

For the server-side experiments, it's recommended to use Docker images. Navigate to the docker folder and run:

sh build.sh

This builds the Docker image and tags it with the appropriate name. From there, create a container from the image and mount or copy in the source code, as sketched below.
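
A minimal sketch of those steps; the marl-gpt image tag and the mount point are assumptions, so check build.sh for the actual tag:

# marl-gpt tag and /workspace mount are assumptions; adjust to match build.sh
docker run -it --gpus all \
  -v "$(pwd)":/workspace/marl-gpt \
  marl-gpt bash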

Quickstart

Offline Training

Train the MARL-GPT model on pre-collected expert trajectories:

python train_actor_critic.py config/config-train-critic.py

For multi-GPU training, use torchrun:

torchrun --standalone --nproc_per_node=4 train_actor_critic.py config/config-train-critic.py

Trained checkpoints are saved to the out/ directory.
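
If the config loader accepts --key=value command-line overrides on top of the config file (an assumption; check train_actor_critic.py for the supported pattern), individual hyperparameters can be adjusted without editing the file:

# --batch_size is a hypothetical override key; see config/config-train-critic.py for the real names
python train_actor_critic.py config/config-train-critic.py --batch_size=32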

Online Fine-Tuning

Fine-tune a pre-trained model using online rollouts:

  1. Set history_len in the create_env.py file for your target environment (look for TODO FINE-TUNE comments).

  2. Launch fine-tuning, passing the checkpoint path via --gpt_model_path:

python finetune_sf.py \
  --env=GRF \
  --experiment=academy_single_goal_versus_lazy \
  --save_best_metric=goal_diff \
  --num_workers=8 \
  --gpt_model_path=out/ckpt.pt

To train from scratch instead of loading a checkpoint, pass --gpt_model_path=none.
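
For example:

python finetune_sf.py \
  --env=GRF \
  --experiment=academy_single_goal_versus_lazy \
  --save_best_metric=goal_diff \
  --num_workers=8 \
  --gpt_model_path=none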

Fine-tuned checkpoints are saved to the train_dir/ directory.

Evaluation

Run evaluation for a specific benchmark environment (example: SMACv2):

python all_env_bench.py --env_name smacv2

all_env_bench.py runs the evaluation suite for supported environments (see the script for available options and defaults).
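
To sweep several benchmarks in one go, loop over the environment names (smacv2 comes from the example above; grf and pogema are assumptions, so confirm the exact names in all_env_bench.py):

# grf and pogema are assumed names; check all_env_bench.py for the exact values
for env in smacv2 grf pogema; do
  python all_env_bench.py --env_name "$env"
done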

Converting fine-tuned checkpoints (online → eval)

If you fine-tuned the model online (PPO/Sample Factory), convert the resulting checkpoints to the evaluation format:

python gpt_model_from_sf_to_bench.py \
  --gpt-path out/ckpt.pt \
  --ppo-path train_dir/checkpoint_p0/best.pth \
  --output-path out/ppo_model.pt

Repository Structure

.
├── config/              # Training and dataset configuration files
├── docker/              # Docker setup and environment installation scripts
├── envs/                # Environment wrappers and adapters
│   ├── smac_env/        # StarCraft Multi-Agent Challenge
│   ├── grf_env/         # Google Research Football
│   └── pogema_env/      # POGEMA pathfinding
├── gpt/                 # Model architecture and training logic
│   └── finetune_vers/   # GPT model variants adapted for fine-tuning
├── utils/               # Dataset loaders and utilities
├── train_actor_critic.py   # Offline training entry point
├── finetune_sf.py          # Online fine-tuning entry point
└── all_env_bench.py        # Multi-environment evaluation script

Citation

If you use this code in your research, please cite:

@article{nesterova2026marl,
  title={MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning},
  author={Nesterova, Maria and Kolosov, Mikhail and Andreychuk, Anton and Cherepanov, Egor and Bulichev, Oleg and Kovalev, Alexey and Yakovlev, Konstantin and Panov, Aleksandr and Skrynnik, Alexey},
  journal={arXiv preprint arXiv:2604.05943},
  year={2026}
}

About

[AAMAS-2026] MARL-GPT tackles the challenge of building a general-purpose model for multi-agent reinforcement learning. It applies offline RL to large-scale expert trajectories, using a single GPT-based transformer encoder to train one model that performs competitively across diverse MARL environments (SMACv2, GRF, and POGEMA).
