Skip to content

NVIDIA-NeMo/ProRL-Agent-Server

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4,400 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ProRL Agent Server (POLAR)

Polar rollout architecture

Apache 2.0 License

Polar is a RL rollout framework for real-world agent harnesses.

  1. Harness as Environment. Bring your agent harnesses as RL-ready environments without code change.
  2. Smart Rollout Pipeline. Maximize GPU utilization with Polar's async rollout staging.
  3. Rollout as a Service. Server mode by design -- scaling Async RL with any training frameworks.

Architecture Overview

Polar rollout architecture

The Rollout Server manages and dispatches client requests into distributed Gateway Nodes, which asynchronously prepare runtime, execute agents, build trajectories and evaluate them. Agent harnesses are listened by a proxy that sits between agnostic agent execution processes and local inference servers.

Installation

uv venv
uv pip install -e .

SGLang is installed and launched separately.

uv pip install --prerelease=allow sglang==0.5.10
bash scripts/patch/patch_sglang.sh

For SWE-bench evaluation support:

uv pip install -e ".[swebench]"

Polar itself is trainer agnostic. Currently, we provide a demo-purpose Slime integration in Slime bridge installation guide.

Guide

Examples

This project is under early development. We are actively adding new examples for different tasks / models on diverse hardware setups. Contributions are welcome!

Roadmap

Our development goal for Polar is low-intrusion and neutral, finding the lowest common ancestor to cover and support diverse training and inference frameworks.

  • Initial release & tech report.
  • Slime bridge & RL example.
  • vLLM dual inference support.
  • More trainer bridges (NemoRL, VERL, etc.).
  • CUA (VLM / VLA) Support.
Polar rollout architecture

📖 Reference

Important

If you find it useful, please consider citing our work:

@article{zhang2026prorl,
  title={ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents},
  author={Zhang, Hao and Liu, Mingjie and Zhang, Shaokun and Han, Songyang and Hu, Jian and Jin, Zhenghui and Zhang, Yuchi and Diao, Shizhe and Lu, Ximing and Xu, Binfeng and others},
  journal={arXiv preprint arXiv:2603.18815},
  year={2026}
}

About

Agentic RL on Any Harness at Scale

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors