SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning

Official codebase for SCALER (arXiv 2026): a framework for synthesizing verifiable, difficulty-controllable reasoning environments from real-world programming problems, and training LLMs with adaptive multi-environment RL to sustain informative learning signals over long horizons.

  • Paper: SCALER: Synthetic sCalable Adaptive Learning Environment for Reasoning (https://arxiv.org/abs/2601.04809)
  • This repository is built on top of verl (Volcano Engine Reinforcement Learning for LLMs) and follows its environment/runtime conventions.


Table of Contents

  • Overview
  • Core Ideas
  • Repository Layout
  • Quickstart
  • Key Results
  • Citation
  • License & Acknowledgements
  • Contact

Overview

Reinforcement learning (RL) can enhance LLM reasoning, but progress often slows when:

  1. task difficulty drifts away from the model’s capability frontier (too easy / too hard), or
  2. training is dominated by a narrow set of recurring patterns, reducing distributional diversity.

SCALER addresses both via co-adaptation between the model and training environments:

  • a scalable synthesis pipeline that converts real-world programming problems into verifiable environments with controllable difficulty and unbounded instance generation;
  • an adaptive multi-environment RL strategy that dynamically adjusts instance difficulty and curates the active set of environments to maintain informative rewards and sustain improvement.

Core Ideas

Scalable Environment Synthesis

Given a programming problem (statement + reference solution), SCALER synthesizes a reasoning environment with the following properties (a minimal sketch of such an environment follows the list):

  • Verifiability: deterministic oracle / unit tests provide correctness signals.
  • Difficulty control: explicit scale parameters discretized into difficulty levels.
  • Unbounded instance generation: randomized testcase generation yields unlimited training instances.
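
To make this concrete, below is a minimal, hypothetical sketch of what a synthesized environment could look like. The names (ReasoningEnvironment, generate_instance, verify) and the scale rule are illustrative assumptions for exposition, not the actual SCALER API.

import random
from dataclasses import dataclass

@dataclass
class Instance:
    prompt: str           # problem statement instantiated with concrete test data
    expected_output: str  # oracle answer used for verification

class ReasoningEnvironment:
    """Hypothetical synthesized environment: verifiable, difficulty-controllable, unbounded."""

    def __init__(self, statement: str, reference_solution, num_levels: int = 5):
        self.statement = statement
        self.reference_solution = reference_solution  # deterministic oracle
        self.num_levels = num_levels                  # discretized difficulty levels

    def generate_instance(self, level: int, rng: random.Random) -> Instance:
        # Randomized testcase generation: the scale parameter grows with the level,
        # so fresh instances can be sampled without bound at any difficulty.
        scale = 2 ** (level + 1)
        test_input = [rng.randint(0, 10 * scale) for _ in range(scale)]
        prompt = f"{self.statement}\nInput: {test_input}"
        return Instance(prompt, str(self.reference_solution(test_input)))

    def verify(self, instance: Instance, model_answer: str) -> float:
        # Deterministic correctness check; this is the reward signal for RL.
        return 1.0 if model_answer.strip() == instance.expected_output else 0.0

For example, a sorting problem would use the sorted list as the oracle output, with the list length acting as the scale parameter that the difficulty levels discretize.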

Adaptive Multi-Environment RL

SCALER sustains learning signals at two levels (a simplified controller sketch follows the list):

  • In-environment difficulty controller: keeps sampling near a target success regime.
  • Environment curation: maintains an active set and replaces saturated/uninformative environments to preserve diversity and long-horizon improvements.
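
As a rough illustration of both mechanisms, the sketch below implements a plausible success-rate feedback loop and a saturation-based curation rule. The thresholds (0.2 / 0.8 target band, 0.95 saturation) and all names are assumptions made here for exposition, not the paper's exact algorithm or this repo's API.

class DifficultyController:
    """Keeps sampled difficulty near a target success regime (illustrative thresholds)."""

    def __init__(self, num_levels: int = 5, low: float = 0.2, high: float = 0.8):
        self.level = 0
        self.num_levels = num_levels
        self.low, self.high = low, high  # target success-rate band

    def update(self, success_rate: float) -> int:
        if success_rate > self.high and self.level < self.num_levels - 1:
            self.level += 1   # too easy: rewards saturate at 1, raise difficulty
        elif success_rate < self.low and self.level > 0:
            self.level -= 1   # too hard: rewards collapse to 0, lower difficulty
        return self.level

def curate(active_envs, controllers, success_rates, reserve_pool, saturation=0.95):
    # Drop environments whose success rate stays saturated even at the hardest level,
    # then refill the active set from a reserve pool to preserve diversity.
    kept = [env for env in active_envs
            if not (success_rates[env] > saturation
                    and controllers[env].level == controllers[env].num_levels - 1)]
    while reserve_pool and len(kept) < len(active_envs):
        kept.append(reserve_pool.pop())
    return kept

Keeping the observed success rate inside a band like [0.2, 0.8] keeps rewards informative (neither all-zero nor all-one), which is the intuition behind both the controller and the curation step.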

Repository Layout

High-level structure (major directories):

  • SCALER/ — SCALER core code (synthesis, controllers, curation, integration).
  • SCALER-data/ — environment pools / metadata / released artifacts (if any).
  • recipe/environment/ — runnable training / evaluation recipes (paper entry points).
  • verl/ — upstream training infrastructure (forked / vendored).

Quickstart

1) Setup

This repo follows verl for environment setup (CUDA / PyTorch / distributed runtime / Docker, etc.). Please refer to the upstream verl installation documentation for detailed instructions.

Tip: If you already have verl working on your machine/cluster, setting up SCALER should require only a small additional step.

2) Construct environments

SCALER’s environment synthesis pipeline entry point:

  • SCALER/environment_construct.sh

Run:

cd SCALER
bash environment_construct.sh

Notes:

  • The script is intended as the one-click pipeline entry. Customize dataset paths / output dirs / parallelism in the script as needed.
  • Synthesized environments and metadata are typically managed under SCALER-data/ (see repo layout).
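
As a sanity check after synthesis, a loop like the one below can enumerate what was produced. The directory layout and the one-environment-per-subdirectory assumption are hypothetical; adapt the paths to whatever environment_construct.sh actually writes.

from pathlib import Path

def iter_environments(data_dir: str = "SCALER-data"):
    # Hypothetical layout: one synthesized environment per subdirectory.
    root = Path(data_dir)
    for env_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        yield env_dir.name, env_dir

if __name__ == "__main__":
    envs = list(iter_environments())
    print(f"found {len(envs)} synthesized environments")
    for name, env_dir in envs[:5]:
        print(f"  {name}: {env_dir}")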

3) Train

Paper-style training runs are organized under recipe/. A concrete entry (Qwen3-1.7B, 2739 envs):

  • recipe/environment/qwen3-1.7b-2739-envs.sh

Run:

bash recipe/environment/qwen3-1.7b-2739-envs.sh

Notes:

  • This script is the main training entry point; it sets the model, the environment pool, the runtime (GPU / distributed), and logging.
  • To change the environment pool, difficulty scheduling, or curation knobs, edit the recipe script (and/or the config files it references).

Key Results

Performance on five reasoning benchmarks: MATH-500, AMC23, AIME24, MMLU-Pro, BBEH.
Numbers below are taken from Table 1 in the paper (AVG = unweighted mean):

Base Model       Method    MATH-500  AMC23  AIME24  MMLU-Pro  BBEH   AVG
Qwen3-1.7B-base  Base         59.6   29.21    3.33     33.30   3.26  25.74
Qwen3-1.7B-base  + SCALER     75.8   49.53   12.91     50.89  11.74  40.18
Qwen3-4B-base    Base         66.4   44.70    8.75     51.60   8.10  35.91
Qwen3-4B-base    + SCALER     84.4   75.00   27.29     70.00  14.56  54.25

Environment pool statistics (paper v1): 4973 programming problems → 2739 synthesized SCALER environments.


Citation

If you use SCALER in your research, please cite:

@article{xu2026scaler,
  title   = {SCALER: Synthetic sCalable Adaptive Learning Environment for Reasoning},
  author  = {Xu, Caijun and Xiao, Changyi and Peng, Zhongyuan and Wang, Xinrun and Cao, Yixin},
  journal = {arXiv preprint arXiv:2601.04809},
  year    = {2026},
  doi     = {10.48550/arXiv.2601.04809}
}

License & Acknowledgements

  • Released under Apache License 2.0 (see LICENSE).
  • Built on top of verl and reuses its training infrastructure. Please also check upstream verl license/notice files when redistributing.

Contact

Correspondence (paper): cjxu25@m.fudan.edu.cn
