XLRON v1.0.0
The first major release of XLRON — a JAX-based simulation framework for resource allocation in optical networks. One library, every execution mode: a fast simulation engine that runs entirely on accelerator hardware, an integrated PPO trainer, classical heuristics, capacity bound estimators, an end-to-end differentiable physical layer, and a browser GUI.
Documentation: https://micdoh.github.io/XLRON/
What's in this release
Reproduces results for these papers
- **XLRON: Accelerated Reinforcement Learning Environments for Optical Networks** — Doherty, Beghelli — OFC 2024.
  The original announcement of XLRON, focused on the JAX implementation, GPU parallelism, and the DeepRMSA reproduction.
- **Reinforcement Learning with Graph Attention for Routing and Wavelength Assignment with Lightpath Reuse** — Doherty, Beghelli — ONDM 2025 (arXiv:2502.14741).
  A Graph-Attention RL agent for the RWA-with-Lightpath-Reuse formulation. → Reproduce all figures
- **Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope?** — Doherty, Beghelli — JOCN 17(9), D1 (2025), DOI 10.1364/JOCN.559990 (arXiv:2406.01919).
  A systematic survey of ~100 RL-for-RMSA papers, reproducing five highly-cited published RL solutions in matched simulation settings. → Reproduce all figures
- **Comparison of Dynamic Elastic Optical Network Capacity Bound Estimation Methods** — Doherty*, Deng*, Beghelli, Toni, Bayvel — submitted to ECOC 2026.
  Systematic comparison of cut-set, resource-prioritized defragmentation, and KSP heuristic bounds across the 118 real-world topologies of TopologyBench. → Reproduce all figures
- **XLRON: A Framework for Hardware-Accelerated and Differentiable Simulation of Optical Networks** — Doherty, Beghelli, Jarmolovičius, Deng, Killey, Bayvel, Toni — in preparation, targeted at JOCN.
  Architecture, cross-library benchmarks, ISRS GN + DRA validation against the Gerard et al. 2025 C+L-band experiment, differentiable-simulation case studies, and the GUI/CLI. → Reproduce all figures
- **Graph Transformers and Stabilized Reinforcement Learning for Large-Scale Dynamic Routing, Modulation and Spectrum Allocation in Elastic Optical Networks** — Doherty, Beghelli, Toni — in preparation, targeted at JOCN.
  First transformer trained from scratch with RL for dynamic RMSA. Beats the strongest heuristic on every standard benchmark and scales to USA100 (100 nodes) and TataInd (143 nodes). → Reproduce all figures
Speed
- 6 × 10⁶ steps/s for RMSA on a single A100 with 2,048 parallel envs.
- 300× higher single-device throughput than Flex Net Sim (the fastest single-core CPU library) once GPU parallelism is engaged.
- 222–1,494× wall-clock speedup over DeepRMSA / Optical-RL-Gym for end-to-end RL training on the canonical DeepRMSA benchmark, while also achieving lower blocking via invalid action masking.
- Near-linear scaling with the number of parallel environments on GPU.
- See the Speed & comparisons page for the cross-library numbers.
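The near-linear scaling with environment count comes from JAX's `vmap`/`jit` combination, which fuses all parallel environment steps into a single accelerator kernel launch. A minimal sketch of the pattern (a toy counter "environment", not the XLRON API):

```python
import jax
import jax.numpy as jnp

# Toy illustration of hardware-parallel env stepping (not XLRON's step
# function): each "environment" is a single counter that accumulates its
# action; real envs would carry spectrum grids, request queues, etc.
def step(state, action):
    return state + action

n_envs = 2048
states = jnp.zeros(n_envs)
actions = jnp.ones(n_envs)

# vmap maps `step` over the leading axis, so all 2,048 envs advance in one
# fused, jit-compiled call on the accelerator.
parallel_step = jax.jit(jax.vmap(step))
states = parallel_step(states, actions)
print(states.shape)  # (2048,)
```

Because the per-env work is batched rather than looped, adding environments mostly increases arithmetic intensity instead of launch overhead, which is where the near-linear throughput scaling comes from.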
Browser GUI
- New tabbed Streamlit GUI exposing every option, every execution mode, every preset.
- Launch with a single command: `xlron`. Point your browser at the printed URL.
- Tabs: Setup, Model & Training, Physical Layer, Logging & Output. Every flag has a descriptive tooltip.
- All execution modes covered: RL training, heuristic evaluation, model evaluation, capacity bound estimation, differentiable optimization.
- Save and reload complete configurations as presets (e.g. the `gerard2025` C+L-band 90-channel system).
- Live output stream with the constructed CLI command, training progress, blocking probability, and other metrics as they appear.
- Live `render` visualisation of per-link spectrum allocation, network topology with the current request highlighted, blocking probability, utilisation, and request details.
- See the GUI page.
Graph Transformer policy (`--USE_TRANSFORMER`)
- Built-in Graph Transformer actor-critic with Wavelet-Induced Rotary Encodings (WiRE) for graph positional encoding.
- To our knowledge, the first transformer trained from scratch with RL on dynamic RMSA, and the first RL method from the standard benchmarks to consistently match or beat the strongest heuristics.
- Three training stabilisations make this work with PPO + invalid action masking:
  - Off-policy invalid action masking (`--OFF_POLICY_IAM`) — importance ratios computed against the unmasked policy so gradients flow through masked actions.
  - Valid mass stabilisation (`--VALID_MASS_LOSS_COEF`, `--VML_SCHEDULE`, `--IAM_GATING`, `--IAM_DAMPING`) — log-barrier loss + per-step damping + hard gating that keep the unmasked policy from collapsing off the valid action set.
  - Pre-LayerNorm — eliminates warm-up, keeps gradients well-behaved.
- Separate value-function optimizer (`--SEPARATE_VF_OPTIMIZER`, `--VF_LR`) for stable critic training.
- Scales to USA100 (100 nodes) and TataInd (143 nodes) — the largest dynamic RMSA instances ever attempted with RL.
- See the Graph Transformer page.
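The core idea behind off-policy invalid action masking can be sketched in a few lines: actions are sampled from the masked policy, but log-probabilities for the PPO importance ratio are taken from the unmasked policy, so gradients also shape the mass the network assigns to invalid actions. This is an illustrative sketch of that idea, not XLRON's implementation:

```python
import jax
import jax.numpy as jnp

# Hedged sketch of the off-policy invalid-action-masking idea: the masked
# policy (invalid logits set to -inf) is used for *sampling*, while the
# unmasked policy supplies the log-probs for the PPO importance ratio.
def masked_and_unmasked_logprobs(logits, mask, action):
    masked_logits = jnp.where(mask, logits, -jnp.inf)
    masked_logp = jax.nn.log_softmax(masked_logits)[action]
    unmasked_logp = jax.nn.log_softmax(logits)[action]
    return masked_logp, unmasked_logp

logits = jnp.array([2.0, 0.5, -1.0])
mask = jnp.array([True, True, False])  # action 2 is invalid this step
action = 0
m, u = masked_and_unmasked_logprobs(logits, mask, action)

# PPO-style ratio formed against the *unmasked* old log-prob (here the same
# network, so the ratio is exactly 1 at the first update):
old_unmasked_logp = u
ratio = jnp.exp(u - old_unmasked_logp)
print(float(ratio))  # 1.0
```

Note that the masked log-prob is always at least the unmasked one (renormalising over fewer actions inflates each valid action's probability), which is why naive masking can starve the unmasked policy of gradient signal on invalid actions.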
Physical layer: ISRS GN model + Distributed Raman Amplification
- Closed-form ISRS GN model with Nyquist subchannels and EGN correction.
- Distributed Raman Amplification (DRA) support via `isrs_gn_model_dra.py` — Raman-pump-aware NLI and ASE calculations using a Friis-cascaded hybrid Raman+EDFA noise figure.
- ODE-based pump fitting at env creation time (`fit_dra_params_triangular()` solves the Raman ODE via `jax.experimental.ode.odeint`, fits semi-analytical profiles via `jaxopt.LevenbergMarquardt`).
- Validated to within 0.5 dB against the Gerard et al. 2025 record-throughput C+L-band experiment.
- See the Physical layer page.
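The `odeint`-based pump fitting follows the standard JAX ODE pattern. As a stand-in for the coupled Raman pump/signal equations (which live in XLRON's DRA module), here is the same pattern on a trivial attenuation ODE with an assumed fibre loss coefficient:

```python
import jax.numpy as jnp
from jax.experimental.ode import odeint

# Toy stand-in for the Raman pump ODE: dP/dz = -alpha * P, i.e. plain
# fibre attenuation of a power P along distance z. The real DRA equations
# couple pump and signal powers, but the odeint call shape is the same.
def dP_dz(P, z, alpha):
    return -alpha * P

alpha = 0.2                        # loss coefficient (1/km), assumed value
z = jnp.linspace(0.0, 80.0, 81)    # 80 km span, 1 km resolution
P0 = jnp.array(0.5)                # launch power (W), assumed value
P = odeint(dP_dz, P0, z, alpha)    # P[i] is the power at z[i]

# Analytical check: P(z) = P0 * exp(-alpha * z)
print(float(P[-1]), float(P0 * jnp.exp(-alpha * 80.0)))
```

Because `odeint` is differentiable, the same solve can sit inside a `jaxopt.LevenbergMarquardt` fit (as the release notes describe) with gradients flowing through the integration.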
End-to-end differentiable simulation
- The first fully differentiable optical-network simulator. Gradients flow end-to-end through the physical layer and the resource-allocation logic.
- Enables gradient-based pump power optimisation and direct gradient-based RSA optimisation.
- `diff_utils.py` provides differentiable approximations using straight-through gradient estimators and temperature-controlled soft functions: `differentiable_where`, `differentiable_compare`, `differentiable_argmax`, `differentiable_round`/`ceil`/`floor`, `differentiable_index`, `differentiable_one_hot_index_update`, `differentiable_cond`.
- Toggle with `--differentiable` (default `False`); tune sharpness with `--temperature`.
- See the Differentiable simulation page and the Differentiable DRA notes.
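A straight-through estimator, the mechanism behind helpers like `differentiable_round`, can be written in one line of JAX. This sketch is illustrative of the technique, not XLRON's exact code:

```python
import jax
import jax.numpy as jnp

# Straight-through estimator (STE) sketch: the forward pass returns the
# hard rounded value, but because the rounding correction is wrapped in
# stop_gradient, the backward pass treats the whole op as the identity.
def ste_round(x):
    return x + jax.lax.stop_gradient(jnp.round(x) - x)

x = 2.7
value = ste_round(x)          # hard-rounded forward value (~3.0)
grad = jax.grad(ste_round)(x) # gradient passes straight through: 1.0
print(value, grad)
```

The same trick generalises to `argmax`, indexing, and conditionals: compute the hard operation forward, and substitute a smooth (often temperature-controlled softmax) surrogate for the backward pass.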
All 119 real-world topologies from TopologyBench
- Every topology from TopologyBench (Matzner et al. 2024) bundled out of the box, in both directed and undirected variants.
- Switch topologies with a single flag — from small academic networks up to USA100 (100 nodes, 342 directed links) and TataInd (143 nodes, 362 directed links).
- The historically standard research topologies are also bundled: NSFNET, COST239, USNET, JPN48, German17, CONUS, 5-node.
- See the Topologies page.
Documentation
- Reorganised, paper-by-paper documentation site at https://micdoh.github.io/XLRON/.
- New feature pages for Speed, GUI, Transformer, Physical Layer, Differentiable, Topologies.
- Reproduction guides for every paper.
- Full flags reference, training guide, heuristic evaluation guide, capacity bounds guide.
Other improvements
- Stabilised PPO with reward centering (`--REWARD_CENTERING`), prioritized experience replay (`--PRIO_ALPHA`, `--PRIO_BETA0`), VTrace-style importance ratio clipping (`--RHO_CLIP`, `--C_CLIP`), and optional separate value-function optimizer.
- New cut-set and reconfigurable-routing capacity bound estimators (`xlron.bounds.cutsets_bounds`, `xlron.bounds.reconfigurable_routing_bounds`).
- 13 path heuristics: `ksp_ff`, `ksp_lf`, `ksp_bf`, `ksp_mu`, `ff_ksp`, `lf_ksp`, `bf_ksp`, `mu_ksp`, `kmc_ff`, `kmf_ff`, `kme_ff`, `kca_ff`.
- Path-sort criteria: `spectral_resources`, `hops`, `distance`, `hops_distance`, `capacity`.
- Pre-commit hooks and CI gating; ruff format + ty type checking adopted across the codebase.
- Reorganised `experimental/` directory to colocate code, results, and figures per paper.
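As a reference for what the `ksp_ff` family does, here is a pure-Python sketch of the k-shortest-path first-fit idea (candidate paths tried in order, first contiguous free block shared by every link wins). This is a toy illustration, not the vectorised JAX implementation in XLRON:

```python
# Illustrative ksp_ff sketch: for each candidate path (assumed pre-sorted,
# e.g. by distance or hops), scan the spectrum left to right and take the
# first block of n_slots_needed slots that is free on *every* link of the
# path (the continuity and contiguity constraints of RMSA).

def first_fit(path_links, link_spectrum, n_slots_needed):
    """Return the start slot of the first free block on a path, or None."""
    n_slots = len(next(iter(link_spectrum.values())))
    for start in range(n_slots - n_slots_needed + 1):
        block = range(start, start + n_slots_needed)
        if all(link_spectrum[l][s] for l in path_links for s in block):
            return start
    return None

def ksp_ff(candidate_paths, link_spectrum, n_slots_needed):
    for path in candidate_paths:
        start = first_fit(path, link_spectrum, n_slots_needed)
        if start is not None:
            return path, start
    return None  # request blocked

# Two candidate paths over named links; True = slot free.
spectrum = {
    "A-B": [False, False, True, True],
    "B-C": [True, False, True, True],
    "A-D": [True, True, True, True],
    "D-C": [True, True, False, False],
}
paths = [["A-B", "B-C"], ["A-D", "D-C"]]
print(ksp_ff(paths, spectrum, 2))  # (['A-B', 'B-C'], 2)
```

Variants like `ksp_lf` or `ff_ksp` reorder the two loops or change the block-selection rule (last-fit, best-fit, most-used) while keeping the same path/slot search structure.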
Installation
```
git clone https://github.com/micdoh/XLRON.git
cd XLRON
uv sync               # base
uv sync --group dev   # dev dependencies
uv sync --group gpu   # if running on GPU
uv sync --group tpu   # if running on TPU
```
Citing
If you use XLRON in published work, please cite the relevant paper(s) listed at https://micdoh.github.io/XLRON/papers/. If you use the bundled TopologyBench topologies, please also cite TopologyBench.
Acknowledgements
This work was supported by the EPSRC grant EP/S022139/1 (Centre for Doctoral Training in Connected Electronic and Photonic Systems) and EPSRC Programme Gr...
Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope?
All code used to produce results in the paper:
"Reinforcement Learning for Dynamic Resource Allocation in Optical Networks: Hype or Hope?"
Prior to execution, it is recommended to set up a virtual environment using the pyproject.toml or requirements.txt in the repository. Instructions can be found in the online documentation: https://micdoh.github.io/XLRON/
To generate data, execute the Bash scripts in this folder, after replacing the `PYTHON_PATH` and `SCRIPT_PATH` variables with absolute paths to the virtual environment Python interpreter and the `generate_data` script, respectively:
`./data_processing/JOCN2024/generate_data`
To generate plots, execute the Python scripts and Jupyter notebooks in this folder, replacing the file path variables to point to the newly generated data CSVs:
`./data_processing/JOCN2024/generate_plots`