Code for the paper: "Optimizing Quantum Circuits via ZX Diagrams using Reinforcement Learning and Graph Neural Networks"
This repository implements the Appendix C setting (compact 8-feature representation), including:
- Tree-search training with PPO (main_tree.py)
- Flat PPO baseline (ppo.py)
- BQSKit integration (bqskit_pass.py)
- Compiler benchmarking (bench_compilers.py)
High-level workflow (see WORKFLOW.md for the full walkthrough):
- Circuit -> ZX graph (pyzx_environment/zx_env/env.py)
- Observation wrapping -> GraphMask (utils.py):
  - Expanded graph representation
  - Action mask of applicable ZX rewrite rules per node
- Tree search over rewrite trajectories (TreePolicy.py)
- Agent scoring via BundleNet (models/bundle_net.py)
- PPO optimization (main_tree.py or ppo.py)
- Best ZX state -> extracted optimized circuit
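To illustrate the per-node action mask mentioned in the workflow, here is a minimal sketch of masked action selection in plain Python. The rule names, the scores, and the best_valid_action helper are hypothetical and are not taken from utils.py or TreePolicy.py; the repository's GraphMask may be structured differently.

```python
from typing import List, Optional, Tuple

# Hypothetical rule names, for illustration only.
RULES = ["fuse", "pivot", "local_comp"]


def best_valid_action(
    scores: List[List[float]], mask: List[List[bool]]
) -> Optional[Tuple[int, int]]:
    """Return the (node, rule) pair with the highest score among unmasked entries.

    scores[n][r] is a score for applying rule r at node n;
    mask[n][r] is True only when that rewrite is applicable.
    """
    best: Optional[Tuple[int, int]] = None
    best_score = float("-inf")
    for n, row in enumerate(mask):
        for r, applicable in enumerate(row):
            if applicable and scores[n][r] > best_score:
                best, best_score = (n, r), scores[n][r]
    return best  # None means no rewrite is applicable


# Two nodes, three rules: the top raw score (node 0, rule 0) is masked out.
scores = [[0.9, 0.1, 0.3], [0.2, 0.8, 0.4]]
mask = [[False, True, False], [True, False, True]]
choice = best_valid_action(scores, mask)
print(choice, RULES[choice[1]])
```

Masking before selection means the agent can never pick a rewrite that does not apply at a node, regardless of how highly the network scores it.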
The main optimization target is reducing two-qubit gate count (CNOT/CZ).
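As a rough sketch of this metric, the snippet below counts two-qubit gates in a circuit given as a list of (gate name, qubits) tuples. That list format and the two_qubit_count helper are assumptions made for illustration; the repository itself works on PyZX circuits and graphs. CX is included as a common alias for CNOT.

```python
from typing import Iterable, Tuple

Gate = Tuple[str, Tuple[int, ...]]

# Gate names treated as two-qubit gates in this sketch (CX is an alias of CNOT).
TWO_QUBIT_GATES = {"CNOT", "CX", "CZ"}


def two_qubit_count(gates: Iterable[Gate]) -> int:
    """Count gates whose name marks them as CNOT or CZ."""
    return sum(1 for name, _ in gates if name.upper() in TWO_QUBIT_GATES)


# Two identical adjacent CNOTs cancel, so an optimizer can drop both.
before = [("H", (0,)), ("CNOT", (0, 1)), ("CNOT", (0, 1)), ("CZ", (1, 2)), ("T", (2,))]
after = [("H", (0,)), ("CZ", (1, 2)), ("T", (2,))]

reduction = two_qubit_count(before) - two_qubit_count(after)
print(f"two-qubit gates: {two_qubit_count(before)} -> {two_qubit_count(after)}")
```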
Use Python 3.10+ (recommended) and create a fresh environment:
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
Install core dependencies (adjust CUDA-specific Torch wheels as needed):
pip install torch torchvision torchaudio
pip install torch-geometric
pip install hydra-core omegaconf gymnasium numpy tqdm tensorboard ray pyzx
Install the local ZX environment package:
pip install -e ./pyzx_environment
Optional dependencies for benchmarking / compiler integration:
pip install bqskit qiskit qiskit-ibm-transpiler pandas seaborn matplotlib
All training entry points are Hydra-based and accept config group overrides from conf/.
python -u main_tree.py \
+algorithm=PPO \
+model=GATActionModel \
+env=simple \
exp_name="smoke_test" \
env.num_envs=2 \
algorithm.total_timesteps=50000 \
algorithm.num_steps=32 \
max_tree_size=64 \
multi_range=2 \
device="cpu"

python -u main_tree.py \
+algorithm=PPO \
exp_name="20MIO_32_envs_5qubit_128treesize" \
+model=GATActionModel \
+env=more_complex_more_rules_ranges \
env.num_envs=32 \
model.model_type="ActionAtt" \
model.n_message_passing=4 \
algorithm.total_timesteps=20_000_000 \
algorithm.num_steps=129 \
algorithm.learning_rate=3e-3 \
max_tree_size=128 \
multi_range=4 \
env.n_qubits=5 \
device="cpu"

python -u ppo.py \
+algorithm=PPO \
+model=GATActionModel \
+env=more_complex_more_rules_ranges \
exp_name="flat_ppo_run"

- Large-scale runs are intended to use Ray for distributed rollouts.
- For multi-node deployment, configure your Ray cluster before launching training.
- Reference: https://docs.ray.io/en/latest/ray-overview/getting-started.html
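The training commands above all follow the same pattern: config groups are appended with a leading + (for example +algorithm=PPO) and individual values are overridden as key=value. A small launcher along the following lines could assemble such a command from Python, for instance when scripting several runs. The build_command helper is hypothetical and not part of this repository.

```python
import sys
from typing import Dict, List


def build_command(
    script: str, groups: Dict[str, str], overrides: Dict[str, object]
) -> List[str]:
    """Assemble an argv list in the style of the training commands above."""
    cmd = [sys.executable, "-u", script]
    # Config groups are appended with a leading "+", e.g. +env=simple.
    cmd += [f"+{group}={choice}" for group, choice in groups.items()]
    # Plain overrides are passed as key=value.
    cmd += [f"{key}={value}" for key, value in overrides.items()]
    return cmd


cmd = build_command(
    "main_tree.py",
    groups={"algorithm": "PPO", "model": "GATActionModel", "env": "simple"},
    overrides={"exp_name": "smoke_test", "env.num_envs": 2, "device": "cpu"},
)
print(" ".join(cmd))

# To actually launch the run from the repository root:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting, so values such as exp_name need no surrounding quotes.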
Training outputs are written under runs/.
- TensorBoard logs: runs/<run_name>/
- Saved models: runs/<run_name>/saves/model-<step>.pth
- Saved optimal paths: runs/<run_name>/saves/data-<step>.pkl
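Given the layout above, picking up the most recent checkpoint of a run can be sketched as follows. This assumes that <step> is a non-negative integer; the latest_checkpoint helper and the run name in the usage line are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Matches file names such as model-50000.pth and captures the step number.
_MODEL_RE = re.compile(r"^model-(\d+)\.pth$")


def latest_checkpoint(run_dir: str) -> Optional[Path]:
    """Return the model-<step>.pth file with the largest step, or None."""
    saves = Path(run_dir) / "saves"
    if not saves.is_dir():
        return None
    best: Optional[Path] = None
    best_step = -1
    for path in saves.glob("model-*.pth"):
        match = _MODEL_RE.match(path.name)
        # Compare steps numerically so that 2000 ranks above 900.
        if match and int(match.group(1)) > best_step:
            best, best_step = path, int(match.group(1))
    return best


# "runs/my_run" is a placeholder; this prints None if the directory is missing.
print(latest_checkpoint("runs/my_run"))
```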
Launch TensorBoard with:
tensorboard --logdir runs
bench_compilers.py CLI:
python bench_compilers.py <output_pickle> <searchdepth> <mq_ratio> <h_ratio> <t_ratio>
Example:
python bench_compilers.py results.pkl 4 1.0 0.0 0.0
Notes:
- Some benchmark options depend on external optimizer code that is not distributed in this repository.
- If you do not use those external methods, disable/comment the corresponding benchmark paths.
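For reference, the five positional arguments of the benchmarking CLI could be parsed roughly as shown below. The argument types are inferred from the example invocation (an integer search depth and floating-point ratios); bench_compilers.py itself may read its arguments differently, so treat this as an illustrative sketch only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the positional CLI shown above (types are assumed)."""
    parser = argparse.ArgumentParser(prog="bench_compilers.py")
    parser.add_argument("output_pickle", help="path of the results pickle to write")
    parser.add_argument("searchdepth", type=int)
    parser.add_argument("mq_ratio", type=float)
    parser.add_argument("h_ratio", type=float)
    parser.add_argument("t_ratio", type=float)
    return parser


# Same values as the example invocation above.
args = build_parser().parse_args(["results.pkl", "4", "1.0", "0.0", "0.0"])
print(args.output_pickle, args.searchdepth, args.mq_ratio, args.h_ratio, args.t_ratio)
```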
bqskit_pass.py loads a model path from ZX_MODEL_PATH (or falls back to a default path).
export ZX_MODEL_PATH="runs/<run_name>/saves/model-<step>.pth"
Then run your BQSKit-based workflow (for examples, see bench_compilers.py and bqskit_pass.py).
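The lookup described above (use ZX_MODEL_PATH when set, otherwise a default) can be sketched like this. The default path shown is a made-up placeholder, not the one used in bqskit_pass.py, and resolve_model_path is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

# Placeholder only; the actual default in bqskit_pass.py is not reproduced here.
DEFAULT_MODEL_PATH = "runs/default/saves/model.pth"


def resolve_model_path(
    env: Optional[Mapping[str, str]] = None,
    default: str = DEFAULT_MODEL_PATH,
) -> str:
    """Return ZX_MODEL_PATH if it is set and non-empty, else the default."""
    env = os.environ if env is None else env
    path = env.get("ZX_MODEL_PATH", "").strip()
    return path or default


print(resolve_model_path())
```

Treating an empty variable the same as an unset one is a design choice in this sketch; it guards against an accidental export ZX_MODEL_PATH= with no value.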
- main_tree.py: Tree-search PPO training loop
- ppo.py: Flat PPO baseline
- TreePolicy.py: Tree data structure and batched policy/value forward pass
- models/: Model definitions (BundleNet, TreeNet, ActionModel, etc.)
- utils.py: Observation wrappers and GraphMask handling
- pyzx_environment/: ZX RL environment package
- WORKFLOW.md: Full end-to-end algorithmic walkthrough