NVIDIA · ivanbasov · Apr 3, 2026 · Mar 30, 2026 · Mar 30, 2026 · Apr 1, 2026
diff --git a/README.md b/README.md
@@ -173,6 +173,62 @@ Notes:
 - TensorRT workflows (`ONNX_WORKFLOW=2` or `3`) require `tensorrt` and `modelopt`.
 - FP8 quantization failure is fatal. INT8 failure falls back to the FP32 ONNX model silently.
 - ONNX and engine files are written to the current working directory.
+- `ONNX_WORKFLOW` is also honored by the `decoder_ablation` workflow (see the decoder ablation section below).
+
+### Decoder ablation study with cudaq-qec (optional)
+
+The `decoder_ablation` workflow compares multiple global decoders on the residual syndromes left
+by the neural pre-decoder. The pre-decoder can run on either a PyTorch or a TensorRT backend, and
+the global decoders can include the GPU-accelerated decoders from the `cudaq-qec` package
+(imported as `cudaq_qec`).
+
+**PyTorch pre-decoder + cudaq-qec global decoders:**
+
+```bash
+# Requires: cudaq-qec (cudaq_qec), ldpc, beliefmatching, scipy
+WORKFLOW=decoder_ablation bash code/scripts/local_run.sh
+```
+
+**TRT pre-decoder + cudaq-qec global decoders (full GPU pipeline):**
+
+The same `ONNX_WORKFLOW` variable used for `inference` also applies here. When a TRT engine is
+active, the neural pre-decoder runs via TensorRT (fast, quantized inference) and the `cudaq-qec`
+decoders handle the residual syndromes on the GPU, so both stages of the pipeline run on the GPU.
+
+```bash
+# Export ONNX, build TRT engine, run ablation (TRT pre-decoder + cudaq-qec)
+ONNX_WORKFLOW=2 WORKFLOW=decoder_ablation bash code/scripts/local_run.sh
+
+# INT8 quantized TRT pre-decoder + cudaq-qec
+ONNX_WORKFLOW=2 QUANT_FORMAT=int8 WORKFLOW=decoder_ablation bash code/scripts/local_run.sh
+
+# Load a previously built engine, then run ablation
+ONNX_WORKFLOW=3 WORKFLOW=decoder_ablation bash code/scripts/local_run.sh
+```
+
+The ablation study reports per-decoder logical error rates, convergence statistics for
+`cudaq-qec` BP variants, residual syndrome weight distributions, and timing breakdowns.
+Results are written to `outputs/<EXPERIMENT_NAME>/plots/`.
+
+**Decoder variants benchmarked:**
+
+| Decoder | Source | Notes |
+|---|---|---|
+| No-op | — | Pre-decoder output only, no global correction |
+| Union-Find | `ldpc` | Fast, sub-optimal |
+| BP-only | `ldpc` | Belief propagation, no OSD |
+| BP+LSD-0 | `ldpc` | BP with localized statistics decoding |
+| Uncorr-PM | PyMatching | Uncorrelated minimum-weight perfect matching |
+| Corr-PM | PyMatching | Correlated MWPM (best classical baseline) |
+| cudaq-BP | `cudaq-qec` | Sum-product BP on GPU |
+| cudaq-MinSum | `cudaq-qec` | Min-sum BP on GPU |
+| cudaq-BP+OSD-0/7 | `cudaq-qec` | BP + ordered statistics decoding |
+| cudaq-MemBP | `cudaq-qec` | Memory-based min-sum BP |
+| cudaq-MemBP+OSD | `cudaq-qec` | Memory BP + OSD |
+| cudaq-RelayBP | `cudaq-qec` | Sequential relay composition |
+
+`cudaq-qec` decoders are loaded automatically when `cudaq_qec` is importable; the study
+degrades gracefully to the non-cudaq decoders if the package is absent.
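+
+As a quick pre-flight check, the shell sketch below reports whether `cudaq_qec` can be found
+before a run. The `have_py_module` helper is hypothetical (it is not part of this repository),
+and the sketch assumes that the `python3` on your `PATH` is the interpreter `local_run.sh` uses.
+
+```bash
+# Hypothetical helper, not part of this repository: exits 0 if the named
+# top-level Python module can be located by the python3 on PATH, 1 otherwise.
+have_py_module() {
+  python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
+}
+
+if have_py_module cudaq_qec; then
+  echo "cudaq_qec found: the cudaq-* decoders should be included in the ablation"
+else
+  echo "cudaq_qec not found: expect only the non-cudaq decoders to run"
+fi
+```
+
+The check only locates the module; it does not confirm that a usable GPU is available to it.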
 
 ### GPU selection