NVIDIA · kvmto · May 28, 2026 · May 29, 2026
diff --git a/README.md b/README.md
@@ -29,6 +29,7 @@ The public release exposes a **single user-facing config** and a **single runner
   - [Converting .pt checkpoints to SafeTensors](#converting-pt-checkpoints-to-safetensors-optional-post-training)
   - [ONNX export and quantization](#onnx-export-and-quantization-optional-post-training)
   - [Generating data for CUDA-Q QEC](#generating-data-for-cuda-q-qec-realtime-predecoder-test-application)
+  - [Offline decoding from Stim detector samples](#offline-decoding-from-stim-detector-samples)
   - [Decoder ablation study with cudaq-qec](#decoder-ablation-study-with-cudaq-qec-optional)
 - [Configuration and advanced usage](#configuration-and-advanced-usage)
   - [GPU selection](#gpu-selection)
@@ -359,6 +360,205 @@ Done.
   pymatching_predictions.bin           40,008 bytes
 ```
 
+### Offline decoding from Stim detector samples
+
+This is the file-based path for decoding detector samples produced outside the
+in-memory simulator. It exists for two distinct audiences:
+
+1. **You already have detector samples** (from a QPU, a third-party simulator,
+   or a previously cached run) and want to feed them to the same decoders we
+   ship. Jump to [Bring your own detector samples](#bring-your-own-detector-samples).
+2. **You want a reproducible end-to-end smoke test.** Use the local
+   generator below, then run the same decode commands.
+
+#### File contract
+
+In the default layout, each basis consists of exactly two files:
+
+```text
+<root>/
+  samples_X.dets       # Stim sparse detector-sample format
+  metadata_X.json      # circuit + noise fingerprint
+  samples_Z.dets
+  metadata_Z.json
+```
+
+`samples_*.dets` uses Stim's sparse format with logical observables appended,
+so a line `shot D3 D8 L0` says detectors 3 and 8 fired and logical observable
+0 flipped on that shot. Stim does not encode the memory basis in the sample
+format, so X and Z always live in separate files; the LER loop iterates over
+both when `cfg.test.meas_basis_test=both`. The resolver
+(`resolve_stim_sample_paths`) also accepts the alternate layouts
+`<root>/<basis>/samples.dets` + `metadata.json` and the flat
+`<root>/samples.dets` + `metadata.json`.
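Here is a minimal pure-Python sketch that makes the line format concrete. It decodes one sparse `dets` line into a dense boolean row, with the observables appended after the detectors. `parse_dets_line` is a hypothetical helper and is not part of `qec.surface_code.stim_sample_io`. For whole files, Stim's own `stim.read_shot_data_file` with `format="dets"` is the usual tool.

```python
import numpy as np


def parse_dets_line(line: str, num_detectors: int, num_observables: int) -> np.ndarray:
    """Decode one Stim sparse 'dets' line (e.g. 'shot D3 D8 L0') into a bool row.

    Hypothetical helper for illustration. Row layout:
    [detector_0 .. detector_{N-1}, observable_0 .. observable_{K-1}],
    i.e. logical observables appended after the detectors.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "shot":
        raise ValueError(f"not a dets line: {line!r}")
    row = np.zeros(num_detectors + num_observables, dtype=np.bool_)
    for token in tokens[1:]:
        prefix, index = token[0], int(token[1:])
        if prefix == "D" and index < num_detectors:
            row[index] = True
        elif prefix == "L" and index < num_observables:
            row[num_detectors + index] = True
        else:
            # Measurement records ("M...") or out-of-range indices are not expected here.
            raise ValueError(f"unexpected token {token!r} in line {line!r}")
    return row


row = parse_dets_line("shot D3 D8 L0", num_detectors=168, num_observables=1)
print(np.flatnonzero(row))  # [  3   8 168]
```

A bare `shot` line (no detectors fired, no observable flipped) decodes to an all-zero row.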
+
+The metadata JSON has the shape that
+`qec.surface_code.stim_sample_io.build_stim_sample_metadata` writes:
+
+```json
+{
+  "schema_version": 2,
+  "artifact": "stim_detector_samples",
+  "format": "dets",
+  "append_observables": true,
+  "distance": 7,
+  "n_rounds": 7,
+  "basis": "X",
+  "code_rotation": "XV",
+  "num_detectors": 168,
+  "num_observables": 1,
+  "num_shots": 262144,
+  "p_error": 0.003,
+  "noise_model": "25-param",
+  "noise_model_sha256": "abcd…",
+  "noise_model_params": { "p_prep_X": 0.002, "...": 0.0 }
+}
+```
+
+`p_error`, `noise_model`, `noise_model_sha256`, and `noise_model_params` are
+optional but recommended; when present, the decoder cross-checks its active
+noise model against the recorded fingerprint and raises by default if the two
+disagree. Files written before this schema (no noise fields) keep loading
+unchanged. `code_rotation` accepts both the canonical names (`XV`, `XH`, `ZV`,
+`ZH`) and the public aliases (`O1`..`O4`).
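The backward-compatibility rule can be sketched with the standard library alone: noise fields are optional, and older files still load. Which keys count as required is an assumption. This sketch treats every field in the example above as mandatory except the four noise fields. `load_stim_metadata` is a hypothetical name, not the repository's reader.

```python
import json
from pathlib import Path

# Assumption: every key from the example schema except the noise fingerprint is required.
REQUIRED_KEYS = (
    "schema_version", "artifact", "format", "append_observables", "distance",
    "n_rounds", "basis", "code_rotation", "num_detectors", "num_observables", "num_shots",
)
OPTIONAL_NOISE_KEYS = ("p_error", "noise_model", "noise_model_sha256", "noise_model_params")


def load_stim_metadata(path) -> dict:
    """Load a metadata JSON file; absent noise fields become None instead of raising."""
    meta = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in meta]
    if missing:
        raise KeyError(f"{path}: missing required metadata keys {missing}")
    for key in OPTIONAL_NOISE_KEYS:
        meta.setdefault(key, None)  # files written before the noise fields existed still load
    return meta
```

Downstream code can then skip a noise cross-check whenever the recorded value is `None`.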
+
+#### Bring your own detector samples
+
+If you have `.dets` data from elsewhere (a QPU, an external simulator), the
+contract comes down to three steps:
+
+1. Write `samples_{basis}.dets` in Stim's sparse format with observables
+   appended.
+2. Write `metadata_{basis}.json` matching the schema above. The easiest way is
+   to call `build_stim_sample_metadata(...)` and `write_metadata_json(...)`
+   from `qec.surface_code.stim_sample_io`; you can also hand-author it.
+3. Make sure `conf/config_public.yaml` reflects the experiment your samples
+   came from: `distance`, `n_rounds`, `data.code_rotation`, and
+   `data.noise_model` must match exactly, and `test.p_error` should agree with
+   any `p_error` recorded in the metadata. The decoder rebuilds a Stim memory
+   circuit from these and validates the file against it before decoding.
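The following standard-library sketch covers step 1 and a hand-authored step 2. It assumes your samples are already a dense boolean array with the observables in the last columns. `write_dets_file` and `write_metadata` are hypothetical helpers, and the metadata they write omits the optional noise fields. Prefer `build_stim_sample_metadata` / `write_metadata_json` when you can import the repository.

```python
import json
from pathlib import Path

import numpy as np


def write_dets_file(path, dets_and_obs: np.ndarray, num_observables: int) -> None:
    """Write a dense bool array (shots x (detectors + observables)) as Stim sparse 'dets'."""
    num_detectors = dets_and_obs.shape[1] - num_observables
    with open(path, "w") as f:
        for shot in dets_and_obs:
            tokens = ["shot"]
            tokens += [f"D{i}" for i in np.flatnonzero(shot[:num_detectors])]
            tokens += [f"L{i}" for i in np.flatnonzero(shot[num_detectors:])]
            f.write(" ".join(tokens) + "\n")


def write_metadata(path, *, distance, n_rounds, basis, code_rotation,
                   num_detectors, num_observables, num_shots) -> None:
    """Hand-author metadata following the schema shown above (noise fields omitted)."""
    meta = {
        "schema_version": 2,
        "artifact": "stim_detector_samples",
        "format": "dets",
        "append_observables": True,
        "distance": distance,
        "n_rounds": n_rounds,
        "basis": basis,
        "code_rotation": code_rotation,
        "num_detectors": num_detectors,
        "num_observables": num_observables,
        "num_shots": num_shots,
    }
    Path(path).write_text(json.dumps(meta, indent=2))
```

Because no noise fingerprint is recorded, only the structural checks apply to files written this way.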
+
+Then point the launcher at the directory:
+
+```bash
+PREDECODER_STIM_SAMPLES_DIR=/path/to/your/dets \
+PREDECODER_DECODE_MODE=pymatching_only \
+WORKFLOW=inference bash code/scripts/local_run.sh
+```
+
+Validation is strict by default: mismatches in distance, rounds, basis,
+orientation, detector count, observable presence, `p_error`, or
+`noise_model_sha256` raise with one explicit error per mismatch before any
+decoding happens. To downgrade only the **noise** mismatches to warnings (for
+example when sweeping `p_error` for a calibration study), set
+`PREDECODER_STIM_STRICT_NOISE=0`. Structural mismatches are always fatal.
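The strict/lenient split described above can be sketched as follows. This illustrates the behavior and is not the repository's `read_stim_detector_samples`. The field grouping and the `validate_metadata` name are assumptions. The sketch collects every mismatch so that all of them are reported together. Structural mismatches always raise. Noise mismatches become `UserWarning`s when `strict_noise=False`, which is the behavior `PREDECODER_STIM_STRICT_NOISE=0` selects.

```python
import math
import warnings

# Assumed grouping, mirroring the mismatch list above. code_rotation aliases
# (O1..O4) are not normalized in this sketch.
STRUCTURAL_KEYS = ("distance", "n_rounds", "basis", "code_rotation",
                   "num_detectors", "num_observables", "append_observables")
NOISE_KEYS = ("p_error", "noise_model_sha256")


def validate_metadata(metadata: dict, expected: dict, strict_noise: bool = True) -> None:
    """Hypothetical validator: collect every mismatch, then raise or warn."""
    structural, noise = [], []
    for key in STRUCTURAL_KEYS:
        if metadata.get(key) != expected.get(key):
            structural.append(f"{key}: file={metadata.get(key)!r}, config={expected.get(key)!r}")
    for key in NOISE_KEYS:
        recorded, active = metadata.get(key), expected.get(key)
        if recorded is None or active is None:
            continue  # field missing on one side (e.g. an older file): nothing to compare
        same = math.isclose(recorded, active) if key == "p_error" else recorded == active
        if not same:
            noise.append(f"{key}: file={recorded!r}, config={active!r}")
    # Structural mismatches are always fatal; noise mismatches only when strict.
    if structural or (noise and strict_noise):
        raise ValueError("Stim sample metadata mismatch:\n  " + "\n  ".join(structural + noise))
    for message in noise:
        warnings.warn(f"noise mismatch ignored (strict_noise=False): {message}", UserWarning)
```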
+
+#### Generate local reference files
+
+```bash
+WORKFLOW=generate_stim_data \
+EXPERIMENT_NAME=offline_stim_run \
+bash code/scripts/local_run.sh
+```
+
+The generator reads from `conf/config_public.yaml`:
+
+| config field | role |
+| --- | --- |
+| `distance` | surface-code distance |
+| `n_rounds` | number of measurement rounds |
+| `data.code_rotation` | code orientation (`XV`/`XH`/`ZV`/`ZH` or `O1`..`O4`) |
+| `data.noise_model` | 25-parameter noise model dict (optional) |
+| `test.meas_basis_test` | `X`, `Z`, or `both` (default `both`) |
+| `test.p_error` | scalar noise level (default `0.003`) |
+| `test.num_samples` | shots per basis (default `262144`, ~20 MB per file) |
+
+The default sample count is large so that the smoke run's LER estimate has a
+small statistical error (roughly ±0.0001, one sigma, at an LER near 0.003);
+override `++test.num_samples=N` (or set the field in a local override config)
+to shrink it for faster iteration. Output goes to:
+
+```text
+outputs/offline_stim_run/stim_samples/samples_X.dets
+outputs/offline_stim_run/stim_samples/metadata_X.json
+outputs/offline_stim_run/stim_samples/samples_Z.dets
+outputs/offline_stim_run/stim_samples/metadata_Z.json
+```
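One way to reason about `test.num_samples` is the binomial standard error of an LER estimate, `sqrt(p * (1 - p) / N)` for `N` shots. The snippet below is a back-of-the-envelope check. It treats shots as independent Bernoulli trials, which is an assumption. `ler_standard_error` and `shots_for_relative_error` are hypothetical helpers. Note that halving the error costs four times the shots.

```python
import math


def ler_standard_error(ler: float, num_shots: int) -> float:
    """One-sigma binomial standard error of an LER estimated from num_shots shots."""
    return math.sqrt(ler * (1.0 - ler) / num_shots)


def shots_for_relative_error(ler: float, rel_err: float) -> int:
    """Smallest shot count with standard_error / ler <= rel_err.

    From sqrt((1 - p) / (p * N)) <= r  =>  N >= (1 - p) / (p * r**2).
    """
    return math.ceil((1.0 - ler) / (ler * rel_err ** 2))


# Default num_samples at an LER near the smoke-test value (~0.0027).
se = ler_standard_error(0.0027, 262_144)
print(f"{se:.2e}, relative {se / 0.0027:.1%}")  # 1.01e-04, relative 3.8%
```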
+
+The `generate_stim_data` workflow writes only the Stim sample artifacts. The
+CUDA-Q `.bin` artifacts (`detectors.bin`, `H_csr.bin`, etc.) live in a
+separate output dir and are produced by `python code/export/generate_test_data.py`
+directly; see [the CUDA-Q section](#generating-data-for-cuda-q-qec-realtime-predecoder-test-application).
+
+#### Decode the files
+
+PyMatching only — useful as the apples-to-apples baseline to compare against
+the Ising pre-decoder. In this mode the launcher replaces the neural model
+with `torch.nn.Identity()` and **no checkpoint is required**:
+
+```bash
+PREDECODER_STIM_SAMPLES_DIR=outputs/offline_stim_run/stim_samples \
+PREDECODER_DECODE_MODE=pymatching_only \
+WORKFLOW=inference bash code/scripts/local_run.sh
+```
+
+Ising pre-decoder followed by PyMatching — **requires a model checkpoint.**
+Point `PREDECODER_MODEL_CHECKPOINT_FILE` (or `model_checkpoint_file` in the
+config) at one of the released models, or run training under the same
+`EXPERIMENT_NAME` first:
+
+```bash
+PREDECODER_STIM_SAMPLES_DIR=outputs/offline_stim_run/stim_samples \
+PREDECODER_DECODE_MODE=ising_decoding_pymatching \
+EXTRA_PARAMS="++model_checkpoint_file=models/Ising-Decoder-SurfaceCode-1-Fast.pt" \
+WORKFLOW=inference bash code/scripts/local_run.sh
+```
+
+No changes to `conf/config_public.yaml` are required for either command; the
+existing config controls the model, distance, rounds, orientation, and noise
+model, and the Stim file metadata is checked against the rebuilt circuit
+before decoding.
+
+To persist the per-shot comparison arrays, also set:
+
+```bash
+PREDECODER_DECODE_OUTPUT_DIR=offline_decode_outputs
+```
+
+With that set, `pymatching_only` writes:
+
+* `{basis}_observables.npy`
+* `{basis}_pymatching_predictions.npy`
+
+…and `ising_decoding_pymatching` writes those plus:
+
+* `{basis}_predecoder_residual_detectors.npy`
+* `{basis}_ising_decoding_pymatching_predictions.npy`
+
+The directory is created lazily on the first write, so it is safe to point at
+a path that does not yet exist.
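Once the arrays are on disk, the logical error rate for a basis is the fraction of shots whose predicted observable flips disagree with the recorded ones. The sketch below makes two assumptions. First, each `.npy` holds one row per shot, either 1-D or 2-D with one column per observable. Second, a shot counts as a failure if any observable is mispredicted. `ler_from_outputs` is a hypothetical helper.

```python
from pathlib import Path

import numpy as np


def ler_from_outputs(output_dir, basis: str, decoder: str = "pymatching") -> float:
    """Fraction of shots where the decoder's observable prediction is wrong.

    decoder="pymatching" reads {basis}_pymatching_predictions.npy;
    decoder="ising_decoding_pymatching" reads {basis}_ising_decoding_pymatching_predictions.npy.
    """
    out = Path(output_dir)
    observables = np.load(out / f"{basis}_observables.npy")
    predictions = np.load(out / f"{basis}_{decoder}_predictions.npy")
    # Assumption: one row per shot; reshape so 1-D arrays become (shots, 1).
    observables = observables.reshape(observables.shape[0], -1).astype(bool)
    predictions = predictions.reshape(predictions.shape[0], -1).astype(bool)
    failures = np.any(observables != predictions, axis=1)
    return float(failures.mean())
```

For example, `ler_from_outputs("offline_decode_outputs", "X", decoder="ising_decoding_pymatching")` scores the pre-decoder path against the same recorded observables as the PyMatching-only baseline.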
+
+#### Smoke script
+
+```bash
+code/scripts/offline_smoketest.sh
+```
+
+The script defaults `EXPERIMENT_NAME=offline_stim_run` (matching the example
+paths above), generates Stim files, decodes with `pymatching_only`, and (if
+`models/Ising-Decoder-SurfaceCode-1-Fast.pt` is on disk) decodes again with
+`ising_decoding_pymatching`. It then parses a structured
+`[Inference Summary]` JSON marker that the inference loop emits on the last
+line of its summary block. The marker is **off by default** to keep
+interactive and notebook runs clean; the smoketest opts in by exporting
+`PREDECODER_EMIT_INFERENCE_SUMMARY=1` before each inference call. Set the same
+env var yourself if you want to pipe these results into other tooling.
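If you pipe logs into other tooling, a small parser along these lines can pick up the markers. The line format is an assumption. This sketch expects the literal `[Inference Summary]` followed by a single JSON object on the same line. The key names in the example log are made up for illustration and are not taken from the inference loop. The parser returns every marker it finds, because a run like the smoketest calls inference more than once.

```python
import json

MARKER = "[Inference Summary]"


def parse_inference_summaries(log_text: str) -> list:
    """Return the JSON payload of every '[Inference Summary]' line, in order."""
    summaries = []
    for line in log_text.splitlines():
        idx = line.find(MARKER)
        if idx != -1:
            # Assumption: everything after the marker on that line is one JSON object.
            summaries.append(json.loads(line[idx + len(MARKER):].strip()))
    return summaries


log = "\n".join([
    "some progress output",
    '[Inference Summary] {"basis": "X", "ler": 0.0027}',  # hypothetical keys
])
print(parse_inference_summaries(log))  # [{'basis': 'X', 'ler': 0.0027}]
```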
+
+Example output from one run with `d=7`, `n_rounds=7`, orientation `O1`, and
+262,144 shots per basis is shown below. Treat the timing/speedup figure as a
+smoke signal, not a benchmark:
+
+```text
+[offline_smoketest.sh] Avg LER 0.002678 (no pre-decoder) -> 0.002285 (after); PyMatching speedup 1.815x
+```
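To turn that one-line summary into numbers, for example to track it across runs, a regular expression keyed to the exact line shown above is enough. If the script's wording changes, the pattern must change with it. `parse_smoke_line` is a hypothetical helper. The relative reduction it reports is plain arithmetic: `1 - 0.002285 / 0.002678` is about 14.7%.

```python
import re

SMOKE_LINE = re.compile(
    r"Avg LER (?P<baseline>[0-9.eE+-]+) \(no pre-decoder\) -> "
    r"(?P<after>[0-9.eE+-]+) \(after\); PyMatching speedup (?P<speedup>[0-9.]+)x"
)


def parse_smoke_line(line: str) -> dict:
    """Extract baseline LER, post-pre-decoder LER, and speedup from the smoketest line."""
    match = SMOKE_LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognized smoketest line: {line!r}")
    baseline, after = float(match["baseline"]), float(match["after"])
    return {
        "baseline_ler": baseline,
        "after_ler": after,
        "relative_reduction": 1.0 - after / baseline,
        "speedup": float(match["speedup"]),
    }


line = ("[offline_smoketest.sh] Avg LER 0.002678 (no pre-decoder) -> 0.002285 (after); "
        "PyMatching speedup 1.815x")
result = parse_smoke_line(line)
print(f"{result['relative_reduction']:.1%}")  # 14.7%
```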
+
 ### Decoder ablation study with cudaq-qec (optional)
 
 The `decoder_ablation` workflow compares multiple global decoders on the residual syndromes left

diff --git a/code/data/datapipe_stim.py b/code/data/datapipe_stim.py
@@ -25,12 +25,14 @@
 from torch.utils.data import Dataset
 
 from qec.surface_code.memory_circuit import MemoryCircuit
+from qec.surface_code.stim_sample_io import read_stim_detector_samples, resolve_stim_sample_paths
 from qec.surface_code.data_mapping import (
     normalized_weight_mapping_Xstab_memory,
     normalized_weight_mapping_Zstab_memory,
     compute_stabX_to_data_index_map,
     compute_stabZ_to_data_index_map,
 )
+from data.predecoder_transform import dets_to_predecoder_inputs
 
 
 class QCDataPipePreDecoder_Memory_inference(Dataset):
@@ -94,7 +96,7 @@ def __init__(
         self._presence_x_Z[:, 0] = 0
         self._presence_x_Z[:, -1] = 0
 
-        # If using explicit noise model, use a conservative scalar placeholder for MemoryCircuit's legacy slots.
+        # If using explicit noise model, use a conservative scalar placeholder for MemoryCircuit's scalar-rate slots.
         if noise_model is not None:
             p_placeholder = float(noise_model.get_max_probability())
         else:
@@ -380,4 +382,141 @@ def __getitem__(self, idx):
             }
 
 
-__all__ = ['QCDataPipePreDecoder_Memory_inference']
+class QCDataPipePreDecoder_Memory_from_stim_file(Dataset):
+    """
+    Datapipe for offline inference from Stim detector-sample files.
+
+    The file stores detector events plus appended observables. Metadata is
+    validated against a freshly rebuilt MemoryCircuit before data is exposed.
+
+    Noise-model validation: when ``noise_model`` is provided (the typical
+    inference path), the datapipe computes a deterministic fingerprint of its
+    25-parameter dict and asks :func:`read_stim_detector_samples` to compare it
+    against the value recorded in the JSON metadata. Mismatches raise unless
+    ``strict_noise`` is ``False`` (in which case a warning is emitted). When
+    ``noise_model`` is ``None``, only the scalar ``p_error`` is checked.
+
+    Args:
+        distance, n_rounds, num_samples, error_mode, measure_basis,
+            code_rotation: Standard circuit parameters; ``num_samples`` may
+            truncate the loaded file to the first N shots when positive.
+        stim_samples_dir: Directory containing ``samples_{basis}.dets`` and
+            ``metadata_{basis}.json``.
+        p_error: Scalar physical error rate used by the active config. Compared
+            against ``metadata['p_error']`` when present.
+        noise_model: Optional explicit :class:`NoiseModel`. When set, its
+            ``sha256()`` is compared against ``metadata['noise_model_sha256']``.
+        strict_noise: ``True`` (default) raises on noise-fingerprint drift;
+            ``False`` downgrades the failure to a :class:`UserWarning`.
+    """
+
+    def __init__(
+        self,
+        distance,
+        n_rounds,
+        num_samples,
+        error_mode,
+        stim_samples_dir,
+        p_error=0.005,
+        measure_basis='X',
+        code_rotation='XV',
+        noise_model=None,
+        strict_noise: bool = True,
+    ):
+        self.distance = int(distance)
+        self.n_rounds = max(int(n_rounds), 1)
+        self.measure_basis = str(measure_basis).upper()
+        self.code_rotation = code_rotation.upper() if code_rotation else 'XV'
+        self.requested_num_samples = int(num_samples) if num_samples is not None else 0
+
+        if self.measure_basis not in ("X", "Z"):
+            raise ValueError(
+                "Stim file datapipe expects one basis at a time. "
+                f"Got measure_basis={measure_basis!r}."
+            )
+        if error_mode != "circuit_level_surface_custom":
+            raise ValueError(
+                f"error_mode={error_mode!r} not supported; "
+                "expected 'circuit_level_surface_custom'."
+            )
+
+        D = self.distance
+        if noise_model is not None:
+            p_placeholder = float(noise_model.get_max_probability())
+            noise_sha = noise_model.sha256()
+            noise_label = "25-param"
+        else:
+            p_placeholder = float(p_error)
+            noise_sha = None
+            noise_label = "simple"
+
+        self.circ = MemoryCircuit(
+            distance=D,
+            idle_error=p_placeholder,
+            sqgate_error=p_placeholder,
+            tqgate_error=p_placeholder,
+            spam_error=(2.0 / 3.0) * p_placeholder,
+            n_rounds=self.n_rounds,
+            basis=self.measure_basis,
+            code_rotation=self.code_rotation,
+            noise_model=noise_model,
+            add_boundary_detectors=True,
+        )
+        self.circ.set_error_rates()
+
+        samples_path, metadata_path = resolve_stim_sample_paths(
+            stim_samples_dir, self.measure_basis
+        )
+        dets_and_obs, metadata = read_stim_detector_samples(
+            samples_path=samples_path,
+            metadata_path=metadata_path,
+            distance=self.distance,
+            n_rounds=self.n_rounds,
+            basis=self.measure_basis,
+            code_rotation=self.code_rotation,
+            num_detectors=self.circ.stim_circuit.num_detectors,
+            num_observables=self.circ.stim_circuit.num_observables,
+            p_error=float(p_error),
+            noise_model_sha256=noise_sha,
+            noise_model_label=noise_label,
+            strict_noise=bool(strict_noise),
+        )
+        if self.requested_num_samples > 0:
+            dets_and_obs = dets_and_obs[:self.requested_num_samples]
+
+        self.samples_path = samples_path
+        self.metadata_path = metadata_path
+        self.metadata = metadata
+        self.dets_and_obs = torch.from_numpy(dets_and_obs).to(torch.uint8).contiguous()
+        self.num_samples = int(self.dets_and_obs.shape[0])
+        self._half = (D * D - 1) // 2
+
+        self._precompute_transformations_from_dets()
+
+    def _precompute_transformations_from_dets(self):
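+        # Slice by detector count rather than ``[:, :-num_obs]``: with zero
+        # observables ``:-0`` would silently select no columns at all.
+        num_dets = self.circ.stim_circuit.num_detectors
+        dets = self.dets_and_obs[:, :num_dets].contiguous()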
+        train_x, x_syn_diff, z_syn_diff = dets_to_predecoder_inputs(
+            dets,
+            distance=self.distance,
+            n_rounds=self.n_rounds,
+            basis=self.measure_basis,
+            code_rotation=self.code_rotation,
+        )
+        self.x_syn_diff_all = x_syn_diff
+        self.z_syn_diff_all = z_syn_diff
+        self.trainX_all = train_x
+
+    def __len__(self):
+        return self.num_samples
+
+    def __getitem__(self, idx):
+        return {
+            "x_syn_diff": self.x_syn_diff_all[idx],
+            "z_syn_diff": self.z_syn_diff_all[idx],
+            "trainX": self.trainX_all[idx],
+            "dets_and_obs": self.dets_and_obs[idx],
+        }
+
+
+__all__ = [
+    'QCDataPipePreDecoder_Memory_inference',
+    'QCDataPipePreDecoder_Memory_from_stim_file',
+]