chore: fast-forward releases/v0.1.0 to main #58
Merged
…ility (#43)
* fix(ci): disable torch.compile in orientation training to prevent segfault
torch.compile=on combined with DataLoader spawn workers during LER validation causes a segfault (20 leaked semaphores, core dumped). Set PREDECODER_TORCH_COMPILE=0 for the Train all orientations step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Revert "fix(ci): disable torch.compile in orientation training to prevent segfault"
This reverts commit 7f0f6c8.
* fix(mid): seed BitMatrixSampler explicitly to restore test reproducibility
torch.manual_seed() does not control cuQuantum's BitMatrixSampler internal RNG, so the two mid-GPU tests that relied on it for reproducibility were non-deterministic and intermittently failing.
Add an optional `seed` parameter to `dem_sampling()` and `MemoryCircuitTorch.generate_batch()`. When a seed is provided, a fresh BitMatrixSampler is always created with `Options(seed=N)`, resetting its internal RNG and guaranteeing identical outputs on every call with the same seed. Production paths (seed=None) are unaffected: the cached sampler is reused as before.
Update the two failing tests to use the explicit seed kwarg instead of torch.manual_seed():
- test_he_reduces_error_weight: seed=123
- test_full_pipeline_w2_reproducible: seed=100
Fixes: NVIDIA/Ising-Decoding CI run 23963347042
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: fix yapf line-break position in need_new condition
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* test: add dem_sampling reproducibility tests for seed= parameter
Add TestDEMSamplingReproducibility to test_dem_sampling.py with four cases:
- same seed on CPU produces bit-exact identical frames
- different seeds produce different frames
- unseeded calls still reuse the cached sampler (perf regression guard)
- same seed on GPU produces bit-exact identical frames (GPU-only)
These tests use stochastic p values (0.1–0.9), so they would have caught the original regression: before the seed= fix, BitMatrixSampler's internal RNG was not reset between calls, making "same seed" reproducibility impossible regardless of torch.manual_seed().
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: use torch.Generator for seeded path; BitMatrixSampler RNG is not seedable
Options.__init__() does not accept a 'seed' keyword; the cuST BitMatrixSampler's internal RNG is not exposed via the public API. Replace the attempted Options(seed=N) approach with a small pure-torch fallback (_torch_dem_sampling) that uses a local torch.Generator seeded to the requested value. This path is only taken when seed= is explicitly passed (tests); the production BitMatrixSampler cache path is unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: pass seed directly to BitMatrixSampler constructor
BitMatrixSampler accepts seed as a constructor kwarg (not via Options). Replace the torch fallback workaround with the correct cuST API: pass seed= directly to BitMatrixSampler(..., seed=seed). A fresh sampler is created on every seeded call so its internal RNG is reset to the requested seed, guaranteeing identical outputs on repeated calls with the same value.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* DISTANCE and N_ROUNDS updates
Signed-off-by: Ben Howe <bhowe@nvidia.com>
* Formatting updates
Signed-off-by: Ben Howe <bhowe@nvidia.com>
* Revert "Formatting updates"
This reverts commit 757f378.
---------
Signed-off-by: Ben Howe <bhowe@nvidia.com>
add B200, H200; remove A100
reformat title and header, product positioning
* adding decode_batch path in failure_analysis and vectorizing observable projection
Signed-off-by: Sachin Pisal <spisal@nvidia.com>
* pass syndromes as list-of-lists to cudaq decode_batch
Signed-off-by: Sachin Pisal <spisal@nvidia.com>
* implementing feedback
Signed-off-by: Sachin Pisal <spisal@nvidia.com>
---------
Signed-off-by: Sachin Pisal <spisal@nvidia.com>
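As a rough illustration of what vectorizing an observable projection can mean, a per-shot loop can be replaced by a single matrix product reduced mod 2. This sketch assumes decoder corrections are a `(shots, num_mechanisms)` 0/1 array and the observable matrix `O` is `(num_observables, num_mechanisms)`; the function names are hypothetical, not taken from failure_analysis. The last helper converts a NumPy syndrome batch to nested Python lists with `ndarray.tolist()`; the element type that cudaq's `decode_batch` expects is not stated in the commit, so this sketch simply keeps the input dtype.

```python
import numpy as np


def project_observables_loop(corrections: np.ndarray, O: np.ndarray) -> np.ndarray:
    """Reference implementation: project one shot at a time."""
    out = np.zeros((corrections.shape[0], O.shape[0]), dtype=np.uint8)
    for i, c in enumerate(corrections):
        # Cast to int64 so the dot product cannot overflow uint8.
        out[i] = (O.astype(np.int64) @ c.astype(np.int64)) % 2
    return out


def project_observables_vectorized(corrections: np.ndarray, O: np.ndarray) -> np.ndarray:
    """All shots at once: (shots x mech) @ (mech x obs), reduced mod 2."""
    prod = corrections.astype(np.int64) @ O.T.astype(np.int64)
    return (prod % 2).astype(np.uint8)


def syndromes_to_lists(syndromes: np.ndarray) -> list:
    """Convert a (shots, num_detectors) array into one Python list per shot."""
    return syndromes.tolist()
```

The vectorized form yields the same result as the loop but hands the whole batch to a single BLAS-backed matmul, which is usually the dominant speedup for large shot counts.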
…ccurate} (#51)
* fix(ci): disable torch.compile in orientation training to prevent segfault
torch.compile=on combined with DataLoader spawn workers during LER validation causes a segfault (20 leaked semaphores, core dumped). Set PREDECODER_TORCH_COMPILE=0 for the Train all orientations step.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Revert "fix(ci): disable torch.compile in orientation training to prevent segfault"
This reverts commit 7f0f6c8.
* feat: rename pretrained models to Ising-Decoder-SurfaceCode-1-{Fast,Accurate}
- Rename PreDecoderModelMemory_r9_v1.0.77.pt → Ising-Decoder-SurfaceCode-1-Fast.pt
- Rename PreDecoderModelMemory_r13_v1.0.86.pt → Ising-Decoder-SurfaceCode-1-Accurate.pt
- Models remain Git LFS-tracked via models/*.pt (no storage change)
- Add model_checkpoint_file direct-path option to _load_model so named pretrained files (without epoch numbers) can be loaded without directory scanning
- Update test_inference_public_model.py, README, and checkpoint_to_safetensors.py
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* Update config_qec_decoder_r9_fp8.yaml
change model 1 to model name
* Update conf/config_qec_decoder_r9_fp8.yaml
---------
Co-authored-by: Ben Howe <141149032+bmhowe23@users.noreply.github.com>
* Update config_qec_decoder_r13_fp8.yaml
refer to model 4 as Ising-Decoder-SurfaceCode-1-Accurate
* Update conf/config_qec_decoder_r13_fp8.yaml
---------
Co-authored-by: Ben Howe <141149032+bmhowe23@users.noreply.github.com>
* Fix export of fp8 ONNX files
Signed-off-by: Ben Howe <bhowe@nvidia.com>
* test: add fp8 calibration dtype regression test for #52
`_collect_calibration_dets` returns uint8; casting to float32 before passing to mq.quantize triggered an INVALID_ARGUMENT error from ONNX Runtime ("expected: tensor(uint8), got: tensor(float)"). The new test mirrors the existing int8 variant and asserts that the fp8 path preserves the original uint8 dtype and forwards the FP8-specific kwargs (op_types_to_quantize, high_precision_dtype).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Signed-off-by: Ben Howe <bhowe@nvidia.com>
Co-authored-by: Ivan Basov <ibasov@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
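The regression test described above can be approximated by replacing the quantizer with a mock and inspecting what it received. This is a sketch under stated assumptions: `export_fp8` and `collect_calibration_dets_stub` are hypothetical stand-ins, `quantize` is a `unittest.mock.MagicMock` rather than the real `mq.quantize`, the `calibration_data` keyword name is assumed, and the values passed for `op_types_to_quantize` and `high_precision_dtype` (the kwarg names given in the commit message) are placeholders.

```python
from typing import Any, Callable
from unittest import mock

import numpy as np


def collect_calibration_dets_stub(n: int = 4, d: int = 8) -> np.ndarray:
    """Hypothetical stand-in for _collect_calibration_dets: uint8 detector bits."""
    return np.zeros((n, d), dtype=np.uint8)


def export_fp8(quantize: Callable[..., Any], dets: np.ndarray) -> None:
    """Hypothetical fp8 export step that forwards calibration data unchanged.

    The bug described in the commit was a float32 cast at this point, which
    made the data disagree with the model's uint8 input.
    """
    quantize(
        calibration_data=dets,            # forwarded as-is (uint8), no cast
        op_types_to_quantize=["MatMul"],  # placeholder value
        high_precision_dtype="fp16",      # placeholder value
    )


def check_fp8_calibration_dtype() -> dict:
    """Run the export against a mock quantizer and return the kwargs it saw."""
    fake_quantize = mock.MagicMock()
    export_fp8(fake_quantize, collect_calibration_dets_stub())
    return fake_quantize.call_args.kwargs
```

Asserting on the mock's recorded kwargs keeps the test fast and independent of an ONNX model, at the cost of not exercising the real quantizer.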
* fix: find_best_model now accepts named .pt files without epoch numbers
The old code required filenames to start with PreDecoderModelMemory_ and
encode an epoch number. After the model rename to Ising-Decoder-SurfaceCode-1-{Fast,Accurate}.pt,
copying one of these files into the models dir and running
inference via local_run.sh would fail with "Found 0 model files".
Fall back to any .pt file (sorted, last wins) when no epoch-numbered
PreDecoderModelMemory_ checkpoints are found in the directory.
Fixes regression reported in #51
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* style: fix yapf formatting in find_best_model
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
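The fallback described in this commit can be sketched with `pathlib` and a regular expression. This is a minimal sketch, assuming epoch-numbered checkpoints end in `<digits>.pt` after the `PreDecoderModelMemory_` prefix and that the highest epoch is the one to pick; the real filename pattern and selection criterion in `find_best_model` may differ, and `find_best_model_sketch` and `EPOCH_PATTERN` are hypothetical names.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming scheme for epoch-numbered checkpoints; the project's real
# pattern may differ.
EPOCH_PATTERN = re.compile(r"^PreDecoderModelMemory_.*?(\d+)\.pt$")


def find_best_model_sketch(model_dir: str) -> Optional[Path]:
    """Prefer epoch-numbered checkpoints; otherwise fall back to any .pt file."""
    pt_files = sorted(Path(model_dir).glob("*.pt"))
    epoch_files = []
    for path in pt_files:
        match = EPOCH_PATTERN.match(path.name)
        if match:
            epoch_files.append((int(match.group(1)), path))
    if epoch_files:
        # Assumption for this sketch: the highest epoch number wins.
        return max(epoch_files)[1]
    # Fallback for named files such as Ising-Decoder-SurfaceCode-1-Fast.pt:
    # sorted order, last one wins (as in the commit message).
    return pt_files[-1] if pt_files else None
```

Returning `None` for an empty directory leaves the "Found 0 model files" error to the caller, so the fallback only changes behavior when some `.pt` file is actually present.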
* fix: suppress double measurement noise injection in MemoryCircuit final data qubit measurement
_add_stabilizer_round(logical_measurement=True) correctly injects time-reversed measurement noise and then restores self.noise_model before returning. The subsequent add_measure() call sees that noise_model is not None and injects the same p_meas noise a second time, creating phantom error channels in the DEM.
Temporarily clear self.noise_model around the final add_measure() call, matching the pattern already used inside _add_stabilizer_round itself.
* Update code/qec/surface_code/memory_circuit.py
* fix: suppress double measurement noise injection in MemoryCircuit + tests
Fixes double p_meas injection on data qubits in MemoryCircuit.__init__. _add_stabilizer_round(logical_measurement=True) injects the time-reversed "fake SPAM" error and then restores self.noise_model before returning. The subsequent add_measure(data_qubits) at the call site saw a non-None noise_model and injected the same p_meas channel a second time, creating phantom DEM error channels (7/21/43 extra entries at d=3/5/7) that distorted PyMatching's matching graph and biased LER estimates.
Fix: temporarily suppress self.noise_model around add_measure(data_qubits), matching the pattern already used inside _add_stabilizer_round itself.
Also adds:
- Regression test in TestNoiseModel verifying that exactly one measurement-error injection (not two) appears in the post-REPEAT circuit section.
- Updates to TestLERComparison in test_boundary_detectors.py: replaces strict ler_with_bd < ler_no_bd assertions with a 1.5x tolerance check. The strict assertions were passing only by accident, because the phantom DEM entries artificially inflated the no-BD LER; the true BD improvement is a marginal 1-3% effect, below the statistical resolution of 10-20k samples.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix: remove duplicate data-qubit measurement + yapf formatting
- Remove duplicate orig_noise_model/add_measure/restore block introduced during cherry-pick conflict resolution (caused non-deterministic detectors)
- Collapse assertLessEqual arguments onto a single line for yapf compliance
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Ivan Basov <5455484+ivanbasov@users.noreply.github.com>
Co-authored-by: Ivan Basov <ibasov@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
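The "temporarily clear self.noise_model" pattern described above can be sketched with a toy circuit builder. This is a sketch, not the MemoryCircuit code: `ToyCircuit`, `noise_suppressed`, and `final_data_measurement` are hypothetical, the op tuples are plain labels rather than real Stim instructions, and wrapping the save/clear/restore in a `try`/`finally` context manager is this sketch's choice (the commits describe an inline save, clear, and restore).

```python
from contextlib import contextmanager


class ToyCircuit:
    """Hypothetical circuit builder that records noise and measurement ops."""

    def __init__(self, p_meas: float):
        self.noise_model = {"p_meas": p_meas}
        self.ops = []

    def add_measure(self, qubits):
        # Mirrors the behavior described above: any non-None noise model
        # causes a measurement-error channel to be injected first.
        if self.noise_model is not None:
            self.ops.append(("X_ERROR", self.noise_model["p_meas"], tuple(qubits)))
        self.ops.append(("M", tuple(qubits)))

    @contextmanager
    def noise_suppressed(self):
        """Temporarily clear self.noise_model, restoring it even on error."""
        saved = self.noise_model
        self.noise_model = None
        try:
            yield
        finally:
            self.noise_model = saved


def final_data_measurement(circuit: ToyCircuit, data_qubits) -> None:
    # Stand-in for the time-reversed noise already injected by the logical round.
    circuit.ops.append(("X_ERROR", circuit.noise_model["p_meas"], tuple(data_qubits)))
    # Without suppression, add_measure would inject a second, phantom channel.
    with circuit.noise_suppressed():
        circuit.add_measure(data_qubits)
```

Using `finally` guarantees the noise model is restored even if measurement construction raises, which guards against the duplicated save/restore block mentioned in the last commit.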
* Add standalone script to generate evaluation test data
Builds a Stim memory circuit with the 25-parameter noise model, samples syndrome data, extracts the DEM check matrices (H, O, priors) via beliefmatching, and runs a baseline PyMatching decode. Outputs are saved in a custom binary format for downstream pre-decoder benchmarking.
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
* Formatting
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
---------
Signed-off-by: Scott Thornton <wsttiger@gmail.com>
Fast-forward merge of `main` into `releases/v0.1.0` to pick up post-QA commits without cherry-picking. Commits being added:
Since `releases/v0.1.0` is a direct ancestor of `main`, merging with Rebase and merge will preserve the original commit SHAs.