Add cuStabilizer BitMatrixSampler integration to DEM sampling#24
Merged
888abbd to 777ff4e
ivanbasov reviewed Mar 24, 2026
c2bf416 to 519febb
Replace the pure-torch dem_sampling with a version that transparently
uses cuQuantum's BitMatrixSampler when available, falling back to the
original torch path when cuST is not installed or USE_CUSTAB=0.
- custab_matrix_sampling() with sampler caching and max_shots tracking
- CuPy zero-copy DLPack GPU pipeline (torch -> cupy -> cuST -> torch)
- Timing instrumentation (get_dem_sampling_avg_ms) for training logs
- Input validation on H/p shapes
- USE_CUSTAB env var toggle with reset helpers for testing
- Vectorized measure_from_stacked_frames (kept from main)
- New tests: test_dem_sampling_custab.py, test_dem_sampling_integration.py
Signed-off-by: kvmto <kmato@nvidia.com>
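The backend choice described in this commit (cuST when available, torch otherwise, with USE_CUSTAB=0 forcing the fallback) can be sketched roughly as below. This is not the code from dem_sampling.py: the function names are hypothetical, the module name "cuquantum" is an assumption about how the dependency is imported, and treating an unset USE_CUSTAB as "enabled" is also an assumption.

```python
import importlib.util
import os


def custab_requested(env=None):
    """Return False only when USE_CUSTAB is explicitly "0" (assumed semantics)."""
    env = os.environ if env is None else env
    return env.get("USE_CUSTAB", "1") != "0"


def backend_available(module_name):
    """Return True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


def select_backend(module_name="cuquantum", env=None):
    """Pick "custab" when requested and importable, otherwise "torch"."""
    if custab_requested(env) and backend_available(module_name):
        return "custab"
    return "torch"
```

Passing env as a plain dict, instead of always reading os.environ, keeps the toggle easy to exercise in tests without mutating process state.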
791cd68 to 70e60f9
requirements_public_inference.txt:
- Document cupy-cudaXXX as an optional GPU-only prerequisite alongside
  the existing tensorrt comment; explains the DLPack fallback behaviour.
tests/test_dem_sampling_custab.py:
- Add TestDEMSamplingCupyGPUPath (skipped unless custab + CuPy + CUDA
  are all present) covering:
  - _CUPY_AVAILABLE flag is set
  - correct shape and uint8 dtype from the GPU-native path
  - deterministic syndrome matches expected checks
  - GPU/CuPy result matches torch CPU fallback on deterministic input
NOTICE:
- Add CuPy (MIT, Preferred Networks) entry
- Add TensorRT (Apache 2.0, NVIDIA) entry — was missing
- Add onnxscript (MIT, Microsoft) entry — was missing
- Add OmegaConf (BSD-3-Clause, Omry Yadan) entry — was missing
- Include full license text or reference for all new entries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ivan Basov <ibasov@nvidia.com>
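The "deterministic syndrome" tests above rely on the fact that when every error probability is exactly 0 or 1, the sampled syndrome is fully determined, so any two backends must agree bit for bit. A minimal pure-Python sketch of that idea follows. It is a stand-in for illustration, not the torch or cuST path, and it assumes H is laid out as one row per detector and one column per error mechanism, with p holding one probability per column.

```python
import random


def sample_syndromes(H, p, shots, rng=None):
    """Sample errors e_j ~ Bernoulli(p_j) and return syndromes H @ e mod 2.

    H: list of rows (one per detector), p: per-error probabilities.
    This layout is an assumption of the sketch.
    """
    if rng is None:
        rng = random.Random()
    # Shape check: every row of H needs one entry per error probability.
    if any(len(row) != len(p) for row in H):
        raise ValueError("each row of H must have len(p) columns")
    syndromes = []
    for _ in range(shots):
        # rng.random() lies in [0, 1), so p_j = 1.0 always fires, p_j = 0.0 never does.
        errors = [1 if rng.random() < pj else 0 for pj in p]
        syndromes.append(
            [sum(h * e for h, e in zip(row, errors)) % 2 for row in H]
        )
    return syndromes
```

With H = [[1, 1, 0], [0, 1, 1]] and p = [1.0, 0.0, 0.0], only the first error fires, so every shot yields the syndrome [1, 0] regardless of the random state.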
70e60f9 to 85214f3
ivanbasov approved these changes Mar 24, 2026
ivanbasov added a commit that referenced this pull request Apr 10, 2026
* Add cuStabilizer BitMatrixSampler integration to DEM sampling
Replace the pure-torch dem_sampling with a version that transparently
uses cuQuantum's BitMatrixSampler when available, falling back to the
original torch path when cuST is not installed or USE_CUSTAB=0.
- custab_matrix_sampling() with sampler caching and max_shots tracking
- CuPy zero-copy DLPack GPU pipeline (torch -> cupy -> cuST -> torch)
- Timing instrumentation (get_dem_sampling_avg_ms) for training logs
- Input validation on H/p shapes
- USE_CUSTAB env var toggle with reset helpers for testing
- Vectorized measure_from_stacked_frames (kept from main)
- New tests: test_dem_sampling_custab.py, test_dem_sampling_integration.py
Signed-off-by: kvmto <kmato@nvidia.com>
* feat: add CuPy dependency, tests, and NOTICE entry
requirements_public_inference.txt:
- Document cupy-cudaXXX as an optional GPU-only prerequisite alongside
  the existing tensorrt comment; explains the DLPack fallback behaviour.
tests/test_dem_sampling_custab.py:
- Add TestDEMSamplingCupyGPUPath (skipped unless custab + CuPy + CUDA
  are all present) covering:
  - _CUPY_AVAILABLE flag is set
  - correct shape and uint8 dtype from the GPU-native path
  - deterministic syndrome matches expected checks
  - GPU/CuPy result matches torch CPU fallback on deterministic input
NOTICE:
- Add CuPy (MIT, Preferred Networks) entry
- Add TensorRT (Apache 2.0, NVIDIA) entry — was missing
- Add onnxscript (MIT, Microsoft) entry — was missing
- Add OmegaConf (BSD-3-Clause, Omry Yadan) entry — was missing
- Include full license text or reference for all new entries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ivan Basov <ibasov@nvidia.com>
---------
Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: Ivan Basov <ibasov@nvidia.com>
Co-authored-by: Ivan Basov <ibasov@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
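One plausible reading of "sampler caching and max_shots tracking" is that a sampler is built once with some shot capacity and rebuilt only when a later call asks for more shots than it was built for. The sketch below illustrates that pattern under that assumption. The class, its methods, and the factory(key, max_shots) signature are all hypothetical, and nothing here reflects the actual BitMatrixSampler constructor.

```python
class SamplerCache:
    """Reuse one sampler per key; rebuild only when more shots are needed."""

    def __init__(self, factory):
        # Hypothetical factory: factory(key, max_shots) -> sampler object.
        self._factory = factory
        self._samplers = {}  # key -> (sampler, max_shots it was built for)
        self.builds = 0      # number of times the factory was invoked

    def get(self, key, shots):
        cached = self._samplers.get(key)
        if cached is not None and cached[1] >= shots:
            return cached[0]  # existing capacity is sufficient
        self.builds += 1
        sampler = self._factory(key, shots)
        self._samplers[key] = (sampler, shots)
        return sampler

    def max_shots(self, key):
        """Capacity of the cached sampler for key, or 0 if none is cached."""
        cached = self._samplers.get(key)
        return cached[1] if cached is not None else 0

    def reset(self):
        """Drop all cached samplers (useful between tests)."""
        self._samplers.clear()
```

Keying the cache on something that identifies the H/p pair, and exposing a reset helper, keeps repeated training batches cheap while still letting tests start from a clean state.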
Summary
- Replace the pure-torch dem_sampling() with a version that transparently uses cuQuantum's BitMatrixSampler when available, falling back to the original torch path when cuST is not installed or USE_CUSTAB=0
- custab_matrix_sampling() with sampler caching, max_shots tracking, and CuPy zero-copy DLPack GPU pipeline (torch → CuPy → cuStabilizer → CuPy → torch)
- Timing instrumentation (get_dem_sampling_avg_ms) for training logs, input validation on H/p shapes, and USE_CUSTAB env var toggle
Files changed
- code/qec/dem_sampling.py: core implementation (modified)
- code/tests/test_dem_sampling_custab.py: cuST-specific + torch fallback unit tests (new)
- code/tests/test_dem_sampling_integration.py: end-to-end pipeline test via MemoryCircuitTorch (new)
Test plan
- test_dem_sampling.py passes (API contract preserved)
- test_dem_sampling_custab.py passes (cuST path + torch fallback with deterministic p)
- test_dem_sampling_integration.py passes (full precompute_dem → generate_batch pipeline)
- Using cuST BitMatrixSampler path (max_shots=1024, gpu_native=True)