Add cuStabilizer BitMatrixSampler integration to DEM sampling #24

Merged
ivanbasov merged 2 commits into NVIDIA:main from kvmto:custab-dem-sampling
Mar 24, 2026

kvmto (Collaborator) commented Mar 24, 2026

Summary

  • Replaces the pure-torch dem_sampling() with a version that transparently uses cuQuantum's BitMatrixSampler when available, falling back to the original torch path when cuST is not installed or USE_CUSTAB=0
  • Adds custab_matrix_sampling() with sampler caching, max_shots tracking, and CuPy zero-copy DLPack GPU pipeline (torch → CuPy → cuStabilizer → CuPy → torch)
  • Adds timing instrumentation (get_dem_sampling_avg_ms) for training logs, input validation on H/p shapes, and USE_CUSTAB env var toggle
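
The toggle-and-fallback behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in dem_sampling.py: the helper names (`custab_requested`, `custab_installed`, `select_backend`) are hypothetical, the package name probed for is an assumption, and treating an unset USE_CUSTAB as "enabled" is inferred from the summary rather than confirmed by it.

```python
import importlib.util
import os


def custab_requested(env=None):
    """Return False only when USE_CUSTAB is explicitly set to "0".

    Assumption: an unset variable means the cuST path is allowed.
    """
    env = os.environ if env is None else env
    return env.get("USE_CUSTAB", "1") != "0"


def custab_installed(module_name="cuquantum"):
    # Assumption: the sampler lives in a top-level package with this name.
    # find_spec() returns None for a missing top-level package instead of raising.
    return importlib.util.find_spec(module_name) is not None


def select_backend(env=None, module_name="cuquantum"):
    """Pick "custab" when it is both requested and importable, else "torch"."""
    if custab_requested(env) and custab_installed(module_name):
        return "custab"
    return "torch"
```

Probing with `find_spec` rather than a bare `import` keeps the check cheap and side-effect free; a real implementation may instead wrap the import in `try`/`except ImportError`.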

Files changed

  • code/qec/dem_sampling.py — core implementation (modified)
  • code/tests/test_dem_sampling_custab.py — cuST-specific + torch fallback unit tests (new)
  • code/tests/test_dem_sampling_integration.py — end-to-end pipeline test via MemoryCircuitTorch (new)

Test plan

  • Existing test_dem_sampling.py passes (API contract preserved)
  • New test_dem_sampling_custab.py passes (cuST path + torch fallback with deterministic p)
  • New test_dem_sampling_integration.py passes (full precompute_dem → generate_batch pipeline)
  • Smoke training run confirms cuST path activates: Using cuST BitMatrixSampler path (max_shots=1024, gpu_native=True)
  • DLPack zero-copy round-trip verified between torch and CuPy on GPU
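
For readers unfamiliar with the "deterministic p" trick used in the fallback tests: when every error probability is exactly 0 or 1, the sampled error vector is fixed, so the expected syndrome can be computed by hand. The sketch below is a pure-Python reference, not the torch or cuST implementation. It assumes H is laid out as (checks × error mechanisms) and that a syndrome is H·e mod 2; the function name `sample_syndromes` is hypothetical.

```python
import random


def sample_syndromes(H, p, shots, rng=None):
    """Draw `shots` syndromes s = H e mod 2 with e_j ~ Bernoulli(p_j).

    H: list of rows, one per check; each row has len(p) entries in {0, 1}.
    p: per-error-mechanism firing probabilities.
    """
    if any(len(row) != len(p) for row in H):
        raise ValueError("each row of H must have len(p) columns")
    if rng is None:
        rng = random.Random()
    out = []
    for _ in range(shots):
        # random() lies in [0, 1), so p_j = 0 never fires and p_j = 1 always fires.
        errors = [1 if rng.random() < pj else 0 for pj in p]
        out.append([sum(h * e for h, e in zip(row, errors)) % 2 for row in H])
    return out


# Deterministic case: errors 0 and 2 always fire, error 1 never does.
H = [[1, 1, 0],
     [0, 1, 1]]
syndromes = sample_syndromes(H, [1.0, 0.0, 1.0], shots=3)
```

With these inputs both checks see exactly one firing error, so every shot yields the syndrome `[1, 1]`; a test can compare any backend's output against that fixed value.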

kvmto force-pushed the custab-dem-sampling branch from 888abbd to 777ff4e on March 24, 2026 16:45
Comment thread code/qec/dem_sampling.py
ivanbasov force-pushed the custab-dem-sampling branch from c2bf416 to 519febb on March 24, 2026 22:24
Replace the pure-torch dem_sampling with a version that transparently
uses cuQuantum's BitMatrixSampler when available, falling back to the
original torch path when cuST is not installed or USE_CUSTAB=0.

- custab_matrix_sampling() with sampler caching and max_shots tracking
- CuPy zero-copy DLPack GPU pipeline (torch -> cupy -> cuST -> torch)
- Timing instrumentation (get_dem_sampling_avg_ms) for training logs
- Input validation on H/p shapes
- USE_CUSTAB env var toggle with reset helpers for testing
- Vectorized measure_from_stacked_frames (kept from main)
- New tests: test_dem_sampling_custab.py, test_dem_sampling_integration.py
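
One plausible reading of "sampler caching and max_shots tracking", together with the "reset helpers for testing", is sketched below. The commit message does not say how the real cache is keyed or when a sampler is rebuilt, so the class name `SamplerCache`, the `build` callback, and the rebuild-when-more-shots-are-needed policy are all assumptions made for illustration.

```python
class SamplerCache:
    """Reuse one sampler per key, rebuilding only if more shots are needed."""

    def __init__(self, build):
        self._build = build    # build(key, max_shots) -> sampler object
        self._entries = {}     # key -> (sampler, max_shots it was built for)
        self.builds = 0        # number of times build() was called

    def get(self, key, shots):
        entry = self._entries.get(key)
        if entry is None or entry[1] < shots:
            # Nothing cached, or the cached sampler is too small: (re)build.
            entry = (self._build(key, shots), shots)
            self._entries[key] = entry
            self.builds += 1
        return entry[0]

    def reset(self):
        """Drop all cached samplers; intended for test isolation."""
        self._entries.clear()
        self.builds = 0


# Usage with a stand-in builder that just records its arguments.
cache = SamplerCache(lambda key, max_shots: (key, max_shots))
first = cache.get("H0", 512)
second = cache.get("H0", 256)    # fits within max_shots=512, so it is reused
third = cache.get("H0", 1024)    # exceeds the tracked max_shots, so it is rebuilt
```

Tracking the largest shot count seen avoids reconstructing an expensive sampler on every batch while still growing it when a larger request arrives.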

Signed-off-by: kvmto <kmato@nvidia.com>
ivanbasov force-pushed the custab-dem-sampling branch 2 times, most recently from 791cd68 to 70e60f9, on March 24, 2026 22:29
requirements_public_inference.txt:
- Document cupy-cudaXXX as an optional GPU-only prerequisite alongside
  the existing tensorrt comment; explains the DLPack fallback behaviour.

tests/test_dem_sampling_custab.py:
- Add TestDEMSamplingCupyGPUPath (skipped unless custab + CuPy + CUDA
  are all present) covering:
    - _CUPY_AVAILABLE flag is set
    - correct shape and uint8 dtype from the GPU-native path
    - deterministic syndrome matches expected checks
    - GPU/CuPy result matches torch CPU fallback on deterministic input

NOTICE:
- Add CuPy (MIT, Preferred Networks) entry
- Add TensorRT (Apache 2.0, NVIDIA) entry — was missing
- Add onnxscript (MIT, Microsoft) entry — was missing
- Add OmegaConf (BSD-3-Clause, Omry Yadan) entry — was missing
- Include full license text or reference for all new entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ivan Basov <ibasov@nvidia.com>
ivanbasov force-pushed the custab-dem-sampling branch from 70e60f9 to 85214f3 on March 24, 2026 22:32
ivanbasov self-requested a review on March 24, 2026 22:33
ivanbasov merged commit a250ca1 into NVIDIA:main on Mar 24, 2026
12 checks passed
ivanbasov deleted the custab-dem-sampling branch on March 24, 2026 22:45
ivanbasov added a commit that referenced this pull request Apr 10, 2026
* Add cuStabilizer BitMatrixSampler integration to DEM sampling

Replace the pure-torch dem_sampling with a version that transparently
uses cuQuantum's BitMatrixSampler when available, falling back to the
original torch path when cuST is not installed or USE_CUSTAB=0.

- custab_matrix_sampling() with sampler caching and max_shots tracking
- CuPy zero-copy DLPack GPU pipeline (torch -> cupy -> cuST -> torch)
- Timing instrumentation (get_dem_sampling_avg_ms) for training logs
- Input validation on H/p shapes
- USE_CUSTAB env var toggle with reset helpers for testing
- Vectorized measure_from_stacked_frames (kept from main)
- New tests: test_dem_sampling_custab.py, test_dem_sampling_integration.py

Signed-off-by: kvmto <kmato@nvidia.com>

* feat: add CuPy dependency, tests, and NOTICE entry

requirements_public_inference.txt:
- Document cupy-cudaXXX as an optional GPU-only prerequisite alongside
  the existing tensorrt comment; explains the DLPack fallback behaviour.

tests/test_dem_sampling_custab.py:
- Add TestDEMSamplingCupyGPUPath (skipped unless custab + CuPy + CUDA
  are all present) covering:
    - _CUPY_AVAILABLE flag is set
    - correct shape and uint8 dtype from the GPU-native path
    - deterministic syndrome matches expected checks
    - GPU/CuPy result matches torch CPU fallback on deterministic input

NOTICE:
- Add CuPy (MIT, Preferred Networks) entry
- Add TensorRT (Apache 2.0, NVIDIA) entry — was missing
- Add onnxscript (MIT, Microsoft) entry — was missing
- Add OmegaConf (BSD-3-Clause, Omry Yadan) entry — was missing
- Include full license text or reference for all new entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Ivan Basov <ibasov@nvidia.com>

---------

Signed-off-by: kvmto <kmato@nvidia.com>
Signed-off-by: Ivan Basov <ibasov@nvidia.com>
Co-authored-by: Ivan Basov <ibasov@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2 participants