Skip to content

[Test] Add operator-level determinism matrix for TORCH_DETERMINISTIC #37

Open
Young-Leo wants to merge 1 commit into SandAI-org:main from Young-Leo:ly/deterministic-output

@Young-Leo

What

Add tests/feature_tests/test_torch_op_determinism_matrix.py: an
operator-level determinism matrix verifying that high-risk torch operators
relevant to LLM and video/diffusion models are bitwise-reproducible under
torch.use_deterministic_algorithms(True).

Determinism is compositional, so the operator layer is validated directly
rather than per model.

How

For each operator:

  • Contract (asserted): with the flag on (plus cuDNN deterministic mode, a
    fixed cuBLAS workspace configuration, and a fixed seed), N runs on
    identical inputs must either be bitwise identical or raise a RuntimeError
    (no deterministic implementation). Silent drift fails the test. Ops with
    no deterministic kernel are re-run under warn_only=True so that fallback
    reproducibility is measured and reported instead of skipped.
  • Probe (report-only): the same op with the flag off, recording which
    operators the flag actually rescues. The outcome depends on hardware and
    build, so nothing is asserted.
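
The contract above can be sketched roughly as follows. This is a minimal illustration, not the code in the test file: `classify_runs`, `assert_contract`, and `index_add_case` are hypothetical names, the tensor shapes are arbitrary, and the `CUBLAS_WORKSPACE_CONFIG` value is just one of the documented settings. PyTorch is imported lazily so the classification logic can be read and run on its own.

```python
import os
from typing import Callable, List


def classify_runs(run_once: Callable[[], bytes], n_runs: int = 5) -> str:
    """Call run_once n_runs times; return 'identical', 'drift', or 'raises'."""
    outputs: List[bytes] = []
    for _ in range(n_runs):
        try:
            outputs.append(run_once())
        except RuntimeError:
            # Strict mode raises when an op has no deterministic implementation.
            return "raises"
    # Comparing raw bytes makes the check bitwise (and NaN-safe).
    return "identical" if all(o == outputs[0] for o in outputs) else "drift"


def assert_contract(run_once: Callable[[], bytes], n_runs: int = 5) -> str:
    """Contract: identical or raises are acceptable; silent drift is not."""
    outcome = classify_runs(run_once, n_runs)
    assert outcome in ("identical", "raises"), "silent drift with the flag on"
    return outcome


def index_add_case(device: str = "cpu") -> Callable[[], bytes]:
    """Hypothetical case builder for index_add (requires PyTorch and NumPy)."""
    import torch

    # Assumption: must be set before the first cuBLAS call to take effect.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    torch.use_deterministic_algorithms(True)
    torch.backends.cudnn.deterministic = True

    def run_once() -> bytes:
        torch.manual_seed(0)  # identical inputs on every run
        src = torch.randn(4096, 8, device=device)
        idx = torch.randint(0, 16, (4096,), device=device)
        out = torch.zeros(16, 8, device=device).index_add(0, idx, src)
        return out.cpu().numpy().tobytes()

    return run_once
```

With PyTorch installed, `assert_contract(index_add_case("cuda"))` would exercise the asserted path. The report-only probe could reuse `classify_runs` with the deterministic flag left off and simply record the outcome without asserting on it.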

Operators covered

scatter / index_add / scatter_reduce / index_put · embedding & cross_entropy
backward · interpolate / grid_sample backward · SDPA fwd+bwd · matmul / cumsum
/ sort / topk references.

Result (NVIDIA H100, PyTorch 2.9) — 28 passed

| Category   | Operator                      | Flag ON (contract)                                       | Flag OFF (probe) |
|------------|-------------------------------|----------------------------------------------------------|------------------|
| fwd-atomic | index_add                     | identical                                                | drift            |
| fwd-atomic | scatter_add                   | identical                                                | drift            |
| fwd-atomic | scatter_reduce_sum            | identical                                                | drift            |
| fwd-atomic | index_put_accumulate          | identical                                                | same             |
| bwd-atomic | embedding_backward            | identical                                                | same             |
| bwd-atomic | cross_entropy_backward        | identical                                                | same             |
| bwd-atomic | interpolate_bilinear_backward | identical                                                | same             |
| bwd-atomic | grid_sample_backward          | no deterministic impl (raises); fallback non-reproducible | drift            |
| attention  | sdpa_forward                  | identical                                                | same             |
| attention  | sdpa_backward                 | identical                                                | drift            |
| reference  | matmul                        | identical                                                | same             |
| reference  | cumsum                        | identical                                                | same             |
| reference  | sort_indices                  | identical                                                | same             |
| reference  | topk_indices                  | identical                                                | same             |

The flag rescues index_add, scatter_add, scatter_reduce_sum, and
sdpa_backward (drift → identical). grid_sample_backward has no deterministic
CUDA implementation: it raises under strict mode and is non-reproducible on
fallback. All other operators are reproducible regardless of the flag.
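
The summary above follows mechanically from the table. As a small illustration, the sketch below encodes the reported results as (flag on, flag off) pairs and derives the three groups. The `MATRIX` dictionary and the short label `"raises"` for the grid_sample_backward row are encodings chosen for this example, not structures taken from the test file.

```python
# (flag ON outcome, flag OFF outcome) per operator, copied from the table.
MATRIX = {
    "index_add": ("identical", "drift"),
    "scatter_add": ("identical", "drift"),
    "scatter_reduce_sum": ("identical", "drift"),
    "index_put_accumulate": ("identical", "same"),
    "embedding_backward": ("identical", "same"),
    "cross_entropy_backward": ("identical", "same"),
    "interpolate_bilinear_backward": ("identical", "same"),
    "grid_sample_backward": ("raises", "drift"),
    "sdpa_forward": ("identical", "same"),
    "sdpa_backward": ("identical", "drift"),
    "matmul": ("identical", "same"),
    "cumsum": ("identical", "same"),
    "sort_indices": ("identical", "same"),
    "topk_indices": ("identical", "same"),
}

# Rescued: drifts with the flag off, bitwise identical with it on.
rescued = sorted(op for op, (on, off) in MATRIX.items()
                 if on == "identical" and off == "drift")

# No deterministic implementation: strict mode raises.
no_impl = sorted(op for op, (on, _) in MATRIX.items() if on == "raises")

# Reproducible regardless of the flag.
unaffected = sorted(op for op, (on, off) in MATRIX.items()
                    if on == "identical" and off == "same")

print("rescued:", rescued)
print("no deterministic impl:", no_impl)
print("unaffected:", len(unaffected), "operators")
```

Because the probe column is hardware and build dependent, the same grouping run on a different GPU or PyTorch build may place operators differently.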

Scope

Eager operators only. Nondeterminism from the compiled path (fused
Inductor/Triton kernels) is out of scope for this layer.

Verify that high-risk torch operators relevant to LLM and video/diffusion
models are bitwise-reproducible under use_deterministic_algorithms(True),
exploiting the compositionality of determinism to validate at the operator
layer rather than per model.

For each operator the contract test asserts that, with the flag on, the
result over N runs on identical inputs is either bitwise identical or a
RuntimeError (no deterministic implementation) -- never silent drift; ops
without a deterministic kernel are additionally re-run under warn_only to
report fallback reproducibility. A report-only probe re-runs each operator
with the flag off to record which operators the flag rescues.

Cases: scatter/index_add/scatter_reduce/index_put, embedding and
cross_entropy backward, interpolate/grid_sample backward, SDPA fwd/bwd, and
matmul/cumsum/sort/topk references.