test(onnx): add end-to-end mq.quantize() tests for modelopt
Previous coverage only verified that modelopt.onnx.quantization was
importable. Add TestModeloptQuantize with two tests that actually call
mq.quantize() on a real ONNX model:
- test_mq_quantize_int8_produces_valid_onnx: verifies the output file is
created and passes onnx.checker (confirms modelopt works at runtime,
not just at import time; this is the key Python 3.13 regression check)
- test_mq_quantize_int8_output_differs_from_fp32: verifies QDQ nodes were
inserted, using node count as a proxy (the output graph has more nodes
than the FP32 source)
Both tests share a _build_tiny_model() helper that creates a minimal
Gemm ONNX model with input "dets" and 16 calibration rows, matching the
production calibration_data={"dets": calib_dets} call convention.
model.ir_version is pinned to 8 for onnxruntime-gpu 1.22.0 compatibility.
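For reference, a helper of the kind described above might look roughly like
the sketch below. It is not the code from this commit: the name
build_tiny_model, the tensor names "W", "B" and "out", the 4x2 layer shape,
and opset 13 are all assumptions chosen for illustration. Only the Gemm op,
the "dets" input, the 16 calibration rows, and ir_version 8 come from the
description. Imports are deferred so the module still loads when onnx is
absent.

```python
def build_tiny_model(n_features=4, n_outputs=2, n_calib_rows=16, seed=0):
    """Return (model, calib_dets): a one-node Gemm model and random calibration rows.

    Hypothetical stand-in for the _build_tiny_model() helper; shapes, tensor
    names other than "dets", and the opset are illustrative assumptions.
    """
    # Deferred imports: importing this module does not require onnx or numpy.
    import numpy as np
    import onnx
    from onnx import TensorProto, helper, numpy_helper

    rng = np.random.default_rng(seed)
    weight = rng.standard_normal((n_features, n_outputs)).astype(np.float32)
    bias = np.zeros(n_outputs, dtype=np.float32)

    # out = dets @ W + B
    node = helper.make_node("Gemm", inputs=["dets", "W", "B"], outputs=["out"])
    graph = helper.make_graph(
        [node],
        "tiny_gemm",
        inputs=[helper.make_tensor_value_info(
            "dets", TensorProto.FLOAT, ["batch", n_features])],
        outputs=[helper.make_tensor_value_info(
            "out", TensorProto.FLOAT, ["batch", n_outputs])],
        initializer=[
            numpy_helper.from_array(weight, name="W"),
            numpy_helper.from_array(bias, name="B"),
        ],
    )
    model = helper.make_model(
        graph, opset_imports=[helper.make_opsetid("", 13)])  # opset is an assumption
    model.ir_version = 8  # pinned, as described in the commit message
    onnx.checker.check_model(model)

    # Calibration rows keyed later by the input name, e.g. {"dets": calib_dets}.
    calib_dets = rng.standard_normal((n_calib_rows, n_features)).astype(np.float32)
    return model, calib_dets
```

The returned array has one row per calibration sample and one column per
input feature, so it can be passed under the "dets" key without reshaping.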
Tests are skipped when nvidia-modelopt is not installed.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>