Fix export of fp8 ONNX files by bmhowe23 · Pull Request #52 · NVIDIA/Ising-Decoding

bmhowe23 · 2026-04-08T04:10:27Z

Without this change:

$ ONNX_WORKFLOW=2 QUANT_FORMAT=fp8 DISTANCE=13 N_ROUNDS=104 PREDECODER_INFERENCE_NUM_SAMPLES=2048 WORKFLOW=inference EXPERIMENT_NAME=predecoder_model_1 bash code/scripts/local_run.sh
...
[LER] ONNX export failed: [LER] FP8 ONNX quantization failed (fail-fast): [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Unexpected input data type. Actual: (tensor(float)) , expected: (tensor(uint8)); falling back to PyTorch.
...

With this change, no such error is produced. Now we can properly export fp8 ONNX files.
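The failure above comes down to a dtype mismatch between the calibration data and the model input. Here is a minimal, standard-library sketch of that mismatch. `collect_calibration_dets` and `quantize_stub` are hypothetical stand-ins written for illustration, not the repository's functions or the ONNX Runtime API, and the `array` typecodes `"B"` and `"f"` stand in for uint8 and float32 tensors.

```python
from array import array

UINT8 = "B"    # array typecode for unsigned 8-bit integers
FLOAT32 = "f"  # array typecode for 32-bit floats


def collect_calibration_dets(num_values):
    """Hypothetical stand-in: return binary detector values stored as uint8."""
    return array(UINT8, [i % 2 for i in range(num_values)])


def quantize_stub(calibration_data, expected_typecode=UINT8):
    """Hypothetical stand-in for a quantizer whose model input is uint8.

    Rejects calibration data of any other element type, mimicking the
    INVALID_ARGUMENT error in the log.
    """
    if calibration_data.typecode != expected_typecode:
        raise TypeError(
            "Unexpected input data type. "
            f"Actual: {calibration_data.typecode!r}, "
            f"expected: {expected_typecode!r}"
        )
    return calibration_data.typecode


dets = collect_calibration_dets(8)

# Casting to float before quantizing is rejected by the stub.
as_float = array(FLOAT32, [float(v) for v in dets])
try:
    quantize_stub(as_float)
    cast_failed = False
except TypeError:
    cast_failed = True

# Passing the data through unchanged is accepted.
accepted = quantize_stub(dets)
```

In this sketch the float copy is rejected and the untouched uint8 data passes.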

Signed-off-by: Ben Howe <bhowe@nvidia.com>

`_collect_calibration_dets` returns uint8; casting to float32 before passing to mq.quantize triggered an INVALID_ARGUMENT error from the ONNX runtime ("expected: tensor(uint8), got: tensor(float)").

The new test mirrors the existing int8 variant and asserts that the fp8 path preserves the original uint8 dtype and forwards the FP8-specific kwargs (op_types_to_quantize, high_precision_dtype).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
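A regression test of the kind described can be sketched with `unittest.mock`. This is an illustrative outline only: `export_fp8` is a hypothetical helper, the keyword values (`["Conv"]`, `"fp16"`) are placeholders rather than values taken from the repository, and the real test may be structured differently.

```python
from array import array
from unittest import mock


def export_fp8(quantize_fn, calibration_data):
    """Hypothetical export helper.

    Forwards the calibration data unchanged (no float cast) together with
    FP8-specific keyword arguments. The keyword values are placeholders.
    """
    return quantize_fn(
        calibration_data=calibration_data,
        quantize_mode="fp8",
        op_types_to_quantize=["Conv"],
        high_precision_dtype="fp16",
    )


# uint8 calibration data; array typecode "B" stands in for a uint8 tensor.
calib = array("B", [0, 1, 1, 0])

# Replace the quantizer with a mock so that the call can be inspected.
quantize_mock = mock.MagicMock(return_value="quantized-model")
result = export_fp8(quantize_mock, calib)

quantize_mock.assert_called_once()
kwargs = quantize_mock.call_args.kwargs

# The data must reach the quantizer with its original element type.
dtype_preserved = kwargs["calibration_data"].typecode == "B"

# The FP8-specific keyword arguments must be forwarded.
fp8_kwargs_forwarded = (
    "op_types_to_quantize" in kwargs and "high_precision_dtype" in kwargs
)
```

Mocking the quantizer keeps the check fast and independent of any ONNX runtime, at the cost of not exercising the real export path.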

IgorBaratta

LGTM

* Fix export of fp8 ONNX files

  Signed-off-by: Ben Howe <bhowe@nvidia.com>

* test: add fp8 calibration dtype regression test for #52

  `_collect_calibration_dets` returns uint8; casting to float32 before passing to mq.quantize triggered an INVALID_ARGUMENT error from the ONNX runtime ("expected: tensor(uint8), got: tensor(float)"). The new test mirrors the existing int8 variant and asserts that the fp8 path preserves the original uint8 dtype and forwards the FP8-specific kwargs (op_types_to_quantize, high_precision_dtype).

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Signed-off-by: Ben Howe <bhowe@nvidia.com>
Co-authored-by: Ivan Basov <ibasov@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

bmhowe23 and others added 2 commits April 7, 2026 21:07

Fix export of fp8 ONNX files

8ee3299

Signed-off-by: Ben Howe <bhowe@nvidia.com>

IgorBaratta approved these changes Apr 8, 2026

ivanbasov merged commit 2cbb707 into main Apr 8, 2026
17 checks passed

ivanbasov deleted the pr-fix-fp8-export branch April 8, 2026 15:56

This was referenced Apr 8, 2026

fix(trt): replace STRONGLY_TYPED with BuilderFlag.FP8/INT8 to restore throughput #56

Closed

chore: fast-forward releases/v0.1.0 to main #58

Merged

ivanbasov merged 2 commits into main from pr-fix-fp8-export

3 participants
