
fix(tensorrt): TRT 10.16.1.11 + modelopt install + run_pip quote-fix #1

Open

forkni wants to merge 1 commit into dotsimulate:main from forkni:main

Conversation


@forkni forkni commented Apr 26, 2026

Summary

Bumps the installer to align with TRT 10.16.1.11 (the first production Blackwell-on-Windows release; fixes the 78% FP8 perf regression on SM_120 in 10.12–10.13) and adds the previously missing FP8-quant install block.

  • sd_installer/tensorrt.py: bump tensorrt_cu12 to 10.16.1.11, polygraphy to 0.49.26, and onnx-graphsurgeon to 0.6.1; add the previously missing FP8-quant block (nvidia-modelopt[onnx] cupy-cuda12x==13.6.0 numpy==1.26.4) — without it, fp8_quantize raised a silent ImportError until the first FP8 build. Re-pin onnxruntime-gpu==1.24.4 with --no-deps after modelopt's transitive downgrade. Drop shell-style quotes inside package specs (run_pip uses subprocess + .split(), so the quotes become literal argument characters — see the sketch after this list).
  • sd_installer/installer.py: remove torchaudio from cu128 config (not needed); minor ruff format cleanup.
  • sd_installer/verifier.py: float32_to_bfloat16 diagnostic now points to onnx-graphsurgeon==0.6.1 instead of suggesting an onnx downgrade.
  • sd_installer/{cli.py, __init__.py, __main__.py}: ruff format cleanup (blank lines, unused import, raw docstring).
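
A minimal sketch of the quoting problem, assuming run_pip does a plain whitespace split before handing arguments to pip via subprocess (the helper below is a hypothetical stand-in, not the actual sd_installer implementation):

```python
# Hypothetical reimplementation of run_pip for illustration only; the real
# helper lives in sd_installer and may differ in detail.
import subprocess
import sys


def run_pip(args: str) -> None:
    # Plain whitespace split, no shell involved: quote characters are NOT
    # stripped, so they survive inside the tokens pip receives.
    cmd = [sys.executable, "-m", "pip", "install", *args.split()]
    subprocess.run(cmd, check=True)


# Broken: pip is handed the literal token '"nvidia-modelopt[onnx]"'
# (quotes included) and rejects it as an invalid requirement spec.
# run_pip('"nvidia-modelopt[onnx]" cupy-cuda12x==13.6.0 numpy==1.26.4')

# Fixed: bare specs split into clean tokens; no quoting is needed at all
# because nothing passes through a shell.
# run_pip('nvidia-modelopt[onnx] cupy-cuda12x==13.6.0 numpy==1.26.4')

# The --no-deps re-pin after modelopt's transitive downgrade works the same way:
# run_pip('--no-deps onnxruntime-gpu==1.24.4')
```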

Companion PR

Pairs with dotsimulate/StreamDiffusion#12 — the main library work for TRT 10.16.1.11 + FP8 quantization. The installer fix here is a strict prerequisite: the StreamDiffusionTD COMP's Installtensorrt button installs from this repo's sd_installer/tensorrt.py, so without this PR merged the button continues to install TRT 10.12 even after the main PR lands.

Test Plan

  • Fresh-venv install: confirm pip list reports tensorrt_cu12==10.16.1.11, polygraphy==0.49.26, onnx-graphsurgeon==0.6.1, nvidia-modelopt>=0.19, and onnxruntime-gpu==1.24.4 (the --no-deps re-pin); a scripted version of this check is sketched after the list.
  • python -c "from streamdiffusion.acceleration.tensorrt.fp8_quantize import *; print('OK')" prints OK on a fresh install (before this fix it raised ImportError on modelopt until the first FP8 build).
  • All 13 verifier checks pass.
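
For reference, a hedged sketch of the version check above (the EXPECTED mapping mirrors the pins listed in this PR; the script itself is illustrative and not part of the installer):

```python
# Check installed versions against the pins this PR expects.
from importlib.metadata import PackageNotFoundError, version

EXPECTED = {
    "tensorrt_cu12": "10.16.1.11",
    "polygraphy": "0.49.26",
    "onnx-graphsurgeon": "0.6.1",
    "onnxruntime-gpu": "1.24.4",
}

for pkg, want in EXPECTED.items():
    try:
        got = version(pkg)
    except PackageNotFoundError:
        print(f"MISSING {pkg}")
        continue
    status = "OK  " if got == want else "DIFF"
    print(f"{status} {pkg}: installed {got}, expected {want}")

# nvidia-modelopt only needs to satisfy >=0.19, so just report it.
print("nvidia-modelopt:", version("nvidia-modelopt"))
```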

🤖 Generated with Claude Code

- tensorrt.py: bump tensorrt_cu12 to 10.16.1.11, polygraphy 0.49.26,
  onnx-graphsurgeon 0.6.1; add FP8-quant block (modelopt + cupy-cuda12x
  + numpy re-lock); re-pin onnxruntime-gpu==1.24.4 with --no-deps after
  modelopt downgrade; drop shell-style quotes inside package specs
  (run_pip uses subprocess + .split(), quotes become literal arg chars).
- installer.py: remove torchaudio from cu128 config (not needed);
  minor ruff format cleanup.
- verifier.py: float32_to_bfloat16 diagnostic points to onnx-gs 0.6.1
  instead of suggesting an onnx downgrade.
- __init__.py, __main__.py, cli.py: ruff format cleanup (blank lines,
  unused import, raw docstring).
