fix(tensorrt): TRT 10.16.1.11 + modelopt install + run_pip quote-fix #1
Open
forkni wants to merge 1 commit into dotsimulate:main from
Conversation
- `tensorrt.py`: bump `tensorrt_cu12` to 10.16.1.11, `polygraphy` to 0.49.26, `onnx-graphsurgeon` to 0.6.1; add FP8-quant block (modelopt + cupy-cuda12x + numpy re-lock); re-pin `onnxruntime-gpu==1.24.4` with `--no-deps` after the modelopt downgrade; drop shell-style quotes inside package specs (`run_pip` uses subprocess + `.split()`, so quotes become literal arg chars).
- `installer.py`: remove `torchaudio` from the cu128 config (not needed); minor ruff format cleanup.
- `verifier.py`: `float32_to_bfloat16` diagnostic points to onnx-gs 0.6.1 instead of suggesting an onnx downgrade.
- `__init__.py`, `__main__.py`, `cli.py`: ruff format cleanup (blank lines, unused import, raw docstring).
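The quoting bug is easy to reproduce in isolation. A minimal sketch of the failure mode (the command string is a hypothetical example; only the `str.split()` behavior matters):

```python
import shlex

# A pip argument string as an installer helper might receive it, with
# shell-style quotes around the pinned spec (hypothetical example).
cmd = 'install "numpy==1.26.4" --no-deps'

# str.split() is not a shell: the quote characters survive as literal
# characters in the argument, so pip sees the spec '"numpy==1.26.4"'.
naive = cmd.split()
print(naive)    # ['install', '"numpy==1.26.4"', '--no-deps']

# shlex.split() applies shell quoting rules and strips the quotes.
correct = shlex.split(cmd)
print(correct)  # ['install', 'numpy==1.26.4', '--no-deps']
```

The PR takes the simpler route of dropping the quotes from the specs themselves rather than switching the splitter to `shlex.split()`.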
Summary
Bumps the installer to align with TRT 10.16.1.11 (the first production Blackwell-on-Windows release; fixes the 78% FP8 perf regression in 10.12–10.13 on SM_120) and adds the missing FP8-quant install block.
- `sd_installer/tensorrt.py`: bump `tensorrt_cu12` → 10.16.1.11, `polygraphy` → 0.49.26, `onnx-graphsurgeon` → 0.6.1; add the previously missing FP8-quant block (`nvidia-modelopt[onnx]`, `cupy-cuda12x==13.6.0`, `numpy==1.26.4`) — without it, a silent `ImportError` on `fp8_quantize` until the first FP8 build. Re-pin `onnxruntime-gpu==1.24.4` with `--no-deps` after modelopt's transitive downgrade. Drop shell-style quotes inside package specs (`run_pip` uses subprocess + `.split()`, so quotes become literal arg chars).
- `sd_installer/installer.py`: remove `torchaudio` from the `cu128` config (not needed); minor ruff format cleanup.
- `sd_installer/verifier.py`: the `float32_to_bfloat16` diagnostic now points to `onnx-graphsurgeon==0.6.1` instead of suggesting an `onnx` downgrade.
- `sd_installer/{cli.py, __init__.py, __main__.py}`: ruff format cleanup (blank lines, unused import, raw docstring).

Companion PR
Pairs with dotsimulate/StreamDiffusion#12 — the main library work for TRT 10.16.1.11 + FP8 quantization. The installer fix here is a strict prerequisite: the StreamDiffusionTD COMP's Install `tensorrt` button installs from this repo's `sd_installer/tensorrt.py`, so without this PR merged the button continues to install TRT 10.12 even after the main PR lands.

Test Plan
- `pip list` reports `tensorrt_cu12==10.16.1.11`, `polygraphy==0.49.26`, `onnx-graphsurgeon==0.6.1`, `nvidia-modelopt>=0.19`, `onnxruntime-gpu==1.24.4` (the `--no-deps` re-pin).
- `python -c "from streamdiffusion.acceleration.tensorrt.fp8_quantize import *; print('OK')"` returns `OK` on a fresh install (pre-fix this would have raised `ImportError` on `modelopt` until the first FP8 build).

🤖 Generated with Claude Code
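The pin checks in the test plan can also be scripted instead of eyeballing `pip list`. A standard-library sketch (the `EXPECTED` mapping mirrors the pins above; `check_pins` is a hypothetical helper, not part of the installer — `nvidia-modelopt` is a lower bound, so it is left out of the exact-match set):

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pins from the test plan above.
EXPECTED = {
    "tensorrt_cu12": "10.16.1.11",
    "polygraphy": "0.49.26",
    "onnx-graphsurgeon": "0.6.1",
    "onnxruntime-gpu": "1.24.4",
}

def check_pins(pins):
    """Return {name: (installed, expected)} for every missing or mismatched pin."""
    mismatches = {}
    for name, want in pins.items():
        try:
            got = version(name)
        except PackageNotFoundError:
            got = None  # package not installed at all
        if got != want:
            mismatches[name] = (got, want)
    return mismatches

# Report any drift; an empty dict means every pin matches.
for name, (got, want) in check_pins(EXPECTED).items():
    print(f"{name}: installed={got!r}, expected={want!r}")
```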