
Add Qwen3.5 FP8 and NVFP4#168

Merged
JustinTong0323 merged 5 commits into sgl-project:main from mmangkad:add-qwen35-fp8
Mar 2, 2026

Conversation

@mmangkad
Contributor

No description provided.

@mmangkad
Contributor Author

cc @JustinTong0323

Comment thread src/components/autoregressive/Qwen35ConfigGenerator/index.js Outdated
Comment thread package-lock.json
@mmangkad
Contributor Author

@JustinTong0323 I can also add NVFP4 here now if that sounds good.

@mmangkad mmangkad changed the title Add Qwen3.5-397B-A17B-FP8 Add Qwen3.5 FP8 and NVFP4 Feb 24, 2026
Comment thread data/models/src/v0.5.8/qwen35.yaml
@kedarpotdar-nv
Contributor

Hi @mmangkad, I am contributing FP4 and FP8 configs to InferenceX here: SemiAnalysisAI/InferenceX#820

I have added a couple of recommendations for best performance based on our experiments. Thanks!

@kedarpotdar-nv
Contributor

cc: @faradawn

@mmangkad
Contributor Author

Hi @kedarpotdar-nv, thanks for your recommendations. I added --enable-flashinfer-allreduce-fusion to all configs, since it should help on SM90+ with tp > 2. I also added --kv-cache-dtype fp8_e4m3 to the FP4 configs for now, though this should already be resolved automatically by my PR sgl-project/sglang#19291.

There may also be further changes here once sgl-project/sglang#19391 lands and spec v2 is enabled.
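For reference, the flags discussed above would sit in the model config's server-args list roughly as follows. This is a sketch only: the surrounding key names and values are assumptions, not the actual contents of data/models/src/v0.5.8/qwen35.yaml.

```yaml
# Hypothetical excerpt in the style of the repo's model YAML configs.
# Keys and values around the flags are illustrative.
server_args:
  # Fuse allreduce with adjacent ops via FlashInfer;
  # expected to help on SM90+ when tp > 2.
  - --enable-flashinfer-allreduce-fusion
  # FP8 KV cache for the FP4 configs; sgl-project/sglang#19291
  # should make this the automatic default.
  - --kv-cache-dtype
  - fp8_e4m3
```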

Comment on lines +72 to +73
- --fp4-gemm-backend
- flashinfer_cutlass

Why not just use the default value? flashinfer_cutlass is the current best due to an issue in flashinfer, but when the flashinfer issue is fixed, this will not be the best option.
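The trade-off this comment raises can be sketched as two variants of the config (illustrative only; the flag spelling comes from the diff above, the surrounding structure is assumed):

```yaml
# Option A: pin the backend. Best today because of a known flashinfer
# issue, but may become suboptimal once that issue is fixed upstream.
- --fp4-gemm-backend
- flashinfer_cutlass

# Option B: omit the flag entirely and rely on SGLang's default FP4
# GEMM backend selection, which will track upstream fixes automatically.
```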

…rdware requirements for BF16, FP8, and FP4 quantized models for clarity.

Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
@JustinTong0323 JustinTong0323 merged commit 6197d8f into sgl-project:main Mar 2, 2026
@mmangkad mmangkad deleted the add-qwen35-fp8 branch March 3, 2026 05:34
