Add Qwen3.5 FP8 and NVFP4 #168
Conversation
@JustinTong0323 I can also add NVFP4 here now if that sounds good.
Hi @mmangkad, I am contributing FP4 and FP8 configs to InferenceX here: SemiAnalysisAI/InferenceX#820. I have added a couple of recommendations for best performance based on our experiments. Thanks!
cc: @faradawn |
Force-pushed from ffedc4f to ef661ed.
Hi @kedarpotdar-nv, thanks for your recommendations; I added them. Also, there might still be changes here after sgl-project/sglang#19391, once spec v2 is enabled.
- --fp4-gemm-backend
- flashinfer_cutlass
Why not just use the default value? flashinfer_cutlass is currently the best choice due to an issue in flashinfer, but once that issue is fixed, it will no longer be the best option.
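For context, pinning the backend explicitly would look roughly like the sketch below. Only the `--fp4-gemm-backend flashinfer_cutlass` pair comes from this diff; the server entry point and the model-path placeholder are assumptions, not taken from this PR.

```shell
# Sketch: launching SGLang with the FP4 GEMM backend pinned explicitly.
# <nvfp4-quantized-model> is a placeholder, not the exact checkpoint from this PR.
python -m sglang.launch_server \
  --model-path <nvfp4-quantized-model> \
  --fp4-gemm-backend flashinfer_cutlass
```

Dropping the `--fp4-gemm-backend` line falls back to the default, which is the reviewer's suggestion: the config then keeps tracking whichever backend is best once the flashinfer issue is fixed.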
…hardware requirements for BF16, FP8, and FP4 quantized models for clarity. Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
No description provided.