Add Qwen3.5 FP8 and NVFP4 #168
Conversation
@JustinTong0323 I can also add NVFP4 here now if that sounds good.
Hi @mmangkad, I am contributing FP4 and FP8 configs to InferenceX here: SemiAnalysisAI/InferenceX#820. I have added a couple of recommendations for best performance based on our experiments. Thanks!
cc: @faradawn |
Force-pushed from ffedc4f to ef661ed.
Hi @kedarpotdar-nv, thanks for your recommendations; I added them. Also, there might still be changes here after sgl-project/sglang#19391, once spec v2 is enabled.
- --fp4-gemm-backend
- flashinfer_cutlass
Why not just use the default value? flashinfer_cutlass is currently the best choice due to an issue in flashinfer, but once that issue is fixed, it will no longer be the best option.
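For context, pinning the backend explicitly would look roughly like the sketch below. Only the `--fp4-gemm-backend flashinfer_cutlass` pair comes from this diff; the server entry point and the model-path placeholder are assumptions, not taken from this PR.

```shell
# Sketch: launching SGLang with the FP4 GEMM backend pinned explicitly.
# <nvfp4-quantized-model> is a placeholder, not the exact checkpoint from this PR.
python -m sglang.launch_server \
  --model-path <nvfp4-quantized-model> \
  --fp4-gemm-backend flashinfer_cutlass
```

Dropping the `--fp4-gemm-backend` line falls back to the default, which is the reviewer's suggestion: the config then keeps tracking whichever backend is best once the flashinfer issue is fixed.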
…hardware requirements for BF16, FP8, and FP4 quantized models for clarity. Signed-off-by: Xinyuan Tong <justinning0323@outlook.com>
No description provided.