
Upgrade: Bump sglang from 0.4.6.post5 to 0.5.14#20

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/pip/sglang-0.5.14


dependabot[bot] commented on behalf of github on Jun 29, 2026


Bumps sglang from 0.4.6.post5 to 0.5.14.

Release notes

Sourced from sglang's releases.

v0.5.14

Highlights

New Model Support: GLM-5.2, LiquidAI LFM2.5, Kimi-K2.7-Code, Poolside Laguna-M.1, DiffusionGemma, Zyphra ZAYA1, MiMo-V2-ASR

DeepSeek-V4 on GB300 since Day 0: serving DeepSeek-V4 on NVIDIA GB300 with SGLang delivers 5x higher throughput at the same interactivity (blog).

Waterfill & LPLB MoE load balancing: Two dispatch-time load-balancing methods for DeepEP expert parallelism: Waterfill for shared-expert dispatch and LPLB for redundant expert replicas, improving throughput for DeepSeek-V3/R1 and DeepSeek-V4 (blog).
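Water-filling, in the generic sense, spreads an extra quantity across bins by always raising the lowest bin first, so the levels converge. The release notes do not describe how SGLang's Waterfill assigns shared-expert tokens. The sketch below is a textbook integer water-filling that assumes each rank's current token count is its "level" and that `extra` shared-expert tokens still need a home. `waterfill` is a hypothetical helper, not an SGLang API.

```python
import heapq
from typing import List


def waterfill(loads: List[int], extra: int) -> List[int]:
    """Split `extra` tokens across ranks, always topping up the least-loaded rank first.

    Hypothetical illustration of generic water-filling; not SGLang's dispatcher.
    Returns how many extra tokens each rank receives.
    """
    if not loads:
        raise ValueError("need at least one rank")
    # Min-heap of (current load, rank index); ties break on the lower rank index.
    heap = [(load, rank) for rank, load in enumerate(loads)]
    heapq.heapify(heap)
    assigned = [0] * len(loads)
    for _ in range(extra):
        load, rank = heapq.heappop(heap)  # rank with the lowest "water level"
        assigned[rank] += 1
        heapq.heappush(heap, (load + 1, rank))
    return assigned
```

For example, `waterfill([10, 4, 7], 8)` returns `[0, 6, 2]`, which leaves the ranks at 10, 10 and 9 tokens. The min-heap keeps each step at O(log n). A production dispatcher would more likely place tokens in bulk rather than one at a time.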

KDA CuteDSL prefill kernel on Blackwell (SM100): New CuteDSL prefill kernel for Kimi-Linear (KDA), 1.08-1.52x faster than the Triton path via a reusable scratch workspace, plus a cuda-graph padding fix (#27488); see the Kimi-Linear cookbook.

Linear-attention prefix-cache memory savings: An int8 checkpoint pool stores recurrent states compactly in the Mamba radix cache, substantially increasing prefix-cache capacity for KDA / GDN models (#28185); the speculative conv-window intermediate cache is deduplicated with a sliding-window layout, halving its footprint with no numerical change (#28302).
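The notes do not say how the int8 checkpoint pool is laid out or scaled. To show why int8 storage roughly halves a bf16 state, here is a minimal sketch that assumes symmetric per-row quantization with one fp32 scale per row. The helper names are hypothetical, and the pool's real scheme in #28185 may differ.

```python
from typing import List, Tuple


def quantize_int8(row: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-row int8 quantization: q = round(x / scale) with scale = amax / 127."""
    amax = max((abs(x) for x in row), default=0.0)
    scale = amax / 127.0 if amax > 0.0 else 1.0  # all-zero rows get a dummy scale
    q = [max(-127, min(127, round(x / scale))) for x in row]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover an approximation of the original row."""
    return [v * scale for v in q]


def row_bytes(n_elems: int, elem_bytes: int, scale_bytes: int = 0) -> int:
    """Storage for one state row: the elements plus an optional per-row scale."""
    return n_elems * elem_bytes + scale_bytes


if __name__ == "__main__":
    state = [0.5, -1.0, 0.25, 0.0]
    q, s = quantize_int8(state)
    print(q, dequantize_int8(q, s))
    # bf16 row of 4096 elements vs. an int8 row plus one fp32 scale
    print(row_bytes(4096, 2), row_bytes(4096, 1, scale_bytes=4))
```

Under these assumptions, a 4096-element bf16 row takes 8192 bytes and the int8 version takes 4100. The rounding error per element is at most half a quantization step.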

LPLB: linear-programming load balancer for MoE expert parallelism: Balances token routing across redundant expert replicas by solving a per-layer LP; opt-in via --ep-dispatch-algorithm=lp, default behavior unchanged (#24515).
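The notes only say that LPLB solves a per-layer linear program over redundant expert replicas. One natural formulation assumes the goal is to minimize the load on the busiest GPU. The actual objective and constraints in #24515 may differ.

```latex
\begin{aligned}
\min_{x,\,t}\quad & t \\
\text{s.t.}\quad  & \sum_{r \in R(e)} x_{e,r} = T_e              && \forall e \\
                  & \sum_{(e,r)\,:\,G(e,r) = g} x_{e,r} \le t      && \forall g \\
                  & x_{e,r} \ge 0                                  && \forall e,\ r \in R(e)
\end{aligned}
```

The symbols are as follows:

- $T_e$ is the number of tokens routed to logical expert $e$ in the layer.
- $R(e)$ is the set of replicas of expert $e$.
- $G(e,r)$ is the GPU that hosts replica $r$.
- $x_{e,r}$ is the number of tokens sent to that replica.
- $t$ is the peak per-GPU load.

Every constraint is linear in $x$ and $t$, so a standard LP solver applies. If every expert has a single replica, the split is fixed and the problem reduces to ordinary static routing.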

MSCCL++ integration & MNNVL allreduce fusion: MSCCL++ migrates to the upstream mscclpp Python package (Executor + DSL compiler) with auto-tuned collectives for TP=8 single-node and TP=16 two-node (#22734); FlashInfer fused allreduce + residual + RMSNorm re-enables an MNNVL backend behind --flashinfer-allreduce-fusion-backend (auto / trtllm / mnnvl), fixing the piecewise-CUDA-graph interaction (#23402).
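The fused op combines three steps that would otherwise run as separate kernels. Below is a plain-Python reference of what it computes. It assumes a sum allreduce across tensor-parallel ranks and a standard RMSNorm with a per-channel weight. It says nothing about how FlashInfer or the MNNVL backend implement the fusion, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def allreduce_sum(per_rank: List[List[float]]) -> List[float]:
    """Simulated allreduce: every rank ends up with the elementwise sum of all partials."""
    return [sum(vals) for vals in zip(*per_rank)]


def rmsnorm(x: List[float], weight: List[float], eps: float = 1e-6) -> List[float]:
    """RMSNorm: divide x by sqrt(mean(x^2) + eps), then apply a per-channel weight."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms * w for v, w in zip(x, weight)]


def allreduce_residual_rmsnorm(
    per_rank: List[List[float]],
    residual: List[float],
    weight: List[float],
    eps: float = 1e-6,
) -> Tuple[List[float], List[float]]:
    """Unfused reference for the three steps a fused kernel performs in one pass.

    Returns (normalized output, updated residual), assuming the common pre-norm
    convention where the summed residual is carried to the next layer.
    """
    reduced = allreduce_sum(per_rank)                          # 1. allreduce across TP ranks
    new_residual = [a + r for a, r in zip(reduced, residual)]  # 2. residual add
    return rmsnorm(new_residual, weight, eps), new_residual    # 3. RMSNorm
```

A fused kernel should match this reference up to floating-point rounding. The usual motivation for fusing is to avoid writing the reduced and residual tensors back to memory between steps.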

Nemotron DP attention + MTP: Data-parallel attention for the hybrid Nemotron-H (Mamba2 + full attention + MoE), plus MTP support (#24955); see the Nemotron 3 Ultra cookbook.

AMD: breakable CUDA graph on ROCm/HIP: The breakable CUDA graph execution path now runs on AMD GPUs (#28173).

NVFP4 MoE for DeepSeek-V4: Adds an NVFP4 MoE quantization path for DeepSeek-V4 on Blackwell for higher MoE throughput; enable with --moe-runner-backend flashinfer_trtllm_routed (#25820); see the DeepSeek-V4 cookbook.
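NVFP4 stores each value as a 4-bit E2M1 float and shares one scale across each block of 16 values. The sketch below shows that block-scaled rounding under two simplifying assumptions. First, the block scale stays a Python float instead of being rounded to FP8 E4M3. Second, the per-tensor FP32 scale that NVFP4 also uses is omitted. It illustrates the number format only, not the `flashinfer_trtllm_routed` MoE path, and the helper names are hypothetical.

```python
from typing import List, Tuple

# Non-negative magnitudes representable in FP4 E2M1 (the sign is stored separately).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # NVFP4 shares one scale per block of 16 elements


def quantize_block(block: List[float]) -> Tuple[List[float], float]:
    """Map a block onto E2M1 codes with one shared scale (kept in full precision here)."""
    amax = max((abs(v) for v in block), default=0.0)
    scale = amax / 6.0 if amax > 0.0 else 1.0  # block max lands on the largest E2M1 value
    codes = []
    for v in block:
        # Nearest representable magnitude; ties go to the smaller value.
        mag = min(E2M1_VALUES, key=lambda c: abs(abs(v) / scale - c))
        codes.append(mag if v >= 0 else -mag)
    return codes, scale


def quantize_nvfp4_like(x: List[float]) -> List[Tuple[List[float], float]]:
    """Quantize a flat vector block by block."""
    return [quantize_block(x[i:i + BLOCK]) for i in range(0, len(x), BLOCK)]


def dequantize(blocks: List[Tuple[List[float], float]]) -> List[float]:
    """Multiply each code by its block's scale."""
    return [c * s for codes, s in blocks for c in codes]
```

For example, `quantize_block([6.0, -3.0, 1.2, 0.0])` returns `([6.0, -3.0, 1.0, 0.0], 1.0)`. The block maximum is already 6, so the scale is 1, and 1.2 rounds to the nearest representable magnitude, 1.0.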

DeepSeek-V4 decode & quantization optimizations: FP8 group quantization now emits power-of-two (UE8M0) scales directly from the per-token group-quant kernel, dropping a separate rounding pass (#26766); MLA decode q-heads are padded to 64 under attention-TP so FlashMLA dispatches the ~2x cheaper head64 kernel instead of head128 (#27954); the MHC prenorm kernel is prewarmed at startup to remove the first-run JIT slowdown on a fresh server (#27986); and BF16 mixed-dtype compression states are supported on the C4 / C128 paths (#27277); see the DeepSeek-V4 cookbook.
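A UE8M0 scale is an unsigned 8-bit exponent with no mantissa, so it can only represent powers of two. The rounding described above can be sketched under a few assumptions:

- The targets are FP8 E4M3, whose largest finite magnitude is 448.
- Groups contain 128 values.
- The exponent bias is 127, as in the OCP microscaling E8M0 format.

According to the notes, #26766 folds this rounding into the group-quant kernel instead of running it as a separate pass. The helper names here are hypothetical.

```python
import math
from typing import List

FP8_E4M3_MAX = 448.0  # largest finite magnitude in FP8 E4M3
UE8M0_BIAS = 127      # assumed exponent bias, as in the OCP MX E8M0 scale format


def ue8m0_scale(amax: float) -> float:
    """Smallest power-of-two scale with amax / scale <= FP8_E4M3_MAX."""
    if amax <= 0.0:
        return 1.0  # all-zero group: any scale works; 2**0 is the neutral choice
    # Rounding the exponent *up* guarantees the scaled group still fits in FP8.
    return 2.0 ** math.ceil(math.log2(amax / FP8_E4M3_MAX))


def ue8m0_byte(scale: float) -> int:
    """Encode a power-of-two scale as a biased 8-bit exponent (255 is reserved)."""
    exponent = int(math.log2(scale))
    if not 0 <= exponent + UE8M0_BIAS <= 254:
        raise ValueError("scale outside the UE8M0 range")
    return exponent + UE8M0_BIAS


def group_scales(x: List[float], group: int = 128) -> List[float]:
    """One power-of-two scale per contiguous group of `group` values."""
    return [
        ue8m0_scale(max(abs(v) for v in x[i:i + group]))
        for i in range(0, len(x), group)
    ]
```

Consider a group whose largest magnitude is 100. `ue8m0_scale(100.0)` gives 0.25, because 100 / 0.25 = 400, which is at most 448. `ue8m0_byte(0.25)` then gives 125. Rounding the exponent up instead of to nearest can leave up to a factor of two of FP8 range unused, which is the usual trade-off for exponent-only scales.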

Full release notes by category below.

New Model Support

DeepSeek V4

  • [NVIDIA] Support NVFP4 MoE for DeepSeek-V4: #25820
  • [DeepSeek-V4] Fuse UE8M0 scale rounding into FP8 group quantization: #26766
  • [NPU] Add Ascend NPU support for DeepSeek-V4: #25144
  • Deepseek v4: support mixed dtype compression states: #27277
  • [AMD] Feat: Add prefill context parallel support for deepseek v4 unified kv attention: #27928
  • DeepSeek-V4 Online Compress support MTP: #26471
  • [dsv4] Pad MLA decode q-heads to 64 (not full n_heads) for FlashMLA head64 kernel: #27954
  • [dsv4] Prewarm MHC prenorm kernel at startup: #27986
  • [LoRA] Support DSA indexer LoRA targets for GLM-5.1 / DeepSeek-V3.2-family models: #28110
  • Add DeepSeek V4 MTP acceptance length checks: #28098

Speculative Decoding

  • [Spec] Add sync-free fast_prefill_plan for EAGLE draft-extend CUDA graph: #28854

... (truncated)

Commits
  • 49e384c [Cherry-pick to release/v0.5.14] Revert Gemma4 modelopt fp4 MoE backend chang...
  • d76f027 [Cherry-pick to release/v0.5.14] Fix TRTLLM MHA FP8 KV cache scale handling (...
  • 05f1e54 [Cherry-pick to release/v0.5.14] Sync backend docs with #29063 (#29233) (#29243)
  • 0ea948d [Cherry-pick to release/v0.5.14] Sync the changes in #23402 (#29063) (#29221)
  • 683b7d1 [Cherry-pick to release/v0.5.14] Fix the CuDNN failure on bmm_fp8 when two li...
  • b364f90 [Cherry-pick to release/v0.5.14] fix(runner): autotune flashinfer MoE on a de...
  • 785f770 [Cherry-pick to release/v0.5.14] Add GB10 FP8 fused MoE Triton config (#25665...
  • 25d74ba [Cherry-pick to release/v0.5.14] Re-enable SM90 FlashInfer allreduce fusion w...
  • 7e6587c [AMD] Update v4 cookbook to clean env vars (#28981)
  • e63b57d [Fix] model init / XPU / transformers-v5 / bench-image fixes (#28292)
  • Additional commits viewable in compare view


Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Bumps [sglang](https://github.com/sgl-project/sglang) from 0.4.6.post5 to 0.5.14.
- [Release notes](https://github.com/sgl-project/sglang/releases)
- [Commits](sgl-project/sglang@v0.4.6.post5...v0.5.14)

---
updated-dependencies:
- dependency-name: sglang
  dependency-version: 0.5.14
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
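The `update-type: version-update:semver-minor` field follows from comparing the release numbers 0.4.6.post5 and 0.5.14. The sketch below re-derives it with a simplified comparison that looks only at the numeric release segment of each version. It is not Dependabot's implementation, and `release_tuple` and `update_type` are hypothetical names.

```python
import re
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, int, int]:
    """Numeric major.minor.patch of a PEP 440 version; suffixes like '.post5' are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a release version: {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (3 - len(parts))  # pad '1.2' to (1, 2, 0)
    return parts[0], parts[1], parts[2]


def update_type(old: str, new: str) -> str:
    """Classify a version bump by the first release component that changed."""
    o, n = release_tuple(old), release_tuple(new)
    if n[0] != o[0]:
        return "version-update:semver-major"
    if n[1] != o[1]:
        return "version-update:semver-minor"
    return "version-update:semver-patch"
```

Semantic Versioning treats major version zero as initial development in which anything may change. A "minor" label on a 0.x bump like this one is therefore not a compatibility promise.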
dependabot[bot] added the dependencies and python labels on Jun 29, 2026
dependabot[bot] requested review from OliverLeeXZ and rababit as code owners on June 29, 2026 21:35

Labels

dependencies (Pull requests that update a dependency file), python (Pull requests that update python code)
