Port CUDA SSD streaming support #349

Open
linuxbest wants to merge 1 commit into antirez:main from linuxbest:cuda-ssd-streaming

@linuxbest

Add CUDA support for SSD streaming async selected loads and shared overlap decode paths, while preserving the existing Metal streaming path and ds4_lx session cancellation API.

Also update SSD streaming help text so it is described as a GPU graph backend feature instead of Metal-only.

Verification: make ds4; make ds4-server ds4-bench ds4-eval ds4-agent; make cuda-regression; make ds4 ds4-bench; git diff --check.

Quick cold-cache benchmark on an NVIDIA GB10 with an 8 GB expert cache: prefill of 2048 tokens at ~30-32 tok/s, decode of 32 tokens at ~2.1 tok/s.
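
For readers unfamiliar with the overlap idea mentioned in the description, here is a minimal CPU-only sketch in C. It is not the code from this PR: it uses a pthread in place of a CUDA stream, a plain array in place of the SSD, and it assumes "selected loads" means loading the weights of the experts selected for a token. All names (`load_req`, `load_expert`, `apply_expert`, `run_selected`, `EXPERT_FLOATS`) are hypothetical. It only illustrates the double-buffering pattern: load expert i+1 into one slot while expert i is applied from the other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define EXPERT_FLOATS 4  /* hypothetical expert size, tiny on purpose */

/* Hypothetical load request: copy one expert's weights into a staging slot. */
typedef struct {
    const float *storage;  /* stands in for SSD-resident weights */
    int expert_id;
    float *slot;
} load_req;

static void *load_expert(void *arg) {
    load_req *r = (load_req *)arg;
    memcpy(r->slot, r->storage + (size_t)r->expert_id * EXPERT_FLOATS,
           EXPERT_FLOATS * sizeof(float));
    return NULL;
}

/* Stand-in for decode compute: dot product of the expert weights with x. */
static float apply_expert(const float *w, const float *x) {
    float acc = 0.0f;
    for (int i = 0; i < EXPERT_FLOATS; i++) acc += w[i] * x[i];
    return acc;
}

/* Process the selected experts in order, loading expert i+1 in the
 * background while expert i is applied. Returns the summed output. */
static float run_selected(const float *storage, const int *selected, int n,
                          const float *x) {
    float slots[2][EXPERT_FLOATS];  /* double buffer */
    float total = 0.0f;
    if (n <= 0) return total;
    assert(storage && selected && x);

    load_req req = { storage, selected[0], slots[0] };
    load_expert(&req);  /* first load has nothing to overlap with */

    for (int i = 0; i < n; i++) {
        pthread_t loader;
        int in_flight = 0;
        if (i + 1 < n) {
            req.expert_id = selected[i + 1];
            req.slot = slots[(i + 1) & 1];
            in_flight = (pthread_create(&loader, NULL, load_expert, &req) == 0);
            if (!in_flight) load_expert(&req);  /* fall back to a blocking load */
        }
        total += apply_expert(slots[i & 1], x);  /* runs while the load is in flight */
        if (in_flight) pthread_join(loader, NULL);
    }
    return total;
}
```

The loader only ever writes the slot that the compute step is not reading, and the join at the end of each iteration guarantees the next slot is complete before it is used. A GPU version would presumably replace the thread with an asynchronous copy on a separate stream and the join with an event or stream synchronization, but that mapping is an assumption, not a description of this change.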
