Pull requests: vllm-project/vllm
- #32515 [Bugfix] Add missing o_data_type parameter for FlashInfer 0.6.1 compatibility (labels: bug, nvidia, v1; opened Jan 17, 2026 by amanwalksdownthestreet)
- #32514 [Performance] Split FlashInfer attention and cache update (labels: nvidia, v1; opened Jan 17, 2026 by Etelis)
- #32513 [V1][Core] Rename engine_core to engine_core_client for clarity (labels: v1; opened Jan 17, 2026 by junuxyz)
- #32512 Fail-fast guard: block MTP on GLM-4.7-FP8 to avoid CUDA illegal access (labels: nvidia, speculative-decoding, v1; opened Jan 17, 2026 by StanByriukov02)
- #32511 [Model] Support Step1 Model (labels: new-model, v1; opened Jan 17, 2026 by randzero)
- #32509 Refactor KV cache updates across attention backends (labels: nvidia, rocm, v1; opened Jan 17, 2026 by VedantMadane)
- #32508 Add MTP for opanpangu_pro_moe model, fix an initialization bug in StaticSinkAttention (labels: bug, v1; opened Jan 17, 2026 by yt0428)
- #32507 [Fix] test test_function_calling_with_streaming_types about mcp (opened Jan 17, 2026 by lengrongfu)
- #32506 [Tool Parser] Fix hermes streaming mode returning raw text instead of tool calls (labels: tool-calling; opened Jan 17, 2026 by karanb192)
- #32505 [Bugfix] Fix llama4_pythonic tool parser for nested list parameters (labels: bug, llama, tool-calling; opened Jan 17, 2026 by karanb192)
- #32504 [Bugfix] Fix Kimi-K2 tool parser streaming regex for multiple tool calls (labels: bug; opened Jan 17, 2026 by karanb192)
- #32503 [Misc] Assign worker process titles and logging prefix earlier (labels: v1; opened Jan 17, 2026 by karanb192)
- #32502 fix: use atomic write for compile cache to prevent race condition (opened Jan 17, 2026 by T1mn)
- #32501 perf(lora): use dict lookup instead of list.index() in convert_mapping (opened Jan 17, 2026 by JayZenith)
- #32500 Adding LoRA support for qwen omni model (labels: qwen, v1; opened Jan 17, 2026 by 0xD4rky)
- #32499 [MLA] Add nvfp4 packed KV cache decode path via dequant cache op #32220 (labels: ci/build, nvidia, v1; opened Jan 17, 2026 by baonudesifeizhai)
- #32496 [Bugfix] Fix Llama 4 FP8 failure with FlashInfer on B200 (Nullptr crash) (labels: bug, llama, nvidia; opened Jan 17, 2026 by lfopensource)
- #32493 Add embedding input functionality for disabled modalities [remake] (labels: documentation, frontend, multi-modality, needs-rebase, v1; draft, opened Jan 16, 2026 by reaganjlee)
- #32492 [RFC][ROCM] Enable aiter attn backend for qwen3-next model (labels: qwen, rocm, v1; opened Jan 16, 2026 by jennyyyyzhen)
- #32491 [WIP] Update FlashMLA (labels: ci/build, ready, ready-run-all-tests; opened Jan 16, 2026 by LucasWilkinson)
- #32490 fix(reasoning): don't check prompt_token_ids for reasoning end state (labels: frontend; opened Jan 16, 2026 by farazshaikh)
- #32487 [CI][Attention] Add more CI dependencies for attention tests (labels: ci/build; opened Jan 16, 2026 by MatthewBonanni)
- #32486 refactor: refactor_repeated_interfaces (labels: ready; opened Jan 16, 2026 by tom-zju)
- #32485 [not ready for review] extend fp8 online quant with blockwise scaling (opened Jan 16, 2026 by vkuzo)