Pull requests: vllm-project/vllm
- #32515 [Bugfix] Add missing o_data_type parameter for FlashInfer 0.6.1 compatibility (labels: bug, nvidia, v1; opened Jan 17, 2026 by amanwalksdownthestreet)
- #32514 [Performance] Split FlashInfer attention and cache update (labels: nvidia, v1; opened Jan 17, 2026 by Etelis)
- #32513 [V1][Core] Rename engine_core to engine_core_client for clarity (labels: v1; opened Jan 17, 2026 by junuxyz)
- #32512 Fail-fast guard: block MTP on GLM-4.7-FP8 to avoid CUDA illegal access (labels: nvidia, speculative-decoding, v1; opened Jan 17, 2026 by StanByriukov02)
- #32511 [Model] Support Step1 Model (labels: new-model, v1; opened Jan 17, 2026 by randzero)
- #32509 Refactor KV cache updates across attention backends (labels: nvidia, rocm, v1; opened Jan 17, 2026 by VedantMadane)
- #32508 Add MTP for opanpangu_pro_moe model, fix an initialization bug in StaticSinkAttention (labels: bug, v1; opened Jan 17, 2026 by yt0428)
- #32507 [Fix] test test_function_calling_with_streaming_types about mcp (opened Jan 17, 2026 by lengrongfu)
- #32506 [Tool Parser] Fix hermes streaming mode returning raw text instead of tool calls (labels: tool-calling; opened Jan 17, 2026 by karanb192)
- #32505 [Bugfix] Fix llama4_pythonic tool parser for nested list parameters (labels: bug, llama, tool-calling; opened Jan 17, 2026 by karanb192)
- #32504 [Bugfix] Fix Kimi-K2 tool parser streaming regex for multiple tool calls (labels: bug; opened Jan 17, 2026 by karanb192)
- #32503 [Misc] Assign worker process titles and logging prefix earlier (labels: v1; opened Jan 17, 2026 by karanb192)
- #32502 fix: use atomic write for compile cache to prevent race condition (opened Jan 17, 2026 by T1mn)
- #32501 perf(lora): use dict lookup instead of list.index() in convert_mapping (opened Jan 17, 2026 by JayZenith)
- #32500 Adding LoRA support for qwen omni model (labels: qwen, v1; opened Jan 17, 2026 by 0xD4rky)
- #32499 [MLA] Add nvfp4 packed KV cache decode path via dequant cache op #32220 (labels: ci/build, nvidia, v1; opened Jan 17, 2026 by baonudesifeizhai)
- #32496 [Bugfix] Fix Llama 4 FP8 failure with FlashInfer on B200 (Nullptr crash) (labels: bug, llama, nvidia; opened Jan 17, 2026 by lfopensource)
- #32493 Add embedding input functionality for disabled modalities [remake] (labels: documentation, frontend, multi-modality, needs-rebase, v1; draft, opened Jan 16, 2026 by reaganjlee)
- #32492 [RFC][ROCM] Enable aiter attn backend for qwen3-next model (labels: qwen, rocm, v1; opened Jan 16, 2026 by jennyyyyzhen)
- #32491 [WIP] Update FlashMLA (labels: ci/build, ready, ready-run-all-tests; opened Jan 16, 2026 by LucasWilkinson)
- #32490 fix(reasoning): don't check prompt_token_ids for reasoning end state (labels: frontend; opened Jan 16, 2026 by farazshaikh)
- #32487 [CI][Attention] Add more CI dependencies for attention tests (labels: ci/build; opened Jan 16, 2026 by MatthewBonanni)
- #32486 refactor: refactor_repeated_interfaces (labels: ready; opened Jan 16, 2026 by tom-zju)
- #32485 [not ready for review] extend fp8 online quant with blockwise scaling (opened Jan 16, 2026 by vkuzo)