Pull requests: flashinfer-ai/flashinfer
fix(mla): widen page index to int64_t to avoid 32-bit overflow
op: attention
#3136
opened Apr 21, 2026 by
Tracin
1 of 5 tasks
fix(jit): decouple FLASHINFER_JIT_VERBOSE from debug compilation flags
#3135
opened Apr 21, 2026 by
leonardozcm
[CuTe DSL] Fix FP8 MLA persistent perf regression and ProxyKind cu13 wheel breakage
#3132
opened Apr 21, 2026 by
pgera
Contributor
6 tasks done
feat: Enable FP8 (E4M3/E5M2) in concat_mla_k for optimize long-context prefill performance and refactor type dispatch for BF16/FP16
#3129
opened Apr 21, 2026 by
qiching
Collaborator
4 tasks done
optimize gdn decode bf16 state kernel for mtp with caching.
model: qwen3.5
#3127
opened Apr 20, 2026 by
ameynaik-hub
Contributor
5 tasks
autotuner: check cache before synthesizing profile input tensors
run-ci
#3126
opened Apr 20, 2026 by
leejnau
Contributor
5 tasks done
docs(pod): flesh out POD-attention wrapper docstrings
#3124
opened Apr 20, 2026 by
Zlatanwic
5 tasks done
feat(sampling): add fused top-k/top-p sampling with filtered probabil…
op: misc
run-ci
#3121
opened Apr 20, 2026 by
haozhihuiYmh150
Draft
2 of 5 tasks
feat(gdn): add Triton WY-parallel MTP decode kernels
model: qwen3.5
#3120
opened Apr 20, 2026 by
ameynaik-hub
Contributor
Draft
5 tasks
perf(gdn): fix bf16_state T=1 per-call overhead and add pool+padding …
model: qwen3.5
#3118
opened Apr 19, 2026 by
ameynaik-hub
Contributor
5 tasks
Port TRT-LLM fused qk norm rope
op: attention
op: comm
op: gemm
op: moe
#3117
opened Apr 18, 2026 by
murphymatt
Contributor
5 tasks done
Reland support lse in trtllm paged attn kernels
op: attention
#3116
opened Apr 18, 2026 by
murphymatt
Contributor
5 tasks done
perf(autotuner): replace power-of-2 token buckets with hybrid spacing & fix missing routing_replay_out arg
op: gemm
op: moe
run-ci
#3115
opened Apr 18, 2026 by
StudyingShao
5 tasks done
Normalize modular attention to CUTLASS DSL APIs shipped on PyPI
op: attention
run-ci
#3112
opened Apr 17, 2026 by
saltyminty
Collaborator
5 tasks
fix(fmha_v2): enable flash_attention for Q_PAGED_KV regardless of s_kv
#3106
opened Apr 17, 2026 by
blake-snc
Contributor
Report unit test files with no result
#3105
opened Apr 17, 2026 by
dierksen
Collaborator
2 of 5 tasks
Fix multi-instances using same random seed
op: comm
#3102
opened Apr 17, 2026 by
guyuankan
4 of 5 tasks
Add int8 paged KV support to main paths
op: attention
#3100
opened Apr 17, 2026 by
lesj0610
5 tasks done
Add int4 paged KV support to main paths
op: attention
#3101
opened Apr 17, 2026 by
lesj0610
5 tasks done
Support NVFP4 KV for prefill and batch attention kernels
op: attention
run-ci
#3097
opened Apr 17, 2026 by
Tom-Zheng
Contributor
5 tasks done
feat: implement configurable tie_break for filtered topk
op: misc
run-ci
#3095
opened Apr 17, 2026 by
zianglih
Contributor
5 tasks done