Closed · Changes from all commits (2181 commits)
ce8123c
[Benchmark] Update backend_request_func.py (#6566)
ZhangYulongg Feb 28, 2026
54f7d9f
[CI] Sync mm_batch_invariant with paddle.mm update (#6557)
EmmonsCurse Feb 28, 2026
97eee75
[Feature] GPU Memory Optimization and Retirement of V0 Scheduler (#6407)
ming1753 Feb 28, 2026
a2072fe
[XPU] support warmup with ep & remove apply_tp_fused_op (#6289)
zccjjj Feb 28, 2026
5d42f19
[BugFix][Scheduler] Fix can_schedule_block_num_threshold calculation …
kevincheng2 Feb 28, 2026
fa21fd9
[Docs] Update code overview documentation (#6568)
kevincheng2 Feb 28, 2026
977e2cc
[CI] 【Hackathon 10th Spring No.23】fastdeploy/model_executor/layers/mo…
0Ayachi0 Feb 28, 2026
051bbbe
[Benchmark] Update backend_request_func.py (#6575)
ZhangYulongg Feb 28, 2026
bb51829
[CI] Fix tests and docs to resolve failure (#6572)
EmmonsCurse Mar 1, 2026
7cf5e64
[BugFix] fix cache transfer manager init failed when using block_wise…
liyonghua0910 Mar 1, 2026
ea4d10d
[BugFix] fix cache int8 for pd disaggregated deployment (#6563)
liyonghua0910 Mar 1, 2026
59b578c
[Feature]Supports SWA based on appendattn (#6547)
chang-wenbin Mar 1, 2026
7cfb0ff
fix pfcc deep ep in low latency mode (#6440)
RichardWooSJTU Mar 2, 2026
5382fb2
[BugFix] lazy enable_torch_proxy for cutlass (#6523)
ckl117 Mar 2, 2026
16a2a32
[Metax][Fix] fix error based pr#6407 (#6584)
StareAtYou Mar 2, 2026
d957ccd
seq_lens related tensor shape -> [max_num_seqs] (#6535)
zhoutianzi666 Mar 2, 2026
481d0e3
[CI] Skip long-sequence case due to potential non-determinism (#6587)
EmmonsCurse Mar 2, 2026
ecfd088
[BugFix] Add safety checks in recycle_gpu_blocks to prevent block all…
kevincheng2 Mar 2, 2026
6674131
[Iluvatar] Support CudaGraph and optimize flash_attn_unpadded and fus…
wuyujiji Mar 2, 2026
fe0b3a9
[PD Disaggregation] Fix cache messager performance problem & add kv t…
RichardWooSJTU Mar 2, 2026
7bd86f9
[BugFix] Fix tbo nan (#6439)
RichardWooSJTU Mar 2, 2026
758770b
[CI] 【Hackathon 10th Spring No.28】Functional module fastdeploy/entrypoints/engine_…
kesmeey Mar 2, 2026
6d83dcc
more eplb offline load dtypes (#6435)
RichardWooSJTU Mar 2, 2026
344db8c
[BugFix] Fix mtp when token_ids_all is None (#6591)
ming1753 Mar 2, 2026
3cf7c6c
[Metax][Fix] fix ci error based pr#6535 (#6600)
StareAtYou Mar 2, 2026
aae87e6
[CI] 【Hackathon 10th Spring No.27】Functional module fastdeploy/cache_manager/prefi…
kesmeey Mar 2, 2026
33d6d24
[BugFix] fix bug when seq_lens_this_time is 2D (#6613)
ming1753 Mar 2, 2026
0f718ba
[Speculative Decoding]Reformat input preprocess for spec decode (#6501)
huicongyao Mar 3, 2026
3cc0941
support dsv3 use flashmla (#6593)
zhoutianzi666 Mar 3, 2026
1cae7a0
weight only quant method support QKVGate_proj (#6612)
ckl117 Mar 3, 2026
375b5b7
[Feature]Log Format Normalization and Trace Log Optimization (#6370)
qwes5s5 Mar 3, 2026
61789fe
[Quantization] Support to load static quant ue8m0 scale of DeepGEMM v…
RichardWooSJTU Mar 3, 2026
0256975
[BugFix] fix mtp_config in rl (#6595)
Deleter-D Mar 3, 2026
27ae02f
[BugFix] fix prefix tree updating timeout (#6615)
liyonghua0910 Mar 3, 2026
4ff3f42
[XPU] Add update_attn_mask_offsets op for xpu. (#6556)
Jiajun-Ji Mar 3, 2026
c5eb6b6
[Bug Fix] Fix MM mtp incorrect rope emb (#6581)
ming1753 Mar 3, 2026
c3d6d70
[CI] Add nightly workflow for golang_router tests and improve log han…
EmmonsCurse Mar 3, 2026
9a48a41
[CI] Fix accidental deletion of failed_tests.log during log cleanup (…
EmmonsCurse Mar 3, 2026
c637692
[XPU] support MTP Step > 1 (#6609)
lizan1999 Mar 4, 2026
29d9cb1
fix tp4 dp1 (#6624)
cmcamdy Mar 4, 2026
3d3221e
[CI] 【Hackathon 10th Spring No.31】Functional module fastdeploy/model_executor/laye…
kesmeey Mar 4, 2026
e8e18ce
[Metax][Fix] fix ci error based pr#6501 (#6636)
StareAtYou Mar 4, 2026
aee97e3
fix exist_prefill_flag when preempted task (#6629)
Sunny-bot1 Mar 4, 2026
02d32ee
Revert "[Bug Fix] Fix MM mtp incorrect rope emb (#6581)" (#6631)
ming1753 Mar 4, 2026
3345641
[Iluvatar][CI] fix the dim error of seq_lens_encoder and seq_lens_dec…
wuyujiji Mar 4, 2026
1256fd3
[XPU] weight only quant method support QKVGate_proj (#6641)
zhupengyang Mar 4, 2026
598cce8
[RL] Support SM100 FP8 quantization in RL (#6601)
bukejiyu Mar 4, 2026
81e04bf
[BugFix] fix flash attn mtp rope emb bug (#6649)
ming1753 Mar 4, 2026
5c8f518
[CI] Add pytest timeout and enable workflow rerun (#6645)
EmmonsCurse Mar 4, 2026
ddb06ff
init (#6642)
gongweibao Mar 4, 2026
56ceeda
[CI] Adjust model-specific diff threshold and include iluvatar XPU pa…
EmmonsCurse Mar 5, 2026
fa4815b
[BugFix] fix dp scheduler bug in ep4tp1 when start by using multi_api_…
ddchenhao66 Mar 5, 2026
0dc7034
[Model Runner] Deprecate not_need_stop (#6356)
Sunny-bot1 Mar 5, 2026
63414cc
[XPU][CI] Fix XPU CI Bug (#6658)
plusNew001 Mar 5, 2026
cebe6f7
clean nvfp4 related code (#6644)
zhoutianzi666 Mar 5, 2026
326b975
[BugFix][MTP] Skip empty_input_forward during dummy run (#6653)
yuanlehome Mar 5, 2026
fa1906b
[BugFix] Fix inaccurate cache hit rate and TTFT after request preempt…
liyonghua0910 Mar 5, 2026
a79b82c
[BugFix] fix seq_lens_this_time init (#6670)
Sunny-bot1 Mar 5, 2026
16a393e
[CI] Fix non-deterministic test and skip failed_tests.log in log prin…
EmmonsCurse Mar 5, 2026
839bc83
[BugFix] Fix EB5 model runner compatibility check in worker process (…
Sunny-bot1 Mar 5, 2026
b0fd242
[BugFix] Fix error in dynamic c8 cache (#6544)
juncaipeng Mar 6, 2026
81acdb6
[Iluvatar][CI] Do not specify FD_LOG_DIR (#6665)
wuyujiji Mar 6, 2026
caf73e8
[Feature]add reasoning effort (#6656)
luukunn Mar 6, 2026
5d9524f
[Models][Feature] Support new ERNIE reward model and add return_token…
sunlei1024 Mar 6, 2026
aac1484
[Feature]add arguments string in tool (#6704)
luukunn Mar 6, 2026
1e49855
[BugFix][DataProcessor] Add validate_model_path to fail fast on bad m…
gongweibao Mar 8, 2026
cbfdf42
[CI] Add test_dynamic_c8_cache.py and latest FastDeploy.tar.gz upload…
EmmonsCurse Mar 8, 2026
3c0ff20
[BugFix] fix incorrect function parameters of start_data_parallel_ser…
ddchenhao66 Mar 9, 2026
3a85ecf
[Others] Fix typos in log messages and comments (#6707)
cloudforge1 Mar 9, 2026
30f9f33
[Feature][BugFix][OP] Enhance Deterministic Inference Mode with Kern…
gongweibao Mar 9, 2026
ae71ada
reduce warmup input_length for cudagraph (#6701)
zccjjj Mar 9, 2026
0c69cdf
[CI] 【Hackathon 10th Spring No.24】Functional module fastdeploy/model_executor/laye…
0Ayachi0 Mar 9, 2026
3897a0b
nvfp4 clean code (#6671)
zhoutianzi666 Mar 9, 2026
28f7727
[Feature] Set overlap schedule as default (#6668)
Sunny-bot1 Mar 9, 2026
ecc5032
[XPU] Add return value checks for all XPU kernel launches (#6666)
mayang002 Mar 10, 2026
8e322f9
add reconstruct (#6675)
bukejiyu Mar 10, 2026
73de8b9
[CI] Update test_determinism_long.py to reduce execution time
EmmonsCurse Mar 10, 2026
8b8f0c5
fix update param (#6723)
bukejiyu Mar 10, 2026
22d308a
[Docs] Specify the default strategy (#6728)
mouxinqq Mar 10, 2026
25c4793
[CI][MetaX]Add timeout to Jenkins job trigger step (#6755)
plusNew001 Mar 10, 2026
c3aceb6
[Models][OP][Optimization] Support DeepSeek-v3.2 model, integrate DSA…
chang-wenbin Mar 10, 2026
54581b8
[BugFix]fix iluvatar_model_runner about dsa_cache (#6733)
chang-wenbin Mar 10, 2026
b57c960
cuda13.0, implement changes to CCCL (#6751)
mitu626 Mar 10, 2026
79ad949
[BugFix] Fix updating weight when enable cache storage (#6719)
juncaipeng Mar 10, 2026
5965198
[CI] Temporarily disable test_determinism_offline.py
EmmonsCurse Mar 10, 2026
67388ce
[Iluvatar][CI] Replace ci in ernie-300B-4layer with ernie-21b. (#6747)
wuyujiji Mar 10, 2026
18b0716
[XPU] fix wint4 (#6757)
zhupengyang Mar 10, 2026
812657b
fix pd overlap (#6753)
Sunny-bot1 Mar 10, 2026
6520ae8
[BugFix] fix grpc failure when tracing init before workers forked (#6…
liyonghua0910 Mar 10, 2026
b05a6c4
[BugFix][KVCache] Add inter-process lock to fix NaN error under DP+EP…
Jiang-Jia-Jun Mar 10, 2026
a502dda
[BugFix] fix multi-step mtp bug (#6754)
ddchenhao66 Mar 11, 2026
f6adcc0
Remove BUILD_WHEEL=2 (Python-only quick install) mode from build.sh (…
gongweibao Mar 11, 2026
be36133
Remove Python-only mode documentation from installation guides (#6784)
gongweibao Mar 11, 2026
b6190de
[Feature] Add concurrency protection to selectworker (#6775)
mouxinqq Mar 11, 2026
cf7934a
[Speculative Decoding] Unify Spec and non-spec branch (#6685)
freeliuzc Mar 11, 2026
7811eec
[fix] resolve get_save_output_v1 socket name conflicts between multip…
liyonghua0910 Mar 11, 2026
cffa8c2
[Others]update paddleformer 1.0.0 (#6496)
bukejiyu Mar 11, 2026
97a4b36
[Processor]add qwen3vl prompt_token_ids support (#6764)
CSWYF3634076 Mar 11, 2026
1118351
[Optimization] Update Deepseekv3.2 model and dsa-indexer networking a…
chang-wenbin Mar 11, 2026
0466c7e
Set MC_TCP_BIND_ADDRESS for mooncake store (#6782)
juncaipeng Mar 11, 2026
9f0778f
[Feature] Support EP prefill with num_worst_tokens (#6574)
RichardWooSJTU Mar 11, 2026
88c4fbf
[XPU] Add speculate_limit_thinking_content_length Op. (#6627)
Jiajun-Ji Mar 11, 2026
f0ab8ee
[Iluvatar][CI] add triton in requirements_iluvatar.txt (#6788)
wuyujiji Mar 11, 2026
deff121
[CI] Update _build_linux_rl.yml to use cu129 nightly
EmmonsCurse Mar 11, 2026
1fef825
Fix environment variable name for KV cache lock
Jiang-Jia-Jun Mar 12, 2026
7d31a72
Add PD+EP cudagraph Support
iosmers Mar 12, 2026
3543088
[XPU] rm stop nums (#6651)
cmcamdy Mar 12, 2026
e0febf3
fix debug log (#6766)
qwes5s5 Mar 12, 2026
1ed6073
[Feature] Update logging for Golang Router (#6801)
mouxinqq Mar 12, 2026
cdaf6dd
[RL][Cherry-Pick] Support Fully Async and PrefixCache (#6599)
gongshaotian Mar 12, 2026
a3d7979
[XPU][CI]Rename test_ep4tp1_online.py to run_ep4tp1_online.py (#6805)
plusNew001 Mar 12, 2026
250ce40
[Feature] use phi permute/unpermute & rm swiglu (#6361)
fxyfxy777 Mar 12, 2026
901b38c
[Iluvatar] Optimize decode group_gemm and Support cuda graph for erni…
wuyujiji Mar 12, 2026
a9ace99
[Metax][Fix] fix ci error based pr#6805 caused by pr#6685 (#6807)
StareAtYou Mar 12, 2026
2e63d88
[Optimization][Speculative Decoding]Fuse padding sampling params (#6765)
huicongyao Mar 12, 2026
ab0eacb
[CI] Update _build_linux_rl.yml to use Paddle installation method wit…
EmmonsCurse Mar 12, 2026
d73fd87
[CI] Add daily build_linux jobs for CUDA 13.0 (#6809)
EmmonsCurse Mar 12, 2026
1f9f889
[XPU] refactor: XPU plugin namespace migration (#6799)
mayang002 Mar 13, 2026
cb5a742
[Metax][Test] enable paddleocr using cudagraph (#6820)
StareAtYou Mar 13, 2026
586e6f3
[Others]Limit transformers version (#6806)
bukejiyu Mar 13, 2026
6211004
[RL] add stream guard (#6814)
liufengwei0103 Mar 13, 2026
d935752
[CI] 【Hackathon 10th Spring No.20】Functional module fastdeploy/engine/common_engin…
kesmeey Mar 13, 2026
2b8a5b0
update indexer model (#6791)
chang-wenbin Mar 13, 2026
8eb1771
[BugFix]rm draft code for glm (#6810)
fxyfxy777 Mar 13, 2026
8906e09
[Feature][OP] Add batch-invariant RMSNorm kernel and TP embedding Cus…
gongweibao Mar 13, 2026
12f4124
[Speculative Decoding] Fix speculate stop_seqs and fix accept_num in …
freeliuzc Mar 13, 2026
49fe68a
[Docs] Update Golang Router FAQ (#6829)
mouxinqq Mar 13, 2026
8c1a282
DSA clean code (#6827)
zhoutianzi666 Mar 13, 2026
7591e0d
fix eb5 mtp(mix) (#6800)
cmcamdy Mar 13, 2026
3f4441b
[XPU]add mtp cudagraph support (#6831)
iosmers Mar 13, 2026
820eb60
[Others] clean code (#6839)
zhoutianzi666 Mar 14, 2026
091e3c8
Dsa clean code,add dsk_attn_write_cache baseline (#6855)
zhoutianzi666 Mar 16, 2026
4d39232
[BugFix] add ut for fused_moe_degemm (#6840)
fxyfxy777 Mar 16, 2026
7c8c0a3
[BugFix] replace ftok with custom_ftok in get_output/save_output ops …
liyonghua0910 Mar 16, 2026
3fabba0
[Feature] Add Triton unified attention kernel for deterministic infer…
gongweibao Mar 16, 2026
72ff7bf
[XPU] Fix wrapper files (#6830)
mayang002 Mar 16, 2026
04fde3b
[PD Disaggregation] Prefill and decode support cache storage (#6768)
juncaipeng Mar 16, 2026
bb925c6
[Other] Adjust GPUModelRunner to enhance compatibility (#6851)
ming1753 Mar 16, 2026
c9f7f52
[Optimization][BugFix]Optimize Deepseek networking code (#6861)
chang-wenbin Mar 16, 2026
c5f402e
Update title and release note in README_CN.md
Jiang-Jia-Jun Mar 16, 2026
bd4b609
Update title and activity section in README_CN.md
Jiang-Jia-Jun Mar 16, 2026
5c92f4d
[Feature] Add deepgemm bias epilogue for SM100 (#6857)
Wanglongzhi2001 Mar 16, 2026
d113397
Simplify available_blocks assignment logic (#6819)
Jiang-Jia-Jun Mar 16, 2026
a6351de
[BugFix][Optimization] Replace silent failures with catchable excepti…
gongweibao Mar 16, 2026
4ed483d
[BugFix] Fix ep compatibility issues & Optimize permute operator (#6821)
RichardWooSJTU Mar 17, 2026
fe8d58a
[Optimization]update request in tool parser&reasoning parser (#6858)
luukunn Mar 17, 2026
eab429d
fix performance drop while no spec (#6866)
huicongyao Mar 17, 2026
3b7507a
test_abort (#6743)
qwes5s5 Mar 17, 2026
ea998dd
clean clean code in _load_per_tensor_weight_scale (#6868)
zhoutianzi666 Mar 17, 2026
b152bae
[CI] disable test_batch_invariance_op_logsoftmax.py in unit_test
EmmonsCurse Mar 17, 2026
950366e
[PD Disaggregation][RL] Register to router with version and support r…
juncaipeng Mar 17, 2026
12eb001
Remove comments on multi-mode request handling
Jiang-Jia-Jun Mar 17, 2026
daaf498
[Feature] support compute shared experts before combine for better ov…
Wanglongzhi2001 Mar 17, 2026
b61731b
[Feature][Docs] Adjust prefill release & expose load metrics (#6884)
mouxinqq Mar 17, 2026
cb6819d
[Optimization][OP]support per_token_group_fp8_quant cuda kernel (#6865)
chang-wenbin Mar 17, 2026
e4c9cac
[BugFix] Cap nvcc -t threads to avoid compilation failures on high-co…
gongweibao Mar 17, 2026
aa9deb6
[XPU] Dockerfiles update (#6898)
plusNew001 Mar 17, 2026
148eee8
[XPU] use quant2d_per_token for weight quant int8 && fix some XPU Ker…
lizan1999 Mar 17, 2026
2a371a3
[Feature] Update tpSize (#6896)
mouxinqq Mar 17, 2026
0359794
[CI] Sync _log_softmax_batch_invariant with paddle update (#6893)
EmmonsCurse Mar 17, 2026
8b890c0
[Iluvatar] refactor attn and moe code (#6887)
wuyujiji Mar 18, 2026
9660f98
[BugFix] Set FD_USE_PHI_MOE_PERMUTE = 0 Default (#6886)
fxyfxy777 Mar 18, 2026
0754368
[CI] Isolate cache and ccache for CUDA 13.0 build
EmmonsCurse Mar 18, 2026
9b117aa
support glm-moe-dsa model (#6863)
chang-wenbin Mar 18, 2026
fb6c56d
[BugFix][DataProcessor] Force top_k=1 for greedy decoding when temper…
gongweibao Mar 18, 2026
dd55cda
[CI] Add test for pd and cache storage (#6876)
juncaipeng Mar 19, 2026
4794a28
opt glm5 model (#6916)
chang-wenbin Mar 19, 2026
dd93f8f
[Optimization] Skip compat guard when torch is not installed (#6913)
SigureMo Mar 19, 2026
c184a7c
remove source in weight_loader in moe.py (#6892)
zhoutianzi666 Mar 19, 2026
1a05744
nvfp4.py support ep (#6920)
zhoutianzi666 Mar 19, 2026
f95d8ca
[RL] support qkrmsnorm use proxy-norm (#6862)
zoooo0820 Mar 19, 2026
2b84a42
[CI] Optimize CI: add timeout and cancel on PR close (#6933)
EmmonsCurse Mar 19, 2026
33e01f2
[Feature][Sampling] Extend top-k_top-p sampling to all backends and u…
Sunny-bot1 Mar 19, 2026
b1c800b
remove load_up_proj_weight_first (#6932)
zhoutianzi666 Mar 19, 2026
7141db0
[CI] Optimize CI: update nightly test_image build workflow (#6937)
EmmonsCurse Mar 19, 2026
9148562
[CI]【Hackathon 10th Spring No.35】Supplementary unit tests for resource_manager (#6734)
cloudforge1 Mar 19, 2026
c3d8db8
[Optimization] Update ZMQ server (#6735)
luukunn Mar 19, 2026
f4a79d4
[Optimization]Unified data processing for online and offline (#6891)
luukunn Mar 19, 2026
96b0ece
[Feature] Update Counter Release (#6943)
mouxinqq Mar 20, 2026
d77edf8
opt wfp8afp8 triton moe (#6938)
Sunny-bot1 Mar 20, 2026
a81116a
[Benchmark] Update Qwen3 vl dense yaml (#6945)
xjkmfa Mar 20, 2026
3b20399
[Benchmark] Update Qwen3 vl 32k yaml (#6946)
xjkmfa Mar 20, 2026
aca733b
[CI]【Hackathon 10th Spring No.32】load_weight_utils unit test (#6740)
cloudforge1 Mar 20, 2026
3a4e139
[Benchmark] fix multi turn (#6948)
ZhangYulongg Mar 20, 2026
2b10ebc
[benchmark] Refactor debug logging and payload handling (#6949)
ZhangYulongg Mar 20, 2026
1c38da2
Make seq_lens_this_time/decoder/encoder equal shape (#6942)
zhoutianzi666 Mar 20, 2026
bf7e242
[Optimization][Feature]Supports multiple batches of DSK-DSA. (#6930)
chang-wenbin Mar 20, 2026
32b6900
fix code type (#6951)
sunlei1024 Mar 20, 2026
00eb12f
[BugFix][Models] Unify PaddleFormers fused QKV TP loading and stabili…
jackyYang6 Mar 20, 2026
030820d
[CI] Optimize CI: refine check-bypass/cancel logic and fix nightly ta…
EmmonsCurse Mar 20, 2026
0b4c1cb
[CI] Change 21b ep4 to tp1_dp4 in 4_cards_tests (#6745)
EmmonsCurse Mar 20, 2026
fdd12ff
[CI] Fix: incorrect downstream job execution when only build_gpu/xpu …
EmmonsCurse Mar 22, 2026
33e79f9
[Optimization]Optimize CPU utilization (#6950)
luukunn Mar 22, 2026
7a78001
fix execute_model_normal in empty run (#6968)
Sunny-bot1 Mar 23, 2026
634d23a
[Bugfix] Align thinking_budget behavior with ERNIE reasoning flow (#6…
jackyYang6 Mar 23, 2026
5416da8
remove assert (#6970)
zhoutianzi666 Mar 23, 2026
bb881c2
[PD Disaggregation] pd + cache_storage support vl model (#6906)
juncaipeng Mar 23, 2026
5e469fc
[RL][BugFix][Optimization] Support chunked part files loading and fix…
wikilsh Mar 23, 2026
c1f7991
[BugFix] add worker_process no grad (#6971)
xiaoxiaohehe001 Mar 23, 2026
defaffd
【Hackathon 10th Spring No.45】Support compiling FastDeploy on T4/V100 hardware -part (#6488)
playaswd Mar 23, 2026
1b276e6
[CI] Upgrade GitHub Actions for Node 24 compatibility (#6975)
EmmonsCurse Mar 23, 2026
c62f6b4
[Others] Fix PD reorder for MTP (#6792)
bukejiyu Mar 23, 2026
e87ce4b
[Speculative Decoding] refactor MTP and optimize spec-decoding postpr…
freeliuzc Mar 24, 2026
8b6bbb3
[Optimization] Use a separate driver when using Triton with Paddle (#…
SigureMo Mar 24, 2026
6cff780
[RL] Support moe_topk_select using Paddle native operators and Add fu…
DanielSun11 Mar 24, 2026
5780345
[XPU] fix speculate_verify (#6985)
zhupengyang Mar 24, 2026
522d12c
add deepep precision test (#6984)
zhoutianzi666 Mar 24, 2026
6f5aa88
[benchmark] update benchmark tools (#6991)
ZhangYulongg Mar 24, 2026
c92e277
[RL] RoPE without fmad opt (#6901)
ckl117 Mar 24, 2026
4e8d503
Revert "add deepep precision test (#6984)" (#7004)
EmmonsCurse Mar 25, 2026
aee293b
[CI] Optimize: add vl swap_test and remove useless code (#7000)
EmmonsCurse Mar 25, 2026
7a6c287
[Speculative Decoding] Optimize attn_mask_offset and fix mtp bug (#7005)
freeliuzc Mar 25, 2026
48cfb60
[FDConfig] Reduce FD_CUSTOM_AR_MAX_SIZE_MB default from 64 to 8 (#6997)
gongweibao Mar 25, 2026
a7f52c3
[Feature] support v1 update/clear api for RL (#6761)
liyonghua0910 Mar 25, 2026
b8bb34c
[CI] disable tests/distributed/test_communication.py in unit_test (#7…
EmmonsCurse Mar 25, 2026
482f951
Update copilot-instructions.md
Jiang-Jia-Jun Mar 25, 2026
1502b6f
add instantiations for decoder rope enfore_fmul_rn=true (#7009)
ckl117 Mar 25, 2026
d5cb276
[Optimization] Deduplicate shared image/video utilities across VL pro…
luukunn Mar 26, 2026
e6804ba
[Optimization]Streaming requests return complete special tokens. (#6998)
luukunn Mar 26, 2026
61ebac4
[CI] Fix test_communication.py and add port cleanup (#7021)
EmmonsCurse Mar 26, 2026
4fd877e
[Speculative Decoding] Support mtp expert-parallel and support differ…
freeliuzc Mar 26, 2026
25d64ef
[Speculative Decoding] Refactor Eagle MTP hidden states copy (#6812)
huicongyao Mar 26, 2026
3c9fd81
[BugFix] Fix RDMA initializes failed (#7025)
TBD1 Mar 26, 2026
4425142
[fix] remove all gather ep group control requests in normal cases (#7…
liyonghua0910 Mar 26, 2026
209e5cf
[CE]add 21b mooncake yaml (#7033)
xiegegege Mar 26, 2026
14b17c0
add completion_tokens default (#7032)
luukunn Mar 26, 2026
a31d4bf
[CI] update mtp case (#7031)
ZhangYulongg Mar 27, 2026
c3ed7db
[XPU] [CI] Fix xpu ci bug (#7014)
plusNew001 Mar 27, 2026
10c59f7
[CI] disable tests/e2e/test_Qwen3VLMoe_serving.py in unit_test (#7044)
EmmonsCurse Mar 27, 2026
6c24f19
[Feature] Update error logging (#7045)
mouxinqq Mar 27, 2026
6693bcd
[BugFix] fix clear_parameters in draft cudagraph (#7035)
Deleter-D Mar 27, 2026
8ff8236
[Optimization] optimize fused_swiglu_fp8_quant_kernel (#7007)
fxyfxy777 Mar 27, 2026
11ad95b
[CI]【Hackathon 10th Spring No.43】Supplementary unit tests for ernie4_5_mtp (#6738)
cloudforge1 Mar 27, 2026
bf8e9bf
[XPU] Fix speculate schedule (#7049)
cmcamdy Mar 27, 2026
f25760f
[CI] Update docker run command in unit test coverage workflow (#7050)
ZhangYulongg Mar 27, 2026
842c608
[CI] Align with Paddle layer_norm kernel update (#7056)
EmmonsCurse Mar 27, 2026
a7cbe3f
[CI] Adapt to codecov action changes for Node.js 24 (#7064)
EmmonsCurse Mar 29, 2026
9765fa7
[Refactor] Replace --skip-mm-profiling with --deploy-modality text (#…
kevincheng2 Mar 30, 2026
7a20eae
[Feature] Support cute cpp Encoder FA4 (#7016)
mpgemm Mar 30, 2026
2eea6fa
[BugFix] Fix kv cache int8 dynamic quant on flash and flash_mask back…
Wanglongzhi2001 Mar 30, 2026
61a9079
[Feature] Update logging (#7072)
mouxinqq Mar 30, 2026
1a1d048
[Feature] Support NVFP4 Flashinfer-cutedsl MoE on SM100 (#6963)
mpgemm Mar 30, 2026
5c60e2f
fix bug in cudagraph (#7069)
zhangbo9674 Mar 30, 2026
6d2ab8f
[BugFix] Add lock to avoid generating nan when using storage cache (#…
juncaipeng Mar 30, 2026
1670b01
Revert "[BugFix] Add lock to avoid generating nan when using storage …
Jiang-Jia-Jun Mar 30, 2026
b9f8873
[Optimization]Merge Text processor (#7030)
luukunn Mar 30, 2026
76cf5e9
[append attention] clean code (#7062)
zhoutianzi666 Mar 30, 2026
18062c5
[BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_e…
kevincheng2 Mar 30, 2026
8789329
[Iluvatar] Support wi4a16 group_gemm (#7078)
wuyujiji Mar 30, 2026
e33bacd
Add debug log for troubleshooting
rainyfly Mar 30, 2026
3 changes: 2 additions & 1 deletion .clang-format
Original file line number Diff line number Diff line change
@@ -16,7 +16,7 @@
---
Language: Cpp
BasedOnStyle: Google
-IndentWidth: 4
+IndentWidth: 2
TabWidth: 2
ContinuationIndentWidth: 4
AccessModifierOffset: -1 # The private/protected/public has no indent in class
@@ -26,4 +26,5 @@ BinPackParameters: false
BinPackArguments: false
IncludeBlocks: Preserve
IncludeIsMainSourceRegex: (\.cu)$
+SortIncludes: false
...
174 changes: 174 additions & 0 deletions .claude/skills/cuda-kernel-unittest.md
@@ -0,0 +1,174 @@
# Skill: CUDA Kernel Unit Test

Write unit tests for PaddlePaddle CUDA custom ops following a modular 4-layer architecture.

## Trigger

When the user asks to write/create/add unit tests for a CUDA kernel (`.cu` file in `custom_ops/`).

## Steps

1. **Read the CUDA kernel source** to understand: input/output tensors, dtypes, shapes, which tensors are CPU vs GPU, scalar attrs, in-place semantics.
2. **Write the test file** in `tests/operators/test_<kernel_name>.py` following the structure below.

## Test File Structure

```python
import unittest
from typing import Any, Dict
import numpy as np
import paddle

# --- Import ops (bypass fastdeploy.__init__) ---
try:
    import sys, os
    _fd_root = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    if _fd_root not in sys.path:
        sys.path.insert(0, _fd_root)
    from fastdeploy.import_ops import import_custom_ops
    _package = "fastdeploy.model_executor.ops.gpu"
    import_custom_ops(_package, ".fastdeploy_ops", globals())
except ImportError as e:
    print(f"Import error: {e}")
    raise

CUDA_PLACE = paddle.CUDAPlace(0) if paddle.is_compiled_with_cuda() else paddle.CPUPlace()
CPU_PLACE = paddle.CPUPlace()


# ============================================================
# Layer 1: Helpers — tensor creation / kernel invocation / output extraction
# ============================================================

def to_paddle_inputs(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Convert numpy dict → paddle tensors. CPU tensors must be explicitly handled."""
    paddle_inputs = {}
    for k, v in inputs.items():
        if isinstance(v, (int, bool, float, str)):
            paddle_inputs[k] = v
        elif k in ("<CPU_TENSOR_NAMES>",):  # <-- tensors the kernel expects on CPU
            paddle_inputs[k] = paddle.to_tensor(v, place=CPU_PLACE)
        elif v is not None:
            paddle_inputs[k] = paddle.to_tensor(v, place=CUDA_PLACE)
        else:
            paddle_inputs[k] = None
    return paddle_inputs


def run_kernel(paddle_inputs, inputs):
    """Call the CUDA kernel with paddle tensors + scalar attrs."""
    kernel_name(
        paddle_inputs["tensor_a"],
        # ... all tensor args ...
        inputs["scalar_attr"],  # scalar attrs from raw dict
    )


def get_outputs(paddle_inputs) -> Dict[str, np.ndarray]:
    """Extract ALL in-place-modified tensors back to numpy."""
    keys = ["tensor_a", "tensor_b", ...]
    return {k: paddle_inputs[k].numpy() for k in keys}


# ============================================================
# Layer 2: Input generation
# ============================================================

def gen_<kernel>_inputs(real_bsz=8, ..., seed=42) -> Dict[str, Any]:
    """Generate randomized test inputs. Returns dict with both numpy arrays and scalar configs."""
    rng = np.random.default_rng(seed)
    # ... generate all numpy arrays with correct dtypes/shapes ...
    return {"tensor_a": ..., "scalar_attr": ..., "real_bsz": real_bsz, ...}


# ============================================================
# Layer 3: Reference implementation (pure Python/NumPy)
# ============================================================

def reference_<kernel>(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Python reference — must match CUDA kernel logic exactly."""
    # Deep-copy all mutable arrays
    tensor_a = inputs["tensor_a"].copy()
    # ... replicate kernel logic ...
    return {"tensor_a": tensor_a, ...}


# ============================================================
# Layer 4a: TEST_CONFIGS — all pure-parameter test scenarios
# ============================================================

TEST_CONFIGS = [
    # Each config is a dict of gen_<kernel>_inputs kwargs + a "name" key.
    # Pure parameter variations go here — do NOT create separate test methods for them.
    #
    # --- basic coverage ---
    {"name": "small_batch", "real_bsz": 1, "seed": 42, ...},
    {"name": "large_batch", "real_bsz": 64, "seed": 42, ...},
    # --- mode / strategy variants ---
    {"name": "mode_a", "real_bsz": 8, "mode": "a", "seed": 42, ...},
    {"name": "mode_b", "real_bsz": 8, "mode": "b", "seed": 42, ...},
    # --- flags ---
    {"name": "reject_all", "real_bsz": 8, "reject_all": True, "seed": 42, ...},
    {"name": "accept_all", "real_bsz": 8, "accept_all": True, "seed": 42, ...},
    # --- edge cases ---
    {"name": "min_batch", "real_bsz": 1, "max_tokens": 1, "seed": 42, ...},
]


# ============================================================
# Layer 4b: Test suite
# ============================================================

class Test<KernelName>(unittest.TestCase):

    # ------ shared helpers ------

    def _run_and_get(self, inputs):
        paddle_inputs = to_paddle_inputs(inputs)
        run_kernel(paddle_inputs, inputs)
        return get_outputs(paddle_inputs)

    def _check_all_outputs(self, inputs, outputs):
        """Compare ALL output tensors against reference + sanity checks."""
        ref = reference_<kernel>(inputs)
        all_keys = ["tensor_a", "tensor_b", ...]
        for key in all_keys:
            np.testing.assert_array_equal(
                outputs[key], ref[key], err_msg=f"{key} mismatch"
            )
        # Add domain-specific sanity checks here

    def _run_full_test(self, config):
        inputs = gen_<kernel>_inputs(**config)
        outputs = self._run_and_get(inputs)
        self._check_all_outputs(inputs, outputs)
        return outputs

    # ------ test cases ------

    def test_configs(self):
        """Run all TEST_CONFIGS via subTest (one subTest per config)."""
        for cfg in TEST_CONFIGS:
            with self.subTest(name=cfg["name"]):
                test_cfg = {k: v for k, v in cfg.items() if k != "name"}
                self._run_full_test(test_cfg)

    # Only keep separate test methods for scenarios that need tensor overrides:
    def test_special_scenario(self):
        """Scenarios that need manual tensor setup beyond gen_inputs params."""
        inputs = gen_<kernel>_inputs(real_bsz=2, seed=42)
        inputs["some_tensor"][0, 2] = special_value  # override specific tensor
        outputs = self._run_and_get(inputs)
        self._check_all_outputs(inputs, outputs)


if __name__ == "__main__":
    unittest.main()
```

## Key Rules

1. **CPU vs GPU tensors**: Read the CUDA kernel `.cu` file carefully. If a tensor is `copy_to(place, false)` inside the host function, it's a CPU tensor input — must use `CPU_PLACE` in `to_paddle_inputs`.
2. **`_check_all_outputs` checks ALL tensors**: Every in-place-modified output tensor must be compared against reference. Never scatter `assertEqual`/`assertTrue` across individual test methods — all checks go through `_check_all_outputs`.
3. **Stochastic kernels**: If the kernel uses `curand` (e.g., top-p sampling), compare only deterministic positions. Skip the last sampled token in `compare_results`. Note: `curand_states` in reference should be sized to `max_step_tokens` (position count), not `bsz` (batch count).
4. **TEST_CONFIGS for pure-parameter scenarios**: Any test that only differs by `gen_inputs` parameters belongs in `TEST_CONFIGS`, not a separate `test_*` method. Only create separate methods when you need to **override specific tensor values** after generation.
5. **Test cases are thin**: Each `test_*` method should be 3-15 lines. It either calls `_run_full_test(config)` or does `gen → override → _run_and_get → _check_all_outputs`.
6. **No `fastdeploy.__init__`**: Import ops via `import_custom_ops` directly to avoid heavy dependency chain.
7. **Padding slots**: Kernel may have `max_bsz > real_bsz`. Reference impl must handle padding slots the same way as the kernel (typically no-op or stop_count++).
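To make rule 7 concrete, here is a minimal runnable sketch of the Layer 2/3 pattern using a toy in-place op as a stand-in for a real CUDA kernel (all names below — `gen_add_bias_inputs`, `fake_kernel` — are illustrative, not actual FastDeploy ops). The key point is that padding slots beyond `real_bsz` must be a no-op in both the kernel stand-in and the reference:

```python
from typing import Any, Dict

import numpy as np


def gen_add_bias_inputs(real_bsz=4, max_bsz=8, seed=42) -> Dict[str, Any]:
    # Layer 2: randomized inputs; slots beyond real_bsz are padding.
    rng = np.random.default_rng(seed)
    x = np.zeros((max_bsz,), dtype=np.float32)
    x[:real_bsz] = rng.standard_normal(real_bsz).astype(np.float32)
    return {"x": x, "bias": 1.5, "real_bsz": real_bsz}


def fake_kernel(inputs: Dict[str, Any]) -> np.ndarray:
    # Stand-in for the CUDA op: touches only the first real_bsz slots.
    out = inputs["x"].copy()
    out[: inputs["real_bsz"]] += inputs["bias"]
    return out


def reference_add_bias(inputs: Dict[str, Any]) -> np.ndarray:
    # Layer 3: pure-NumPy reference; padding slots are deliberately
    # left untouched, mirroring the kernel's behavior (rule 7).
    x = inputs["x"].copy()
    x[: inputs["real_bsz"]] += inputs["bias"]
    return x


inputs = gen_add_bias_inputs(real_bsz=2, seed=0)
np.testing.assert_array_equal(fake_kernel(inputs), reference_add_bias(inputs))
print("padding slots untouched:", bool(np.all(fake_kernel(inputs)[2:] == 0.0)))  # → True
```

A real test would replace `fake_kernel` with the imported custom op and route the comparison through `_check_all_outputs` as in the template above.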
2 changes: 1 addition & 1 deletion .flake8
@@ -1,5 +1,5 @@
[flake8]
-ignore = E203, E402, E501, E731, E741, W503, W605, E722
+ignore = E203, E402, E501, E731, E741, W503, W605, E722, E231, W604, E702, E226, E221, E713, E271
max-line-length = 119

# E402: module level import not at top of file
30 changes: 30 additions & 0 deletions .github/actions/rerun-workflow/action.yml
@@ -0,0 +1,30 @@
name: 'Rerun Workflow'
description: 'Re-run GitHub Actions workflow for a given Pull Request'
inputs:
  GITHUB_TOKEN:
    description: 'GitHub token with repo scope'
    required: true
  OWNER:
    description: 'Repository owner'
    required: true
  REPO:
    description: 'Repository name'
    required: true
  PR_ID:
    description: 'Pull Request ID'
    required: true
  JOB_NAME:
    description: 'Job name to rerun'
    required: true

runs:
  using: 'composite'
  steps:
    - run: bash ./.github/actions/rerun-workflow/rerun.sh
      shell: bash
      env:
        GITHUB_TOKEN: ${{ inputs.GITHUB_TOKEN }}
        OWNER: ${{ inputs.OWNER }}
        REPO: ${{ inputs.REPO }}
        PR_ID: ${{ inputs.PR_ID }}
        JOB_NAME: ${{ inputs.JOB_NAME }}
77 changes: 77 additions & 0 deletions .github/actions/rerun-workflow/rerun.sh
@@ -0,0 +1,77 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

set -e

COMMIT_SHA=$(curl -s -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/$OWNER/$REPO/pulls/$PR_ID" | jq -r '.head.sha')

echo "Commit SHA: $COMMIT_SHA"

response=$(curl -s -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/$OWNER/$REPO/actions/runs?head_sha=$COMMIT_SHA&per_page=100")

echo "Response: $response"

run_ids=$(echo "$response" | jq -r '.workflow_runs[].id')

if [ -n "$run_ids" ]; then
echo "Found run_ids for commit $COMMIT_SHA: $run_ids"

for run_id in $run_ids; do
if [ "$JOB_NAME" = "all-failed" ]; then
echo "Rerunning all failed jobs for run_id: $run_id"

rerun_response=$(curl -X POST -s -w "%{http_code}" -o /dev/null \
-H "Accept: application/vnd.github.v3+json" \
-H "Authorization: Bearer $GITHUB_TOKEN" \
"https://api.github.com/repos/$OWNER/$REPO/actions/runs/$run_id/rerun-failed-jobs")
if [ "$rerun_response" -eq 201 ]; then
echo "Successfully requested rerun for all failed jobs in run_id: $run_id"
else
echo "Failed to request rerun for run_id: $run_id with status code $rerun_response"
fi

else
jobs_response=$(curl -s -H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/$OWNER/$REPO/actions/runs/$run_id/jobs")

echo "Jobs Response for run_id $run_id: $jobs_response"

# if [[ "$JOB_NAME" == *"bypass"* ]]; then
block_jobs=$(echo "$jobs_response" | jq -r --arg job_name "$JOB_NAME" \
'.jobs[] | select(.name == $job_name) | .id')
# else
# block_jobs=$(echo "$jobs_response" | jq -r --arg job_name "$JOB_NAME" \
# '.jobs[] | select(.name == $job_name and .conclusion != "success") | .id')
# fi

if [ -n "$block_jobs" ]; then
echo "Found block jobs for run_id $run_id: $block_jobs"

for job_id in $block_jobs; do
echo "Rerunning job_id: $job_id"
curl -X POST -H "Accept: application/vnd.github.v3+json" \
-H "Authorization: token $GITHUB_TOKEN" \
"https://api.github.com/repos/$OWNER/$REPO/actions/jobs/$job_id/rerun"
done
else
echo "No block jobs found for run_id $run_id with name $JOB_NAME."
fi
fi
done
else
echo "No matching workflow runs found for commit $COMMIT_SHA."
exit 1
fi
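
The script's core moves can be exercised offline. The sketch below is illustrative only: the JSON payloads, IDs, and job names are made up, and `curl` is stubbed out; only the jq filter shapes and the `-w "%{http_code}"` status-check pattern mirror the script above.

```shell
# Offline sketch of the script's two jq filters and its status-code check.
# All JSON payloads, IDs, and names here are hypothetical.

# 1) Extract run ids, as in `.workflow_runs[].id`.
runs_response='{"workflow_runs":[{"id":101},{"id":202}]}'
run_ids=$(echo "$runs_response" | jq -r '.workflow_runs[].id')

# 2) Select job ids by name, as in the `select(.name == $job_name)` filter.
jobs_response='{"jobs":[{"id":7,"name":"build"},{"id":8,"name":"test"}]}'
block_jobs=$(echo "$jobs_response" | jq -r --arg job_name "test" \
    '.jobs[] | select(.name == $job_name) | .id')

# 3) Stub curl so the `-w "%{http_code}"` pattern runs without network:
#    with -s and -o /dev/null, real curl prints only the status code.
curl() { echo "201"; }
rerun_response=$(curl -X POST -s -w "%{http_code}" -o /dev/null "https://example.invalid")

echo "run_ids: $run_ids"
echo "block_jobs: $block_jobs"
echo "rerun_response: $rerun_response"
```

Stubbing `curl` as a shell function (functions shadow PATH binaries) is a common way to unit-test scripts like this one without a token or network access.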
54 changes: 54 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,54 @@
# GitHub Copilot Custom Review Instructions

When reviewing code, focus on:

## Security Critical Issues
- Check for hardcoded secrets, API keys, or credentials
- Look for SQL injection and XSS vulnerabilities
- Verify proper input validation and sanitization
- Review authentication and authorization logic

## Performance Red Flags
- Identify N+1 database query problems
- Spot inefficient loops and algorithmic issues
- Check for memory leaks and resource cleanup
- Review caching opportunities for expensive operations

## Code Quality Essentials
- Functions should be focused and appropriately sized
- Use clear, descriptive naming conventions
- Ensure proper error handling throughout

## Review Style
- Be specific and actionable in feedback
- Explain the "why" behind recommendations
- Acknowledge good patterns when you see them
- Ask clarifying questions when code intent is unclear

Always prioritize security vulnerabilities and performance issues that could impact users.

Always suggest changes that improve readability. For example, the following suggestion makes the code more readable and also makes the validation logic reusable and testable.

```javascript
// Instead of:
if (user.email && user.email.includes('@') && user.email.length > 5) {
  submitButton.enabled = true;
} else {
  submitButton.enabled = false;
}

// Consider:
function isValidEmail(email) {
  return email && email.includes('@') && email.length > 5;
}

submitButton.enabled = isValidEmail(user.email);
```

## Description for pull request

- Please check the title of the Pull Request. It needs to follow the `[CLASS] Title` format, for example, `[BugFix] Fix memory leak of data processor`. If the title is incorrect, suggest how the committer should modify it.
- Please check the description of the Pull Request. At a minimum, it should explain why the modifications are being made and what problem they solve. If the committer has not provided this information, or it is incomplete, prompt the committer to revise it.
- For every Pull Request, confirm whether documentation needs to be added, updated, or deleted, and remind the committer to handle it accordingly.

## Others
- For all submitted PRs, write your review comments in Chinese; note, however, that comments inside the code must still be in English.
- When you submit a Pull Request, make sure the code style satisfies this repo's settings; before committing, run `pip install pre-commit==4.2.0` and then `pre-commit install`.
34 changes: 34 additions & 0 deletions .github/pull_request_template.md
@@ -0,0 +1,34 @@
<!-- TemplateReference: https://github.com/PaddlePaddle/FastDeploy/blob/develop/.github/pull_request_template.md -->

<!-- Thank you for your contribution! Please follow these guidelines to enhance your pull request. If anything is unclear, submit your PR and reach out to maintainers for assistance. -->

## Motivation

<!-- Describe the purpose and goals of this pull request. -->

> :bulb: If this PR is a Cherry-Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end, for example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

## Modifications

<!-- Detail the changes made in this pull request. -->

## Usage or Command

<!-- Provide usage instructions if this PR introduces a new feature. -->
<!-- Provide the command to run if this PR is a performance optimization or a bug fix. -->

## Accuracy Tests

<!-- If this pull request affects model outputs (e.g., changes to the kernel or model forward code), provide accuracy test results. -->

## Checklist

- [ ] Add at least one tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code by running `pre-commit` before committing.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.