…dlePaddle#6407)
* Optimize GPU memory usage
---------
Co-authored-by: huzesen <huzesen@baidu.com>
…addlePaddle#6541)
* Fix MTP acceptance rate decline
* [BugFix][Scheduler] Fix can_schedule_block_num_threshold calculation

  Fix the calculation of can_schedule_block_num_threshold in ResourceManagerV1. The original formula, based on need_prefill_tokens, could produce incorrect threshold values; the threshold is now taken directly from num_chunk_new_block for accurate block scheduling.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
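The fix above can be illustrated with a minimal sketch. The exact original formula is not shown in the commit message, so the ceil-division derivation and the `block_size` parameter here are assumptions; only the names `can_schedule_block_num_threshold`, `need_prefill_tokens`, and `num_chunk_new_block` come from the change itself.

```python
def can_schedule_block_num_threshold(num_chunk_new_block: int) -> int:
    # After the fix: the threshold is the number of new blocks the chunk
    # needs, used directly instead of being re-derived from token counts.
    return num_chunk_new_block


def blocks_needed(need_prefill_tokens: int, block_size: int) -> int:
    # Hypothetical old-style derivation from need_prefill_tokens: ceil
    # division so a partial block still reserves a full block. Rounding
    # like this in a separate place is where the two values could drift apart.
    return -(-need_prefill_tokens // block_size)
```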
* [Docs] Update code overview documentation
  - Add comprehensive FastDeploy code structure overview
  - Include detailed module descriptions and development guides
  - Add quick development guide for common tasks
  - Update both English and Chinese versions
* [Docs] Update code overview documentation format
  - Convert file path links from [file](path) to `file` inline code format
  - Add proper spacing for better readability in markdown tables
  - Maintain consistent formatting across English and Chinese docs
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…e/fused_moe_cutlass_backend.py unit test additions (PaddlePaddle#6209)
* [CI] [Hackathon 10th Spring No.24] Add unit tests for the fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py module
* [CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py
* Merge branch 'develop' into 23
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
…_fp8 and no storage backend (PaddlePaddle#6516)
* [fix] fix cache transfer manager failing to initialize when using block_wise_fp8 with no storage backend
* [fix] fix test_cache_transfer_manager
* [fix] fix test_cache_transfer_manager again
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* lazy enable_torch_proxy for cutlass
* test init_flash_attn_version
…ocation errors (PaddlePaddle#6531)
* [BugFix] Add safety checks in recycle_gpu_blocks to prevent block allocation errors
  - Check prefix tree status before recycling GPU blocks
  - Validate that gpu_block_ids is a list
  - Add an overflow check to prevent the free block count exceeding total blocks
* [BugFix] Fix AttributeError in recycle_gpu_blocks when prefix_tree_status_signal is not initialized
  - Add a hasattr check before accessing prefix_tree_status_signal
  - The signal is only initialized in launch_cache_messager, not in __init__
  - Fixes a CI test failure in test_prefix_cache_manager.py
* [BugFix] Reset prefix cache when model weights are updating
  - Call self.reset() before setting status to NORMAL in the UPDATING state
  - Ensure cache consistency when model weights change
  - Consistent with the CLEARING state handling
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
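The three checks above can be sketched as follows. This is a minimal stand-in, not FastDeploy's real PrefixCacheManager: the class name `GpuBlockPool` and the attributes `free_gpu_block_ids` and `total_block_num` are assumptions; only `recycle_gpu_blocks` and `prefix_tree_status_signal` come from the commit message.

```python
class GpuBlockPool:
    def __init__(self, total_block_num: int):
        self.total_block_num = total_block_num
        self.free_gpu_block_ids: list = []
        # prefix_tree_status_signal is deliberately NOT set here: as in the
        # fix above, it only exists after a later launch step.

    def recycle_gpu_blocks(self, gpu_block_ids) -> None:
        # Guard against the signal attribute not being initialized yet
        # (equivalent to the hasattr check in the fix).
        if getattr(self, "prefix_tree_status_signal", None) is not None:
            pass  # the real code consults the prefix tree status here
        # Validate the argument type before recycling.
        if not isinstance(gpu_block_ids, list):
            raise TypeError("gpu_block_ids must be a list")
        # Overflow check: the free count must never exceed the pool size.
        if len(self.free_gpu_block_ids) + len(gpu_block_ids) > self.total_block_num:
            raise RuntimeError("free block count would exceed total blocks")
        self.free_gpu_block_ids.extend(gpu_block_ids)
```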
…ed_neox_rope_embedding (PaddlePaddle#6553)
…ransfer benchmark tool (PaddlePaddle#6434)
* fix cache messager performance problem
* dispatch param type
…client.py unit test additions (PaddlePaddle#6158)
* fix codestyle and update unit test coverage workflow
* fix test_engine_client.py: add main_process_metrics mock to prevent KeyError
* fix test_engine_client.py: comprehensive test improvements
* feat: enhance test_engine_client.py with comprehensive test improvements
* fix: resolve test failures in test_engine_client.py
* test: enhance EngineClient test coverage with comprehensive test suite
* test: add comprehensive EngineClient test suite (codestyle checked)
* [BugFix] Fix mtp when token_ids_all is None
* fix bug
…x_cache_manager.py unit test additions (PaddlePaddle#6297)
* test: update prefix cache manager tests
* test: refine prefix cache manager coverage helpers
* style: apply black formatting to test_prefix_cache_manager.py
* tests: update test_prefix_cache_manager
* update
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
…lePaddle#6501)
* add speculate_pre_process kernel
* reduce one slice
* make d2h async and fix mtp bug for new pre_process
* fix
* add unit test
* fix: code style formatting
* fix
* fix: thread race in speculate_preprocess and rename d2h event
…addle#6812)
* reformat eagle_get_hidden_states and eagle_get_self_hidden_states
* improve readability
* fix xpu bug
* fix coverage failure
* change launch params and parallelize position_map compute
* Fix MTP-related bugs in FastDeploy centralized inference
* fix
* refactor mtp hidden_states process
* fix
* add unittest and optimize kernel
* remove unused code
* fix
* [CE] add 21b cpu cache, glm mtp, and glm-for-RL configs
* [CE] add 21b tp2 yaml
* [CE] add 21b mooncake yaml
* add fastdeploy benchmark, paddletest-155
* [CE] adjust vl wint4 config
* [CE] add glm mtp with updatemodel config
* [CE] fix
* fix
* test
* test
* test
---------
Co-authored-by: xiegegege <>
* fix xpu ci bug
* Remove unnecessary blank line in conftest.py
* Update upload-artifact action to version 6
* Update _xpu_8cards_case_test.yml
* fix ci bug
* Change exit code on test failure to 1
* fix ci bug
* fix ci bug
* fix ci bug
* fix ci bug
* Update conftest.py
* [CI] [Hackathon 10th Spring No.43] ernie4_5_mtp unit test additions
* [CI] [Hackathon 10th Spring No.43] add mapping and forward branch coverage
---------
Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
* [BugFix] xpu: fix speculate schedule cache kernel
* fix code style
…Paddle#7050) Removed the --ipc=host option from the docker run command.
…addlePaddle#7048)
* [Feature] Support --skip-mm-profiling to skip multimodal token overhead in profiling

## Motivation

When deploying multimodal models (such as Qwen2.5-VL and ERNIE4.5-VL), `get_max_chunk_tokens` adds the multimodal token count on top of the base token count to reserve GPU memory during profiling. In some scenarios (for example when the image token count is known to be small, or GPU memory savings are desired), users want to skip this extra multimodal token overhead and profile with the text token count alone.

## Modifications

- `fastdeploy/engine/args_utils.py`: add a `skip_mm_profiling: bool = False` field to `EngineArgs` and a `--skip-mm-profiling` launch flag to the parser
- `fastdeploy/config.py`: add `self.skip_mm_profiling = False` to `ModelConfig.__init__`; add a `not self.model_config.skip_mm_profiling` check in `FDConfig.get_max_chunk_tokens` so that, when enabled, the mm token overhead is skipped and the base `num_tokens` is returned directly

## Usage or Command

Add the flag when launching the service:

```bash
--skip-mm-profiling
```

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This feature is a simple config pass-through, already covered by existing config unit tests.

* [Refactor] Replace skip_mm_profiling with deploy_modality=text to skip mm profiling

## Motivation

The original `--skip-mm-profiling` flag semantically overlaps with the existing `deploy_modality` parameter: a text-only deployment (`deploy_modality=text`) never needs GPU memory reserved for multimodal tokens. Introducing a separate flag adds configuration complexity, while reusing `deploy_modality` is more intuitive and consistent.

## Modifications

- `fastdeploy/engine/args_utils.py`: remove the `EngineArgs.skip_mm_profiling` field and the `--skip-mm-profiling` launch flag
- `fastdeploy/config.py`: remove `self.skip_mm_profiling = False` from `ModelConfig.__init__`; change the condition in `FDConfig.get_max_chunk_tokens` to `self.deploy_modality != DeployModality.TEXT`, so that when deploy_modality is text, `max_num_batched_tokens` is returned directly and the mm token overhead is skipped

## Usage or Command

```bash
# Deploy in text mode to skip the mm token profiling overhead (replaces --skip-mm-profiling)
python -m fastdeploy.entrypoints.openai.api_server \
    --deploy-modality text \
    --model /path/to/model \
    ...
```

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This is a parameter refactor with a logically equivalent replacement, already covered by existing config unit tests.

---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
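The refactored condition described above can be sketched in miniature. The `DeployModality` enum and the free-standing function signature here are simplified stand-ins for the real `fastdeploy.config` classes; `mm_tokens` is an assumed parameter name.

```python
from enum import Enum


class DeployModality(Enum):
    TEXT = "text"
    MULTIMODAL = "multimodal"


def get_max_chunk_tokens(num_tokens: int, mm_tokens: int,
                         deploy_modality: DeployModality) -> int:
    # Text-only deployment: skip the multimodal token overhead and profile
    # with the base token budget alone (replaces the old skip_mm_profiling flag).
    if deploy_modality == DeployModality.TEXT:
        return num_tokens
    # Multimodal deployment: reserve memory for the extra mm tokens as well.
    return num_tokens + mm_tokens
```

Folding the behavior into `deploy_modality` means there is no flag combination to validate: a text deployment cannot accidentally reserve mm memory, and a multimodal one cannot accidentally skip it.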
* add cute cpp fa4
* remove comments
* fix merge errors
* move sm_version inside the function
* fix CI errors
…end (PaddlePaddle#7028)
* [BugFix] Fix kv cache int8 dynamic quant on flash and flash_mask backends
* add constexpr and code style clean-up
* add test
* fix code style
* fix test
…addlePaddle#7046)
* Add lock to avoid generating NaN
* up
…cache (#…" (PaddlePaddle#7075) This reverts commit 6d2ab8f.
* merge text processor
* update
* fix unit test
* merge messages2ids
* fix unit test
* remove duplicate code
* remove redundant code
* delete code
* fix unit test
…xtra_keys (PaddlePaddle#6929)
* [BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_extra_keys
* [BugFix][KVCache] Fix test_get_block_hash_extra_keys_boundary_cases assertions

## Motivation

In the test case `test_get_block_hash_extra_keys_boundary_cases`, the call for block [4,8) incorrectly passed `mm_idx=1`, skipping img0[2,5); but img0 covers token 4, which belongs to block [4,8) and should be included in hash_keys. In addition, every assertEqual only checked hash_keys and never checked the returned mm_idx cursor.

## Modifications

- `test_get_block_hash_extra_keys_boundary_cases`:
  - Switch to chained calls, passing the previously returned mm_idx as the next call's argument, to simulate the real calling loop
  - Change the block [4,8) argument from `mm_idx=1` to the previously returned `mm_idx=0`, and the expected value from `[]` to `["hash-0"]`
  - Change all assertions to `assertEqual((mm_idx, hash_keys), (...))` to also check the cursor
- `test_get_block_hash_extra_keys_no_overlap_at_boundaries`:
  - Change the Case B argument from `mm_idx=1` to `mm_idx=0` (scan from the start; img-a takes the continue branch)
  - Add mm_idx checks to all assertions
- `test_get_block_hash_extra_keys_image_crosses_block_boundary`:
  - Add mm_idx checks to all assertions
- `test_get_block_hash_extra_keys_no_mm_inputs`:
  - Add an mm_idx check to the assertion
- `test_get_block_hash_extra_keys_handles_multimodal_segments`:
  - Add mm_idx checks to the call2 and call3 assertions

## Usage or Command

```bash
python -m pytest tests/cache_manager/test_prefix_cache_manager.py::TestPrefixCacheManagerCoverage -v -k "get_block_hash_extra_keys"
```

---------
Co-authored-by: chengyanfu <chengyanfu@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
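The chained mm_idx pattern described above can be illustrated with a toy version of the function. This sketch is an assumption about the interface, not FastDeploy's real implementation: here each multimodal input is a `(start, end, hash)` span, and the function returns the updated cursor so the next block's call can resume where the previous one left off.

```python
def get_block_hash_extra_keys(mm_spans, start, end, mm_idx):
    """Collect hashes of multimodal spans overlapping tokens [start, end),
    resuming the scan from mm_idx and returning the updated cursor."""
    hash_keys = []
    while mm_idx < len(mm_spans):
        s, e, h = mm_spans[mm_idx]
        if e <= start:      # span ends before this block: advance the cursor
            mm_idx += 1
            continue
        if s >= end:        # span starts after this block: stop scanning
            break
        hash_keys.append(h)  # overlap: record the hash
        if e <= end:        # span fully consumed by this block
            mm_idx += 1
        else:               # span crosses the block boundary: keep the cursor
            break
    return mm_idx, hash_keys


# Chained calls, as in the fixed test: feed each returned mm_idx into the
# next block's call. img0 covers tokens [2, 5).
spans = [(2, 5, "hash-0")]
mm_idx, keys = get_block_hash_extra_keys(spans, 0, 4, 0)
# img0 crosses the block boundary, so the cursor stays at 0; block [4, 8)
# must reuse it rather than skip to mm_idx=1, and also sees "hash-0".
mm_idx, keys2 = get_block_hash_extra_keys(spans, 4, 8, mm_idx)
```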
Added debug print statement in post_process_normal function for troubleshooting purposes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>