…dlePaddle#6407)
* Optimize GPU memory usage
---------
Co-authored-by: huzesen <huzesen@baidu.com>
…addlePaddle#6541)
* Fix MTP acceptance rate decline
* [BugFix][Scheduler] Fix can_schedule_block_num_threshold calculation

  Fix the calculation of can_schedule_block_num_threshold in ResourceManagerV1. The original formula, based on need_prefill_tokens, could produce incorrect threshold values; the threshold is now taken directly from num_chunk_new_block for accurate block scheduling.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
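The fix above can be illustrated with a minimal sketch. The exact original formula is not shown in the commit message, so the ceil-division derivation and the `block_size` parameter here are assumptions; only the names `can_schedule_block_num_threshold`, `need_prefill_tokens`, and `num_chunk_new_block` come from the change itself.

```python
def can_schedule_block_num_threshold(num_chunk_new_block: int) -> int:
    # After the fix: the threshold is the number of new blocks the chunk
    # needs, used directly instead of being re-derived from token counts.
    return num_chunk_new_block


def blocks_needed(need_prefill_tokens: int, block_size: int) -> int:
    # Hypothetical old-style derivation from need_prefill_tokens: ceil
    # division so a partial block still reserves a full block. Rounding
    # like this in a separate place is where the two values could drift apart.
    return -(-need_prefill_tokens // block_size)
```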
* [Docs] Update code overview documentation
  - Add comprehensive FastDeploy code structure overview
  - Include detailed module descriptions and development guides
  - Add quick development guide for common tasks
  - Update both English and Chinese versions
* [Docs] Update code overview documentation format
  - Convert file path links from [file](path) to `file` inline code format
  - Add proper spacing for better readability in markdown tables
  - Maintain consistent formatting across English and Chinese docs
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…e/fused_moe_cutlass_backend.py unit test additions (PaddlePaddle#6209)
* [CI] [Hackathon 10th Spring No.24] Add unit tests for the fastdeploy/model_executor/layers/moe/fused_moe_triton_backend.py module
* [CI] [Hackathon 10th Spring No.23] Add unit tests for fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py
* Merge branch 'develop' into 23
---------
Co-authored-by: Jiaxin Sui <95567040+plusNew001@users.noreply.github.com>
…_fp8 and no storage backend (PaddlePaddle#6516)
* [fix] fix cache transfer manager failing to initialize when using block_wise_fp8 with no storage backend
* [fix] fix test_cache_transfer_manager
* [fix] fix test_cache_transfer_manager again
---------
Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
* lazy enable_torch_proxy for cutlass
* test init_flash_attn_version
…ocation errors (PaddlePaddle#6531)
* [BugFix] Add safety checks in recycle_gpu_blocks to prevent block allocation errors
  - Check prefix tree status before recycling GPU blocks
  - Validate that gpu_block_ids is a list
  - Add an overflow check to prevent the free block count exceeding total blocks
* [BugFix] Fix AttributeError in recycle_gpu_blocks when prefix_tree_status_signal is not initialized
  - Add a hasattr check before accessing prefix_tree_status_signal
  - The signal is only initialized in launch_cache_messager, not in __init__
  - Fixes a CI test failure in test_prefix_cache_manager.py
* [BugFix] Reset prefix cache when model weights are updating
  - Call self.reset() before setting status to NORMAL in the UPDATING state
  - Ensure cache consistency when model weights change
  - Consistent with the CLEARING state handling
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
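The three checks above can be sketched as follows. This is a minimal stand-in, not FastDeploy's real PrefixCacheManager: the class name `GpuBlockPool` and the attributes `free_gpu_block_ids` and `total_block_num` are assumptions; only `recycle_gpu_blocks` and `prefix_tree_status_signal` come from the commit message.

```python
class GpuBlockPool:
    def __init__(self, total_block_num: int):
        self.total_block_num = total_block_num
        self.free_gpu_block_ids: list = []
        # prefix_tree_status_signal is deliberately NOT set here: as in the
        # fix above, it only exists after a later launch step.

    def recycle_gpu_blocks(self, gpu_block_ids) -> None:
        # Guard against the signal attribute not being initialized yet
        # (equivalent to the hasattr check in the fix).
        if getattr(self, "prefix_tree_status_signal", None) is not None:
            pass  # the real code consults the prefix tree status here
        # Validate the argument type before recycling.
        if not isinstance(gpu_block_ids, list):
            raise TypeError("gpu_block_ids must be a list")
        # Overflow check: the free count must never exceed the pool size.
        if len(self.free_gpu_block_ids) + len(gpu_block_ids) > self.total_block_num:
            raise RuntimeError("free block count would exceed total blocks")
        self.free_gpu_block_ids.extend(gpu_block_ids)
```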
…ed_neox_rope_embedding (PaddlePaddle#6553)
…ransfer benchmark tool (PaddlePaddle#6434)
* fix cache messager performance problem
* dispatch param type
…client.py unit test additions (PaddlePaddle#6158)
* fix codestyle and update unit test coverage workflow
* fix test_engine_client.py: add main_process_metrics mock to prevent KeyError
* fix test_engine_client.py: comprehensive test improvements
* feat: enhance test_engine_client.py with comprehensive test improvements
* fix: resolve test failures in test_engine_client.py
* test: enhance EngineClient test coverage with comprehensive test suite
* test: add comprehensive EngineClient test suite (codestyle checked)
* [BugFix] Fix mtp when token_ids_all is None
* fix bug
…x_cache_manager.py unit test additions (PaddlePaddle#6297)
* test: update prefix cache manager tests
* test: refine prefix cache manager coverage helpers
* style: apply black formatting to test_prefix_cache_manager.py
* tests: update test_prefix_cache_manager
* update
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
…lePaddle#6501)
* add speculate_pre_process kernel
* reduce one slice
* make d2h async and fix mtp bug for new pre_process
* fix
* add unit test
* fix: code style formatting
* fix
* fix: thread race in speculate_preprocess and rename d2h event
…addle#6812)
* reformat eagle_get_hidden_states and eagle_get_self_hidden_states
* improve readability
* fix xpu bug
* fix coverage failure
* change launch params and parallelize position_map compute
* Fix MTP-related bugs in FastDeploy centralized inference
* fix
* refactor mtp hidden_states process
* fix
* add unittest and optimize kernel
* remove unused code
* fix
* [CE] add 21b cpu cache, glm mtp, and glm-for-RL configs
* [CE] add 21b tp2 yaml
* [CE] add 21b mooncake yaml
* add fastdeploy benchmark, paddletest-155
* [CE] adjust vl wint4 config
* [CE] add glm mtp with updatemodel config
* [CE] fix
* fix
* test
* test
* test
---------
Co-authored-by: xiegegege <>
* fix xpu ci bug
* Remove unnecessary blank line in conftest.py
* Update upload-artifact action to version 6
* Update _xpu_8cards_case_test.yml
* fix ci bug
* Change exit code on test failure to 1
* fix ci bug
* fix ci bug
* fix ci bug
* fix ci bug
* Update conftest.py
* [CI] [Hackathon 10th Spring No.43] ernie4_5_mtp unit test additions
* [CI] [Hackathon 10th Spring No.43] add mapping and forward branch coverage
---------
Co-authored-by: cloudforge1 <cloudforge1@users.noreply.github.com>
Co-authored-by: CSWYF3634076 <wangyafeng@baidu.com>
Co-authored-by: YuBaoku <49938469+EmmonsCurse@users.noreply.github.com>
* [BugFix] xpu: fix speculate schedule cache kernel
* fix code style
…Paddle#7050) Removed the --ipc=host option from the docker run command.
…addlePaddle#7048)
* [Feature] Support --skip-mm-profiling to skip multimodal token overhead in profiling

## Motivation

When deploying multimodal models (such as Qwen2.5-VL and ERNIE4.5-VL), `get_max_chunk_tokens` adds the multimodal token count on top of the base token count to reserve GPU memory during profiling. In some scenarios (for example when the image token count is known to be small, or GPU memory savings are desired), users want to skip this extra multimodal token overhead and profile with the text token count alone.

## Modifications

- `fastdeploy/engine/args_utils.py`: add a `skip_mm_profiling: bool = False` field to `EngineArgs` and a `--skip-mm-profiling` launch flag to the parser
- `fastdeploy/config.py`: add `self.skip_mm_profiling = False` to `ModelConfig.__init__`; add a `not self.model_config.skip_mm_profiling` check in `FDConfig.get_max_chunk_tokens` so that, when enabled, the mm token overhead is skipped and the base `num_tokens` is returned directly

## Usage or Command

Add the flag when launching the service:

```bash
--skip-mm-profiling
```

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This feature is a simple config pass-through, already covered by existing config unit tests.

* [Refactor] Replace skip_mm_profiling with deploy_modality=text to skip mm profiling

## Motivation

The original `--skip-mm-profiling` flag semantically overlaps with the existing `deploy_modality` parameter: a text-only deployment (`deploy_modality=text`) never needs GPU memory reserved for multimodal tokens. Introducing a separate flag adds configuration complexity, while reusing `deploy_modality` is more intuitive and consistent.

## Modifications

- `fastdeploy/engine/args_utils.py`: remove the `EngineArgs.skip_mm_profiling` field and the `--skip-mm-profiling` launch flag
- `fastdeploy/config.py`: remove `self.skip_mm_profiling = False` from `ModelConfig.__init__`; change the condition in `FDConfig.get_max_chunk_tokens` to `self.deploy_modality != DeployModality.TEXT`, so that when deploy_modality is text, `max_num_batched_tokens` is returned directly and the mm token overhead is skipped

## Usage or Command

```bash
# Deploy in text mode to skip the mm token profiling overhead (replaces --skip-mm-profiling)
python -m fastdeploy.entrypoints.openai.api_server \
    --deploy-modality text \
    --model /path/to/model \
    ...
```

## Checklist

- [x] Add at least a tag in the PR title.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. This is a parameter refactor with a logically equivalent replacement, already covered by existing config unit tests.

---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
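The refactored condition described above can be sketched in miniature. The `DeployModality` enum and the free-standing function signature here are simplified stand-ins for the real `fastdeploy.config` classes; `mm_tokens` is an assumed parameter name.

```python
from enum import Enum


class DeployModality(Enum):
    TEXT = "text"
    MULTIMODAL = "multimodal"


def get_max_chunk_tokens(num_tokens: int, mm_tokens: int,
                         deploy_modality: DeployModality) -> int:
    # Text-only deployment: skip the multimodal token overhead and profile
    # with the base token budget alone (replaces the old skip_mm_profiling flag).
    if deploy_modality == DeployModality.TEXT:
        return num_tokens
    # Multimodal deployment: reserve memory for the extra mm tokens as well.
    return num_tokens + mm_tokens
```

Folding the behavior into `deploy_modality` means there is no flag combination to validate: a text deployment cannot accidentally reserve mm memory, and a multimodal one cannot accidentally skip it.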
* add cute cpp fa4
* remove comments
* fix merge errors
* move sm_version inside the function
* fix CI errors
…end (PaddlePaddle#7028)
* [BugFix] Fix kv cache int8 dynamic quant on flash and flash_mask backends
* add constexpr and code style clean-up
* add test
* fix code style
* fix test
…addlePaddle#7046)
* Add lock to avoid generating NaN
* up
…cache (#…" (PaddlePaddle#7075) This reverts commit 6d2ab8f.
* merge text processor
* update
* fix unit test
* merge messages2ids
* fix unit test
* remove duplicate code
* remove redundant code
* delete code
* fix unit test
…xtra_keys (PaddlePaddle#6929)
* [BugFix][KVCache] Fix mm hash boundary comparison in get_block_hash_extra_keys
* [BugFix][KVCache] Fix test_get_block_hash_extra_keys_boundary_cases assertions

## Motivation

In the test case `test_get_block_hash_extra_keys_boundary_cases`, the call for block [4,8) incorrectly passed `mm_idx=1`, skipping img0[2,5); but img0 covers token 4, which belongs to block [4,8) and should be included in hash_keys. In addition, every assertEqual only checked hash_keys and never checked the returned mm_idx cursor.

## Modifications

- `test_get_block_hash_extra_keys_boundary_cases`:
  - Switch to chained calls, passing the previously returned mm_idx as the next call's argument, to simulate the real calling loop
  - Change the block [4,8) argument from `mm_idx=1` to the previously returned `mm_idx=0`, and the expected value from `[]` to `["hash-0"]`
  - Change all assertions to `assertEqual((mm_idx, hash_keys), (...))` to also check the cursor
- `test_get_block_hash_extra_keys_no_overlap_at_boundaries`:
  - Change the Case B argument from `mm_idx=1` to `mm_idx=0` (scan from the start; img-a takes the continue branch)
  - Add mm_idx checks to all assertions
- `test_get_block_hash_extra_keys_image_crosses_block_boundary`:
  - Add mm_idx checks to all assertions
- `test_get_block_hash_extra_keys_no_mm_inputs`:
  - Add an mm_idx check to the assertion
- `test_get_block_hash_extra_keys_handles_multimodal_segments`:
  - Add mm_idx checks to the call2 and call3 assertions

## Usage or Command

```bash
python -m pytest tests/cache_manager/test_prefix_cache_manager.py::TestPrefixCacheManagerCoverage -v -k "get_block_hash_extra_keys"
```

---------
Co-authored-by: chengyanfu <chengyanfu@baidu.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
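The chained mm_idx pattern described above can be illustrated with a toy version of the function. This sketch is an assumption about the interface, not FastDeploy's real implementation: here each multimodal input is a `(start, end, hash)` span, and the function returns the updated cursor so the next block's call can resume where the previous one left off.

```python
def get_block_hash_extra_keys(mm_spans, start, end, mm_idx):
    """Collect hashes of multimodal spans overlapping tokens [start, end),
    resuming the scan from mm_idx and returning the updated cursor."""
    hash_keys = []
    while mm_idx < len(mm_spans):
        s, e, h = mm_spans[mm_idx]
        if e <= start:      # span ends before this block: advance the cursor
            mm_idx += 1
            continue
        if s >= end:        # span starts after this block: stop scanning
            break
        hash_keys.append(h)  # overlap: record the hash
        if e <= end:        # span fully consumed by this block
            mm_idx += 1
        else:               # span crosses the block boundary: keep the cursor
            break
    return mm_idx, hash_keys


# Chained calls, as in the fixed test: feed each returned mm_idx into the
# next block's call. img0 covers tokens [2, 5).
spans = [(2, 5, "hash-0")]
mm_idx, keys = get_block_hash_extra_keys(spans, 0, 4, 0)
# img0 crosses the block boundary, so the cursor stays at 0; block [4, 8)
# must reuse it rather than skip to mm_idx=1, and also sees "hash-0".
mm_idx, keys2 = get_block_hash_extra_keys(spans, 4, 8, mm_idx)
```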
Added debug print statement in post_process_normal function for troubleshooting purposes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>