[None][infra] Support nv sa benchmark in CI Perf Test #13004
chenfeiz0326 wants to merge 1 commit into NVIDIA:main from
Conversation
📝 Walkthrough
Two test utility files are updated: OpenSearch match logic now treats missing boolean fields as
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Execution
    participant Config as ClientConfig
    participant Repo as ensure_bench_serving_repo()
    participant Benchmark as benchmark_serving.py
    participant Parser as get_perf_result()
    Test->>Config: Initialize with use_nv_sa_benchmark=true
    Config->>Config: to_cmd() called
    alt use_nv_sa_benchmark is true
        Config->>Repo: Request benchmark repo path
        Repo->>Repo: Clone bench_serving if needed
        Repo-->>Config: Return benchmark_serving.py path
        Config->>Benchmark: Generate SA benchmark command
        Benchmark-->>Config: Command with SA arguments
    else use_nv_sa_benchmark is false
        Config->>Config: _to_default_benchmark_cmd()
    end
    Config-->>Test: Return benchmark command
    Test->>Benchmark: Execute benchmark
    Benchmark-->>Parser: Produce metrics (may lack user_throughput)
    Parser->>Parser: Enumerate outputs by client_idx
    alt user_throughput missing in SA benchmark
        Parser->>Parser: Inject user_throughput = 0.0
    end
    Parser-->>Test: Return complete metrics
```
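The dispatch in the diagram can be sketched as follows. The names (`ClientConfig`, `to_cmd`, `_to_sa_benchmark_cmd`, `_to_default_benchmark_cmd`, `ensure_bench_serving_repo`) come from the diagram itself, but the bodies below are illustrative assumptions, not the PR's actual implementation.

```python
from dataclasses import dataclass
from typing import List


def ensure_bench_serving_repo() -> str:
    # Stand-in: the real helper clones the bench_serving repo on demand
    # and returns the path to benchmark_serving.py.
    return "/tmp/bench_serving/benchmark_serving.py"


@dataclass
class ClientConfig:
    model: str
    use_nv_sa_benchmark: bool = False

    def _to_default_benchmark_cmd(self) -> List[str]:
        # Hypothetical default client command.
        return ["python", "benchmark.py", "--model", self.model]

    def _to_sa_benchmark_cmd(self) -> List[str]:
        # SA path: resolve benchmark_serving.py first, then build the command.
        script = ensure_bench_serving_repo()
        return ["python", script, "--model", self.model]

    def to_cmd(self) -> List[str]:
        # Mirrors the alt/else branch in the sequence diagram.
        if self.use_nv_sa_benchmark:
            return self._to_sa_benchmark_cmd()
        return self._to_default_benchmark_cmd()
```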
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/integration/defs/perf/test_perf_sanity.py (1)
1538-1548: Consider: using `0.0` as a placeholder for missing `user_throughput` may be misleading. When the SA benchmark doesn't output `user_throughput`, injecting `0.0` allows the test to pass but may be misleading in the database. A value of `0.0` could be confused with actual zero throughput. Consider either:
- Using `None` and adjusting downstream validation to handle it, or
- Adding a comment/field indicating the metric was not measured
💡 Alternative: Track that metric was unavailable

```diff
 # SA benchmark (bench_serving) doesn't output user_throughput
 if (
     metrics
     and "user_throughput" not in metrics
     and client_idx < len(client_configs)
     and client_configs[client_idx].use_nv_sa_benchmark
 ):
-    metrics["user_throughput"] = 0.0
+    # Placeholder: SA benchmark doesn't report this metric
+    metrics["user_throughput"] = -1.0  # or use None if downstream supports it
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/test_perf_sanity.py` around lines 1538-1548: the current loop in which parse_metrics_from_output is used should not inject a misleading 0.0 for missing user_throughput; instead, set metrics["user_throughput"] = None and add an explicit indicator like metrics["user_throughput_measured"] = False (when client_configs[client_idx].use_nv_sa_benchmark is true and user_throughput is absent) so consumers can distinguish "not measured" from a real zero, and update any downstream validation/aggregation that reads _perf_results to treat None or the measured flag appropriately; reference parse_metrics_from_output, _perf_results, client_configs and use_nv_sa_benchmark when making the change.
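The suggestion above can be sketched as a standalone helper. The surrounding names (`metrics`, `client_idx`, `client_configs`, `use_nv_sa_benchmark`) come from the review; factoring the check into a function is illustrative only, since the PR patches an inline loop.

```python
# Sketch of the reviewer's suggestion: record None plus an explicit
# "measured" flag rather than a 0.0 that looks like real throughput.
def fill_missing_user_throughput(metrics, client_idx, client_configs):
    in_range = client_idx < len(client_configs)
    if (metrics and "user_throughput" not in metrics and in_range
            and client_configs[client_idx].use_nv_sa_benchmark):
        metrics["user_throughput"] = None            # not measured, not zero
        metrics["user_throughput_measured"] = False  # consumers can tell the difference
    return metrics
```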
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/defs/perf/test_perf_sanity.py`:
- Around line 475-502: The SA benchmark builder _to_sa_benchmark_cmd() currently
hardcodes flags (--use-chat-template, --trust-remote-code) and ignores
configuration attributes self.backend and self.streaming; update
_to_sa_benchmark_cmd() to mirror the default benchmark’s behavior by
conditionally adding --use-chat-template and --trust-remote-code only when the
corresponding object attributes/settings require them, and include logic to
incorporate self.backend and self.streaming into the generated command (e.g.,
add the appropriate --backend value and enable/disable streaming flags) so SA
runs respect the same config keys as the default benchmark (ensure you reference
_to_sa_benchmark_cmd(), self.backend, self.streaming, --use-chat-template, and
--trust-remote-code when making changes).
- Around line 64-75: Replace the unsafe clone in ensure_bench_serving_repo by
pinning BENCH_SERVING_REPO to a specific commit SHA (add BENCH_SERVING_COMMIT)
and perform a shallow clone followed by an explicit git checkout to that SHA;
use tempfile.mkdtemp() instead of the hardcoded BENCH_SERVING_DIR to create an
isolated dir and avoid symlink/TOCTOU issues, and update logic around
BENCH_SERVING_DIR and bench_script to use that temp dir (refer to symbols
BENCH_SERVING_REPO, BENCH_SERVING_COMMIT, BENCH_SERVING_DIR,
ensure_bench_serving_repo, and bench_script). Ensure the subprocess calls use
["git", "clone", "--depth", "1", "--single-branch", BENCH_SERVING_REPO,
temp_dir] then ["git", "fetch", "--depth=1", "origin", BENCH_SERVING_COMMIT] or
["git", "checkout", BENCH_SERVING_COMMIT] as needed, and remove any
unconditional trust of branch names.
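The first prompt above (conditional flags in `_to_sa_benchmark_cmd()`) can be sketched as follows. Attribute names mirror the review text; the `--non-streaming` flag name is a guess for illustration, not a known benchmark_serving.py option.

```python
# Illustrative sketch: derive SA benchmark flags from config attributes
# instead of hardcoding them, so SA runs respect the same config keys
# as the default benchmark.
def to_sa_benchmark_cmd(cfg, script_path):
    cmd = ["python", script_path, "--backend", cfg.backend]
    if getattr(cfg, "use_chat_template", False):
        cmd.append("--use-chat-template")
    if getattr(cfg, "trust_remote_code", False):
        cmd.append("--trust-remote-code")
    if not getattr(cfg, "streaming", True):
        cmd.append("--non-streaming")  # hypothetical flag name
    return cmd
```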
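The second prompt (pinning the clone) can be sketched like this. The repo URL and commit SHA below are placeholders, and the function body is an assumption about what a hardened `ensure_bench_serving_repo` might look like, not the PR's code.

```python
# Sketch of the safer clone the prompt describes: shallow clone, then
# fetch and check out a pinned commit SHA so a moved branch cannot
# silently change the benchmark code; use a fresh temp dir rather than
# a fixed path to avoid symlink/TOCTOU issues.
import os
import subprocess
import tempfile

BENCH_SERVING_REPO = "https://example.com/bench_serving.git"  # placeholder URL
BENCH_SERVING_COMMIT = "0000000000000000000000000000000000000000"  # pin a real SHA


def ensure_bench_serving_repo() -> str:
    temp_dir = tempfile.mkdtemp(prefix="bench_serving_")
    subprocess.check_call(
        ["git", "clone", "--depth", "1", "--single-branch",
         BENCH_SERVING_REPO, temp_dir])
    # Pin to the exact commit instead of trusting a branch name.
    subprocess.check_call(
        ["git", "fetch", "--depth=1", "origin", BENCH_SERVING_COMMIT],
        cwd=temp_dir)
    subprocess.check_call(
        ["git", "checkout", BENCH_SERVING_COMMIT], cwd=temp_dir)
    return os.path.join(temp_dir, "benchmark_serving.py")
```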
---
Nitpick comments:
In `@tests/integration/defs/perf/test_perf_sanity.py`:
- Around line 1538-1548: The current loop in which parse_metrics_from_output is
used should not inject a misleading 0.0 for missing user_throughput; instead,
set metrics["user_throughput"] = None and add an explicit indicator like
metrics["user_throughput_measured"] = False (when
client_configs[client_idx].use_nv_sa_benchmark is true and user_throughput is
absent) so consumers can distinguish "not measured" from a real zero, and update
any downstream validation/aggregation that reads _perf_results to treat None or
the measured flag appropriately; reference parse_metrics_from_output,
_perf_results, client_configs and use_nv_sa_benchmark when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: e2f94a79-22d1-4770-91cc-780c08769306
📒 Files selected for processing (2)
- tests/integration/defs/perf/open_search_db_utils.py
- tests/integration/defs/perf/test_perf_sanity.py
Force-pushed 928d56d to 5d871d1
/bot run --disable-fail-fast --stage-list "GB200-4_GPUs-PyTorch-PerfSanity-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-4,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-5,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-6,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU2-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE2-GPU8-Post-Merge-1,GB200-16_GPUs-4_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE2-GPU8-GEN1-NODE2-GPU8-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1"
PR_Github #43173 [ run ] triggered by Bot. Commit:
PR_Github #43173 [ run ] completed with state
Force-pushed 5d871d1 to 09f6d2e
/bot run --disable-fail-fast --stage-list "GB200-4_GPUs-PyTorch-PerfSanity-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-4,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-5,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-6,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU2-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE2-GPU8-Post-Merge-1,GB200-16_GPUs-4_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE2-GPU8-GEN1-NODE2-GPU8-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1"
PR_Github #43245 [ run ] triggered by Bot. Commit:
PR_Github #43245 [ run ] completed with state
Summary by CodeRabbit
New Features
Bug Fixes
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.