
[None][infra] Support nv sa benchmark in CI Perf Test#13004

Open
chenfeiz0326 wants to merge 1 commit into NVIDIA:main from chenfeiz0326:chenfeiz/add-nv-sa-benchmark-in-submitpy

Conversation

@chenfeiz0326
Collaborator

@chenfeiz0326 chenfeiz0326 commented Apr 13, 2026

Summary by CodeRabbit

  • New Features

    • Added configuration option to support alternative performance benchmarking tool.
  • Bug Fixes

    • Improved backward compatibility for historical data matching when records lack recently-added boolean fields.

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR follows the TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update the tava architecture diagram if there is a significant design change in the PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@chenfeiz0326 chenfeiz0326 requested a review from a team as a code owner April 13, 2026 14:48
@coderabbitai
Contributor

coderabbitai bot commented Apr 13, 2026

📝 Walkthrough

Walkthrough

Two test utility files are updated: the OpenSearch match logic now treats missing boolean fields as False when comparing history and new data, and support for an external SA benchmark is added via repository cloning and a ClientConfig extension with conditional benchmark command generation.

Changes

Cohort / File(s) Summary
OpenSearch Match Logic
tests/integration/defs/perf/open_search_db_utils.py
Modified _match() to treat missing/None boolean fields (keys prefixed with b_) as False before comparison. Updated get_history_data() OpenSearch query construction to match documents where boolean fields are explicitly False or missing using bool queries with must_not exists clauses.
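The match logic and query change described above can be sketched roughly as follows. This is an illustrative reconstruction, not the PR's actual code: the function names `match_bool_fields` and `bool_field_clause` are hypothetical; only the `b_` prefix convention and the must_not/exists query shape come from the summary.

```python
# Hypothetical sketch of the described behavior; only the b_ prefix and
# the must_not/exists query shape are taken from the PR summary.

def match_bool_fields(history: dict, new: dict) -> bool:
    """Compare records, treating missing/None b_-prefixed booleans as False."""
    keys = {k for k in (*history, *new) if k.startswith("b_")}
    for key in keys:
        old_val = bool(history.get(key) or False)
        new_val = bool(new.get(key) or False)
        if old_val != new_val:
            return False
    return True

def bool_field_clause(field: str) -> dict:
    """OpenSearch bool query: field is explicitly False OR missing entirely."""
    return {
        "bool": {
            "should": [
                {"term": {field: False}},
                {"bool": {"must_not": [{"exists": {"field": field}}]}},
            ],
            "minimum_should_match": 1,
        }
    }
```

This keeps historical documents written before the boolean fields existed matchable against new records that carry an explicit `False`.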
SA Benchmark Support
tests/integration/defs/perf/test_perf_sanity.py
Added ensure_bench_serving_repo() function and module constants to clone external bench_serving repository. Extended ClientConfig with use_nv_sa_benchmark boolean field. Modified to_cmd() to conditionally generate SA benchmark or default tensorrt_llm commands. Updated disaggregated config parsing to propagate benchmark settings. Enhanced performance metric parsing to inject zero user_throughput for SA benchmark clients when absent.
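The ClientConfig extension described above can be sketched as a conditional dispatch in `to_cmd()`. This is a minimal illustration under assumptions: the real class in test_perf_sanity.py has more fields, and the command arguments and default-benchmark module shown here are placeholders, not the PR's actual values.

```python
# Illustrative sketch; field names other than use_nv_sa_benchmark and the
# command arguments are placeholders, not the PR's actual code.
from dataclasses import dataclass

@dataclass
class ClientConfig:
    model: str
    concurrency: int = 1
    use_nv_sa_benchmark: bool = False  # new flag added by this PR

    def to_cmd(self, bench_script: str = "benchmark_serving.py") -> list:
        if self.use_nv_sa_benchmark:
            # Route to the cloned external SA benchmark script.
            return [
                "python", bench_script,
                "--model", self.model,
                "--max-concurrency", str(self.concurrency),
            ]
        return self._to_default_benchmark_cmd()

    def _to_default_benchmark_cmd(self) -> list:
        # Placeholder for the existing tensorrt_llm benchmark command path.
        return ["trtllm-bench-placeholder", "--model", self.model]
```

The disaggregated config parser would then propagate `use_nv_sa_benchmark` into each ClientConfig, and the metric parser fills in `user_throughput` when the SA benchmark omits it.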

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Execution
    participant Config as ClientConfig
    participant Repo as ensure_bench_serving_repo()
    participant Benchmark as benchmark_serving.py
    participant Parser as get_perf_result()
    
    Test->>Config: Initialize with use_nv_sa_benchmark=true
    Config->>Config: to_cmd() called
    alt use_nv_sa_benchmark is true
        Config->>Repo: Request benchmark repo path
        Repo->>Repo: Clone bench_serving if needed
        Repo-->>Config: Return benchmark_serving.py path
        Config->>Benchmark: Generate SA benchmark command
        Benchmark-->>Config: Command with SA arguments
    else use_nv_sa_benchmark is false
        Config->>Config: _to_default_benchmark_cmd()
    end
    Config-->>Test: Return benchmark command
    Test->>Benchmark: Execute benchmark
    Benchmark-->>Parser: Produce metrics (may lack user_throughput)
    Parser->>Parser: Enumerate outputs by client_idx
    alt user_throughput missing in SA benchmark
        Parser->>Parser: Inject user_throughput = 0.0
    end
    Parser-->>Test: Return complete metrics

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 failed

❌ Failed checks (1 warning)

  • Description check — ⚠️ Warning: The PR description is incomplete. It contains only the template structure, with no substantive content in any required section: Description, Test Coverage, and the PR Checklist items are empty. Resolution: fill in the Description section explaining what changes were made and why, add a Test Coverage section listing relevant tests, and complete the PR Checklist by checking the appropriate items and providing additional context as needed.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: The title clearly describes the main change: adding support for the NVIDIA SA benchmark in CI performance tests, which aligns with the changes adding the use_nv_sa_benchmark flag and benchmark_serving.py integration.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 83.33%, which meets the required threshold of 80.00%.



Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/integration/defs/perf/test_perf_sanity.py (1)

1538-1548: Consider: using 0.0 as a placeholder for missing user_throughput may be misleading.

When the SA benchmark doesn't output user_throughput, injecting 0.0 allows the test to pass but can be misleading in the database: a value of 0.0 could be confused with actual zero throughput.

Consider either:

  1. Using None and adjusting downstream validation to handle it, or
  2. Adding a comment/field indicating the metric was not measured
💡 Alternative: Track that metric was unavailable
                 # SA benchmark (bench_serving) doesn't output user_throughput
                 if (
                     metrics
                     and "user_throughput" not in metrics
                     and client_idx < len(client_configs)
                     and client_configs[client_idx].use_nv_sa_benchmark
                 ):
-                    metrics["user_throughput"] = 0.0
+                    # Placeholder: SA benchmark doesn't report this metric
+                    metrics["user_throughput"] = -1.0  # or use None if downstream supports it
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/integration/defs/perf/test_perf_sanity.py` around lines 1538 - 1548,
The current loop in which parse_metrics_from_output is used should not inject a
misleading 0.0 for missing user_throughput; instead, set
metrics["user_throughput"] = None and add an explicit indicator like
metrics["user_throughput_measured"] = False (when
client_configs[client_idx].use_nv_sa_benchmark is true and user_throughput is
absent) so consumers can distinguish "not measured" from a real zero, and update
any downstream validation/aggregation that reads _perf_results to treat None or
the measured flag appropriately; reference parse_metrics_from_output,
_perf_results, client_configs and use_nv_sa_benchmark when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/defs/perf/test_perf_sanity.py`:
- Around line 475-502: The SA benchmark builder _to_sa_benchmark_cmd() currently
hardcodes flags (--use-chat-template, --trust-remote-code) and ignores
configuration attributes self.backend and self.streaming; update
_to_sa_benchmark_cmd() to mirror the default benchmark’s behavior by
conditionally adding --use-chat-template and --trust-remote-code only when the
corresponding object attributes/settings require them, and include logic to
incorporate self.backend and self.streaming into the generated command (e.g.,
add the appropriate --backend value and enable/disable streaming flags) so SA
runs respect the same config keys as the default benchmark (ensure you reference
_to_sa_benchmark_cmd(), self.backend, self.streaming, --use-chat-template, and
--trust-remote-code when making changes).
- Around line 64-75: Replace the unsafe clone in ensure_bench_serving_repo by
pinning BENCH_SERVING_REPO to a specific commit SHA (add BENCH_SERVING_COMMIT)
and perform a shallow clone followed by an explicit git checkout to that SHA;
use tempfile.mkdtemp() instead of the hardcoded BENCH_SERVING_DIR to create an
isolated dir and avoid symlink/TOCTOU issues, and update logic around
BENCH_SERVING_DIR and bench_script to use that temp dir (refer to symbols
BENCH_SERVING_REPO, BENCH_SERVING_COMMIT, BENCH_SERVING_DIR,
ensure_bench_serving_repo, and bench_script). Ensure the subprocess calls use
["git", "clone", "--depth", "1", "--single-branch", BENCH_SERVING_REPO,
temp_dir] then ["git", "fetch", "--depth=1", "origin", BENCH_SERVING_COMMIT] or
["git", "checkout", BENCH_SERVING_COMMIT] as needed, and remove any
unconditional trust of branch names.
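The pinned, shallow clone the review asks for can be sketched like this. The repo URL and commit SHA below are placeholders (the PR's actual values are not shown here), and `clone_commands` is a hypothetical helper introduced only to make the git invocations testable.

```python
# Sketch of the suggested pinned shallow clone; the URL and SHA are
# placeholders, and clone_commands is an illustrative helper.
import subprocess
import tempfile

BENCH_SERVING_REPO = "https://example.com/bench_serving.git"  # placeholder URL
BENCH_SERVING_COMMIT = "0123456789abcdef0123456789abcdef01234567"  # placeholder SHA

def clone_commands(repo: str, commit: str, dest: str) -> list:
    """Build the git commands: shallow clone, then pin to an exact SHA."""
    return [
        ["git", "clone", "--depth", "1", "--single-branch", repo, dest],
        ["git", "-C", dest, "fetch", "--depth=1", "origin", commit],
        ["git", "-C", dest, "checkout", commit],
    ]

def ensure_bench_serving_repo() -> str:
    # mkdtemp gives an isolated directory, avoiding the symlink/TOCTOU
    # issues that come with reusing a fixed, well-known path.
    dest = tempfile.mkdtemp(prefix="bench_serving_")
    for cmd in clone_commands(BENCH_SERVING_REPO, BENCH_SERVING_COMMIT, dest):
        subprocess.check_call(cmd)
    return dest
```

Checking out an explicit SHA means the CI run cannot silently pick up new commits pushed to the external repo's branch.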

---

Nitpick comments:
In `@tests/integration/defs/perf/test_perf_sanity.py`:
- Around line 1538-1548: The current loop in which parse_metrics_from_output is
used should not inject a misleading 0.0 for missing user_throughput; instead,
set metrics["user_throughput"] = None and add an explicit indicator like
metrics["user_throughput_measured"] = False (when
client_configs[client_idx].use_nv_sa_benchmark is true and user_throughput is
absent) so consumers can distinguish "not measured" from a real zero, and update
any downstream validation/aggregation that reads _perf_results to treat None or
the measured flag appropriately; reference parse_metrics_from_output,
_perf_results, client_configs and use_nv_sa_benchmark when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e2f94a79-22d1-4770-91cc-780c08769306

📥 Commits

Reviewing files that changed from the base of the PR and between 11716e0 and e051f24.

📒 Files selected for processing (2)
  • tests/integration/defs/perf/open_search_db_utils.py
  • tests/integration/defs/perf/test_perf_sanity.py

@chenfeiz0326 chenfeiz0326 force-pushed the chenfeiz/add-nv-sa-benchmark-in-submitpy branch from 928d56d to 5d871d1 on April 14, 2026 03:27
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --stage-list "GB200-4_GPUs-PyTorch-PerfSanity-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-4,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-5,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-6,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU2-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE2-GPU8-Post-Merge-1,GB200-16_GPUs-4_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE2-GPU8-GEN1-NODE2-GPU8-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1"

@tensorrt-cicd
Collaborator

PR_Github #43173 [ run ] triggered by Bot. Commit: 5d871d1 Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43173 [ run ] completed with state SUCCESS. Commit: 5d871d1
/LLM/main/L0_MergeRequest_PR pipeline #33801 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
@chenfeiz0326 chenfeiz0326 force-pushed the chenfeiz/add-nv-sa-benchmark-in-submitpy branch from 5d871d1 to 09f6d2e on April 14, 2026 14:17
@chenfeiz0326
Collaborator Author

/bot run --disable-fail-fast --stage-list "GB200-4_GPUs-PyTorch-PerfSanity-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-4,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-5,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-6,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU2-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE2-GPU8-Post-Merge-1,GB200-16_GPUs-4_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE2-GPU8-GEN1-NODE2-GPU8-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1"

@tensorrt-cicd
Collaborator

PR_Github #43245 [ run ] triggered by Bot. Commit: 09f6d2e Link to invocation

@tensorrt-cicd
Collaborator

PR_Github #43245 [ run ] completed with state DISABLED
Freeze main and open the PR merge only after CI is back to healthy https://nvidia.slack.com/archives/C059LSY62BT/p1776141760843319?thread_ts=1775985925.442509&cid=C059LSY62BT

Link to invocation
