[None][infra] Support nv sa benchmark in CI Perf Test #13004
chenfeiz0326 wants to merge 1 commit into NVIDIA:main from
Conversation
📝 Walkthrough
Two test utility files are updated: OpenSearch match logic now treats missing boolean fields as
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Execution
    participant Config as ClientConfig
    participant Repo as ensure_bench_serving_repo()
    participant Benchmark as benchmark_serving.py
    participant Parser as get_perf_result()
    Test->>Config: Initialize with use_nv_sa_benchmark=true
    Config->>Config: to_cmd() called
    alt use_nv_sa_benchmark is true
        Config->>Repo: Request benchmark repo path
        Repo->>Repo: Clone bench_serving if needed
        Repo-->>Config: Return benchmark_serving.py path
        Config->>Benchmark: Generate SA benchmark command
        Benchmark-->>Config: Command with SA arguments
    else use_nv_sa_benchmark is false
        Config->>Config: _to_default_benchmark_cmd()
    end
    Config-->>Test: Return benchmark command
    Test->>Benchmark: Execute benchmark
    Benchmark-->>Parser: Produce metrics (may lack user_throughput)
    Parser->>Parser: Enumerate outputs by client_idx
    alt user_throughput missing in SA benchmark
        Parser->>Parser: Inject user_throughput = 0.0
    end
    Parser-->>Test: Return complete metrics
```
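The dispatch in the diagram can be sketched as follows. The names (`ClientConfig`, `to_cmd`, `_to_sa_benchmark_cmd`, `_to_default_benchmark_cmd`, `ensure_bench_serving_repo`) come from the diagram itself, but the bodies below are illustrative assumptions, not the PR's actual implementation.

```python
from dataclasses import dataclass
from typing import List


def ensure_bench_serving_repo() -> str:
    # Stand-in: the real helper clones the bench_serving repo on demand
    # and returns the path to benchmark_serving.py.
    return "/tmp/bench_serving/benchmark_serving.py"


@dataclass
class ClientConfig:
    model: str
    use_nv_sa_benchmark: bool = False

    def _to_default_benchmark_cmd(self) -> List[str]:
        # Hypothetical default client command.
        return ["python", "benchmark.py", "--model", self.model]

    def _to_sa_benchmark_cmd(self) -> List[str]:
        # SA path: resolve benchmark_serving.py first, then build the command.
        script = ensure_bench_serving_repo()
        return ["python", script, "--model", self.model]

    def to_cmd(self) -> List[str]:
        # Mirrors the alt/else branch in the sequence diagram.
        if self.use_nv_sa_benchmark:
            return self._to_sa_benchmark_cmd()
        return self._to_default_benchmark_cmd()
```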
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/integration/defs/perf/test_perf_sanity.py (1)
1538-1548: Consider: using `0.0` as a placeholder for missing `user_throughput` may be misleading. When the SA benchmark doesn't output `user_throughput`, injecting `0.0` allows the test to pass but may be misleading in the database. A value of `0.0` could be confused with actual zero throughput. Consider either:
- Using `None` and adjusting downstream validation to handle it, or
- Adding a comment/field indicating the metric was not measured
💡 Alternative: Track that metric was unavailable

```diff
 # SA benchmark (bench_serving) doesn't output user_throughput
 if (
     metrics
     and "user_throughput" not in metrics
     and client_idx < len(client_configs)
     and client_configs[client_idx].use_nv_sa_benchmark
 ):
-    metrics["user_throughput"] = 0.0
+    # Placeholder: SA benchmark doesn't report this metric
+    metrics["user_throughput"] = -1.0  # or use None if downstream supports it
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/defs/perf/test_perf_sanity.py` around lines 1538-1548: the current loop in which parse_metrics_from_output is used should not inject a misleading 0.0 for missing user_throughput; instead, set metrics["user_throughput"] = None and add an explicit indicator like metrics["user_throughput_measured"] = False (when client_configs[client_idx].use_nv_sa_benchmark is true and user_throughput is absent) so consumers can distinguish "not measured" from a real zero, and update any downstream validation/aggregation that reads _perf_results to treat None or the measured flag appropriately; reference parse_metrics_from_output, _perf_results, client_configs and use_nv_sa_benchmark when making the change.
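The suggestion above can be sketched as a standalone helper. The surrounding names (`metrics`, `client_idx`, `client_configs`, `use_nv_sa_benchmark`) come from the review; factoring the check into a function is illustrative only, since the PR patches an inline loop.

```python
# Sketch of the reviewer's suggestion: record None plus an explicit
# "measured" flag rather than a 0.0 that looks like real throughput.
def fill_missing_user_throughput(metrics, client_idx, client_configs):
    in_range = client_idx < len(client_configs)
    if (metrics and "user_throughput" not in metrics and in_range
            and client_configs[client_idx].use_nv_sa_benchmark):
        metrics["user_throughput"] = None            # not measured, not zero
        metrics["user_throughput_measured"] = False  # consumers can tell the difference
    return metrics
```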
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/integration/defs/perf/test_perf_sanity.py`:
- Around line 475-502: The SA benchmark builder _to_sa_benchmark_cmd() currently
hardcodes flags (--use-chat-template, --trust-remote-code) and ignores
configuration attributes self.backend and self.streaming; update
_to_sa_benchmark_cmd() to mirror the default benchmark’s behavior by
conditionally adding --use-chat-template and --trust-remote-code only when the
corresponding object attributes/settings require them, and include logic to
incorporate self.backend and self.streaming into the generated command (e.g.,
add the appropriate --backend value and enable/disable streaming flags) so SA
runs respect the same config keys as the default benchmark (ensure you reference
_to_sa_benchmark_cmd(), self.backend, self.streaming, --use-chat-template, and
--trust-remote-code when making changes).
- Around line 64-75: Replace the unsafe clone in ensure_bench_serving_repo by
pinning BENCH_SERVING_REPO to a specific commit SHA (add BENCH_SERVING_COMMIT)
and perform a shallow clone followed by an explicit git checkout to that SHA;
use tempfile.mkdtemp() instead of the hardcoded BENCH_SERVING_DIR to create an
isolated dir and avoid symlink/TOCTOU issues, and update logic around
BENCH_SERVING_DIR and bench_script to use that temp dir (refer to symbols
BENCH_SERVING_REPO, BENCH_SERVING_COMMIT, BENCH_SERVING_DIR,
ensure_bench_serving_repo, and bench_script). Ensure the subprocess calls use
["git", "clone", "--depth", "1", "--single-branch", BENCH_SERVING_REPO,
temp_dir] then ["git", "fetch", "--depth=1", "origin", BENCH_SERVING_COMMIT] or
["git", "checkout", BENCH_SERVING_COMMIT] as needed, and remove any
unconditional trust of branch names.
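The first prompt above (conditional flags in `_to_sa_benchmark_cmd()`) can be sketched as follows. Attribute names mirror the review text; the `--non-streaming` flag name is a guess for illustration, not a known benchmark_serving.py option.

```python
# Illustrative sketch: derive SA benchmark flags from config attributes
# instead of hardcoding them, so SA runs respect the same config keys
# as the default benchmark.
def to_sa_benchmark_cmd(cfg, script_path):
    cmd = ["python", script_path, "--backend", cfg.backend]
    if getattr(cfg, "use_chat_template", False):
        cmd.append("--use-chat-template")
    if getattr(cfg, "trust_remote_code", False):
        cmd.append("--trust-remote-code")
    if not getattr(cfg, "streaming", True):
        cmd.append("--non-streaming")  # hypothetical flag name
    return cmd
```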
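The second prompt (pinning the clone) can be sketched like this. The repo URL and commit SHA below are placeholders, and the function body is an assumption about what a hardened `ensure_bench_serving_repo` might look like, not the PR's code.

```python
# Sketch of the safer clone the prompt describes: shallow clone, then
# fetch and check out a pinned commit SHA so a moved branch cannot
# silently change the benchmark code; use a fresh temp dir rather than
# a fixed path to avoid symlink/TOCTOU issues.
import os
import subprocess
import tempfile

BENCH_SERVING_REPO = "https://example.com/bench_serving.git"  # placeholder URL
BENCH_SERVING_COMMIT = "0000000000000000000000000000000000000000"  # pin a real SHA


def ensure_bench_serving_repo() -> str:
    temp_dir = tempfile.mkdtemp(prefix="bench_serving_")
    subprocess.check_call(
        ["git", "clone", "--depth", "1", "--single-branch",
         BENCH_SERVING_REPO, temp_dir])
    # Pin to the exact commit instead of trusting a branch name.
    subprocess.check_call(
        ["git", "fetch", "--depth=1", "origin", BENCH_SERVING_COMMIT],
        cwd=temp_dir)
    subprocess.check_call(
        ["git", "checkout", BENCH_SERVING_COMMIT], cwd=temp_dir)
    return os.path.join(temp_dir, "benchmark_serving.py")
```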
---
Nitpick comments:
In `@tests/integration/defs/perf/test_perf_sanity.py`:
- Around line 1538-1548: The current loop in which parse_metrics_from_output is
used should not inject a misleading 0.0 for missing user_throughput; instead,
set metrics["user_throughput"] = None and add an explicit indicator like
metrics["user_throughput_measured"] = False (when
client_configs[client_idx].use_nv_sa_benchmark is true and user_throughput is
absent) so consumers can distinguish "not measured" from a real zero, and update
any downstream validation/aggregation that reads _perf_results to treat None or
the measured flag appropriately; reference parse_metrics_from_output,
_perf_results, client_configs and use_nv_sa_benchmark when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: e2f94a79-22d1-4770-91cc-780c08769306
📒 Files selected for processing (2)
- tests/integration/defs/perf/open_search_db_utils.py
- tests/integration/defs/perf/test_perf_sanity.py
Force-pushed 928d56d to 5d871d1
/bot run --disable-fail-fast --stage-list "GB200-4_GPUs-PyTorch-PerfSanity-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-4,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-5,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-6,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU2-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE2-GPU8-Post-Merge-1,GB200-16_GPUs-4_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE2-GPU8-GEN1-NODE2-GPU8-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1"
PR_Github #43173 [ run ] triggered by Bot. Commit:
PR_Github #43173 [ run ] completed with state
Force-pushed 5d871d1 to 09f6d2e
/bot run --disable-fail-fast --stage-list "GB200-4_GPUs-PyTorch-PerfSanity-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-1,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-2,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-3,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-4,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-5,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-6,GB200-4_GPUs-PyTorch-PerfSanity-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-2,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-3,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-4,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-5,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-6,GB200-8_GPUs-2_Nodes-PyTorch-PerfSanity-Node2-GPU8-Post-Merge-7,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE1-GPU2-Post-Merge-1,GB200-8_GPUs-2_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU4-GEN1-NODE1-GPU4-Post-Merge-1,GB200-12_GPUs-3_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE1-GPU1-GEN1-NODE2-GPU8-Post-Merge-1,GB200-16_GPUs-4_Nodes-PyTorch-Disagg-PerfSanity-CTX1-NODE2-GPU8-GEN1-NODE2-GPU8-Post-Merge-1,GB200-24_GPUs-6_Nodes-PyTorch-Disagg-PerfSanity-CTX2-NODE1-GPU4-GEN1-NODE4-GPU16-Post-Merge-1"
PR_Github #43245 [ run ] triggered by Bot. Commit:
PR_Github #43245 [ run ] completed with state
Summary by CodeRabbit
New Features
Bug Fixes
Description
Test Coverage
PR Checklist
Please review the following before submitting your PR:
PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.
GitHub Bot Help
To see a list of available CI bot commands, please comment /bot help.