[None][test] Add sync_qa_tests Jenkins script and sync-qa-tests skill by xinhe-nv · Pull Request #13028 · NVIDIA/TensorRT-LLM

xinhe-nv · 2026-04-14T08:01:12Z

Add jenkins/scripts/sync_qa_tests.py (sync-core + regenerate-sanity)
Add .claude/skills/sync-qa-tests/SKILL.md
Update QA integration lists and coderabbit config

Made-with: Cursor

Summary by CodeRabbit

New Features
- Introduced automated QA test list synchronization and validation tooling
Documentation
- Added comprehensive process documentation for QA test list management and synchronization workflows
Chores
- Enhanced code review configuration with test-specific analysis guidelines
- Updated P0 functional sanity test coverage and multinode GPU performance test configurations

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

xinhe-nv · 2026-04-14T08:54:47Z

@coderabbitai review the code

coderabbitai · 2026-04-14T08:55:01Z

@xinhe-nv: Sure! I'll review the code changes in this PR right away.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-04-14T09:03:48Z

📝 Walkthrough

Walkthrough

This PR introduces QA test list management infrastructure for TensorRT-LLM. It adds documentation describing a synchronization process, a Python script that extracts and filters pytest node IDs from YAML sources by backend and path criteria, and updates curated test lists to reflect new model variants and selection strategies.

Changes

Cohort / File(s)	Summary
QA Test Management Documentation `.claude/skills/sync-qa-tests/SKILL.md`, `.coderabbit.yaml`	Introduces skill documentation for the QA test synchronization end-to-end process and updates CodeRabbit review configuration to provide QA-focused guidance for test-related changes.
QA Test List Sync Script `jenkins/scripts/sync_qa_tests.py`	New Python utility implementing `sync-core` and `regenerate-sanity` subcommands to maintain pytest node-id lists by extracting from YAML sources, filtering by backend (pytorch/autodeploy) and path prefixes (accuracy/disaggregated), and applying P0 model-name selection and method-grouping caps; supports `--dry-run` and `--repo-root` flags.
QA Test List Content Updates `tests/integration/test_lists/qa/llm_function_core_sanity.txt`, `tests/integration/test_lists/qa/llm_perf_multinode.yml`, `tests/integration/test_lists/test-db/l0_a10.yml`	Refreshes test selection to include new model variants (DeepSeek V3/R1, Qwen3, NemotronV3, KimiK25) with various quantization and scheduling modes; updates GB200 perf sanity union; removes obsolete Mistral entries.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	⚠️ Warning	PR description is minimal; title, Description, and Test Coverage sections are empty. Only a bullet list of changes and 'Made-with' note are provided.	Add a proper PR title following the template format. Fill in the Description section explaining the issue/motivation and solution. Document test coverage for the new jenkins script and sanity list updates.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main additions: a new Jenkins script (sync_qa_tests.py) and a skill documentation file, which represent the primary changes in this changeset.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (7)

jenkins/scripts/sync_qa_tests.py (5)
2-2: Copyright year inconsistency.

Other files in this PR use 2025-2026 for the copyright year range (e.g., .coderabbit.yaml), but this file only shows 2026. Consider aligning for consistency.
📝 Suggested fix
-# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@jenkins/scripts/sync_qa_tests.py` at line 2, Update the SPDX header in
sync_qa_tests.py so the copyright year range matches the rest of the PR (use
"2025-2026" instead of "2026"); locate the top-of-file SPDX comment line (the
string starting with "# SPDX-FileCopyrightText: Copyright (c) ...") and change
the year portion to the range "2025-2026" to align with files like
.coderabbit.yaml.
15-15: Unnecessary from __future__ import annotations.

Per the project's Python 3.10+ requirement, the from __future__ import annotations import is not needed—PEP 604 union syntax (str | None) and other modern type hints work natively.

Removing it is optional but reduces noise. Based on learnings: "In TensorRT-LLM (Python requires >=3.10), you can use Python 3.10+ features throughout the codebase, and you do not need to add from __future__ import annotations."
📝 Suggested fix
-from __future__ import annotations
-
 import argparse
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@jenkins/scripts/sync_qa_tests.py` at line 15, Remove the unnecessary future
import at the top of sync_qa_tests.py: delete the line "from __future__ import
annotations" since the project targets Python 3.10+ and PEP 604/modern type
hints are supported natively; ensure no other code changes are needed because
annotations can remain as-is.
146-149: Comparison may produce unexpected results when list was already sorted.

The condition all_lines == existing_lines compares a newly sorted list against existing_lines, which preserves the original file order. If the file was already sorted and had no missing entries, this comparison would correctly detect no changes. However, if the file was unsorted but complete, this would incorrectly trigger a rewrite just to sort.

This behavior seems intentional (ensuring the file is always sorted), but the "Nothing to change" message could be misleading since it only prints when both conditions are met.

Consider clarifying the intent or adjusting the message:
📝 Suggested clarification
     all_lines = sorted(set(existing_lines) | set(missing))
     if not missing and all_lines == existing_lines:
-        print("Nothing to change.")
+        print("Already up to date (no missing entries, already sorted).")
         return 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@jenkins/scripts/sync_qa_tests.py` around lines 146 - 149, The current check
uses all_lines = sorted(set(existing_lines) | set(missing)) and prints "Nothing
to change." only when not missing and all_lines == existing_lines, which
conflates “no missing entries” with “already sorted”; update logic or messaging
in sync_qa_tests.py so intent is clear: either (A) treat sorting as a change by
comparing sets only (e.g., compare set(existing_lines) vs set(all_lines)) and
keep the message as-is, or (B) preserve existing behavior but change the message
to indicate both “no missing entries and file already sorted” (referencing
variables all_lines, existing_lines, missing and the surrounding check) so the
output isn’t misleading. Ensure whichever option you pick updates the
conditional and the printed string accordingly.
305-305: Consider using parser.error() instead of AssertionError.

Since required=True is set on the subparsers, this line should never be reached. However, if it somehow is, parser.error() would provide a more user-friendly error message than AssertionError.

That said, this is defensive code for an unreachable path, so keeping AssertionError is acceptable.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@jenkins/scripts/sync_qa_tests.py` at line 305, Replace the defensive raise
AssertionError(f"unknown command: {args.command}") with a user-friendly
parser.error call: locate the unreachable fallback where args.command is handled
and replace the AssertionError with parser.error(f"unknown command:
{args.command}") so argparse prints a proper usage message; ensure you use the
existing parser variable and preserve the same error text for consistency.
62-96: YAML parsing uses regex—consider robustness for edge cases.

The extract_accuracy_disaggregated_from_yml function parses YAML using regex line-by-line rather than a proper YAML parser. This works for the current test-db format but could break if:

YAML uses multi-line strings or flow syntax

Indentation varies unexpectedly

Comments appear in unusual positions

For the current structured test-db files, this approach is pragmatic and avoids adding a pyyaml dependency. If test-db format evolves, consider revisiting.
.claude/skills/sync-qa-tests/SKILL.md (2)
61-61: Minor: Awkward phrasing with "only" repetition.

The phrase "today the file is gen_only disagg only" reads awkwardly due to "only" appearing twice.

Consider rephrasing to: "today the file covers gen_only disagg entries exclusively"
📝 Suggested fix
-   Other prefixes inside the bracket (`disagg-e2e-`, `aggr-...`, `aggr-ctx_only-`, etc.) exist in some DB files; **do not** auto-merge non–gen_only disagg lines into `llm_perf_multinode.yml` unless perf/QA owners explicitly want them in this union (today the file is **gen_only** disagg only).
+   Other prefixes inside the bracket (`disagg-e2e-`, `aggr-...`, `aggr-ctx_only-`, etc.) exist in some DB files; **do not** auto-merge non–gen_only disagg lines into `llm_perf_multinode.yml` unless perf/QA owners explicitly want them in this union (today the file covers **gen_only** disagg entries exclusively).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/sync-qa-tests/SKILL.md at line 61, Replace the awkward phrase
"today the file is **gen_only** disagg only" in SKILL.md with clearer wording
such as "today the file covers **gen_only** disagg entries exclusively" to avoid
duplicate "only" and improve readability; locate the exact sentence containing
"Other prefixes... do not auto-merge non–gen_only disagg lines into
`llm_perf_multinode.yml` unless perf/QA owners explicitly want them in this
union (today the file is **gen_only** disagg only)" and update it to the
suggested rephrasing.
70-70: Minor: Remove leading space inside code span.

The inline code ` TIMEOUT (NNN)` has a leading space which triggers markdownlint MD038.
📝 Suggested fix
-   Preserve the same **` TIMEOUT (NNN)`** suffix as in test-db when present (commonly `120`). Keep list style consistent with existing entries (`  - perf/...`).
+   Preserve the same **`TIMEOUT (NNN)`** suffix as in test-db when present (commonly `120`). Keep list style consistent with existing entries (`  - perf/...`).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/sync-qa-tests/SKILL.md at line 70, The inline code span
contains a leading space ("` TIMEOUT (NNN)`") which triggers markdownlint MD038;
update the code span to remove the leading space so it reads "`TIMEOUT (NNN)`"
and ensure the surrounding list entry style remains consistent with existing
entries (e.g., "  - perf/...")—locate the offending text by searching for the
code span ` TIMEOUT (NNN)` in SKILL.md and replace it accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/test_lists/qa/llm_perf_multinode.yml`:
- Around line 17-18: Update the QA node IDs to match the actual parametrization
naming: replace occurrences of "disagg-gen_only-gb200_<model>" with
"disagg_upload-gen_only-gb200_<model>" (e.g., for models deepseek-r1,
deepseek-v32, kimi-k2-thinking, gpt-oss-120b, qwen3-235b) so the QA entries
match the test parametrization keys used by the
l0_gb200_multi_nodes_perf_sanity* files; verify each listed node ID string now
uses "disagg_upload-gen_only-gb200_" exactly.

---

Nitpick comments:
In @.claude/skills/sync-qa-tests/SKILL.md:
- Line 61: Replace the awkward phrase "today the file is **gen_only** disagg
only" in SKILL.md with clearer wording such as "today the file covers
**gen_only** disagg entries exclusively" to avoid duplicate "only" and improve
readability; locate the exact sentence containing "Other prefixes... do not
auto-merge non–gen_only disagg lines into `llm_perf_multinode.yml` unless
perf/QA owners explicitly want them in this union (today the file is
**gen_only** disagg only)" and update it to the suggested rephrasing.
- Line 70: The inline code span contains a leading space ("` TIMEOUT (NNN)`")
which triggers markdownlint MD038; update the code span to remove the leading
space so it reads "`TIMEOUT (NNN)`" and ensure the surrounding list entry style
remains consistent with existing entries (e.g., "  - perf/...")—locate the
offending text by searching for the code span ` TIMEOUT (NNN)` in SKILL.md and
replace it accordingly.

In `@jenkins/scripts/sync_qa_tests.py`:
- Line 2: Update the SPDX header in sync_qa_tests.py so the copyright year range
matches the rest of the PR (use "2025-2026" instead of "2026"); locate the
top-of-file SPDX comment line (the string starting with "#
SPDX-FileCopyrightText: Copyright (c) ...") and change the year portion to the
range "2025-2026" to align with files like .coderabbit.yaml.
- Line 15: Remove the unnecessary future import at the top of sync_qa_tests.py:
delete the line "from __future__ import annotations" since the project targets
Python 3.10+ and PEP 604/modern type hints are supported natively; ensure no
other code changes are needed because annotations can remain as-is.
- Around line 146-149: The current check uses all_lines =
sorted(set(existing_lines) | set(missing)) and prints "Nothing to change." only
when not missing and all_lines == existing_lines, which conflates “no missing
entries” with “already sorted”; update logic or messaging in sync_qa_tests.py so
intent is clear: either (A) treat sorting as a change by comparing sets only
(e.g., compare set(existing_lines) vs set(all_lines)) and keep the message
as-is, or (B) preserve existing behavior but change the message to indicate both
“no missing entries and file already sorted” (referencing variables all_lines,
existing_lines, missing and the surrounding check) so the output isn’t
misleading. Ensure whichever option you pick updates the conditional and the
printed string accordingly.
- Line 305: Replace the defensive raise AssertionError(f"unknown command:
{args.command}") with a user-friendly parser.error call: locate the unreachable
fallback where args.command is handled and replace the AssertionError with
parser.error(f"unknown command: {args.command}") so argparse prints a proper
usage message; ensure you use the existing parser variable and preserve the same
error text for consistency.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 173b8db8-5944-4c20-b8d3-c6b54eb65e2c

📥 Commits

Reviewing files that changed from the base of the PR and between f2f9051 and 4d142b1.

📒 Files selected for processing (7)

.claude/skills/sync-qa-tests/SKILL.md
.coderabbit.yaml
jenkins/scripts/sync_qa_tests.py
tests/integration/test_lists/qa/llm_function_core.txt
tests/integration/test_lists/qa/llm_function_core_sanity.txt
tests/integration/test_lists/qa/llm_perf_multinode.yml
tests/integration/test_lists/test-db/l0_a10.yml

💤 Files with no reviewable changes (1)

tests/integration/test_lists/test-db/l0_a10.yml

xinhe-nv · 2026-04-15T06:04:05Z

/bot run --stage-list ""

tensorrt-cicd · 2026-04-15T06:10:27Z

PR_Github #43408 [ run ] triggered by Bot. Commit: 0271271 Link to invocation

tensorrt-cicd · 2026-04-15T07:33:05Z

PR_Github #43408 [ run ] completed with state FAILURE. Commit: 0271271
/LLM/main/L0_MergeRequest_PR pipeline #33942 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

xinhe-nv · 2026-04-15T09:42:31Z

/bot run --stage-list ""

tensorrt-cicd · 2026-04-15T09:48:44Z

PR_Github #43470 [ run ] triggered by Bot. Commit: 150ce21 Link to invocation

tensorrt-cicd · 2026-04-15T11:29:41Z

PR_Github #43470 [ run ] completed with state FAILURE. Commit: 150ce21
/LLM/main/L0_MergeRequest_PR pipeline #33989 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

xinhe-nv · 2026-04-16T02:03:38Z

/bot run --stage-list ""

tensorrt-cicd · 2026-04-16T02:09:33Z

PR_Github #43607 [ run ] triggered by Bot. Commit: 6a28f49 Link to invocation

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

tensorrt-cicd · 2026-04-16T03:04:48Z

PR_Github #43607 [ run ] completed with state SUCCESS. Commit: 6a28f49
/LLM/main/L0_MergeRequest_PR pipeline #34100 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

Add test_openai_consistent_chat and test_ptp_quickstart_advanced_multi_gpus which are referenced in waives.txt but were missing from the QA list. Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

github-actions bot assigned xinhe-nv Apr 14, 2026

xinhe-nv force-pushed the rcca branch 2 times, most recently from 6c51318 to 4d142b1 Compare April 14, 2026 08:51

xinhe-nv marked this pull request as ready for review April 14, 2026 08:53

xinhe-nv requested review from a team as code owners April 14, 2026 08:53

xinhe-nv requested review from dpitman-nvda and venkywonka April 14, 2026 08:53

coderabbitai bot reviewed Apr 14, 2026

View reviewed changes

Comment thread tests/integration/test_lists/qa/llm_perf_multinode.yml Outdated

xinhe-nv changed the title ~~[None][chore] Add sync_qa_tests Jenkins script and sync-qa-tests skill~~ [None][test] Add sync_qa_tests Jenkins script and sync-qa-tests skill Apr 14, 2026

ZhanruiSunCh approved these changes Apr 14, 2026

View reviewed changes

jieli-matrix approved these changes Apr 15, 2026

View reviewed changes

xinhe-nv force-pushed the rcca branch from 42ce4cd to 0271271 Compare April 15, 2026 06:03

xinhe-nv force-pushed the rcca branch from 0271271 to 49d10e7 Compare April 15, 2026 09:42

xinhe-nv added 2 commits April 16, 2026 10:03

add sync tests skill and QA reviewing in coderabbitai

b93265a

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

update sanity tests

5b8687f

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

xinhe-nv force-pushed the rcca branch from 150ce21 to 6a28f49 Compare April 16, 2026 02:03

fix pipeline issues

287aea2

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

xinhe-nv force-pushed the rcca branch from 6a28f49 to 287aea2 Compare April 16, 2026 02:59

EmmaQiaoCh approved these changes Apr 16, 2026

View reviewed changes

[None][test] Add missing waived tests to llm_function_core.txt

bcc3665

Add test_openai_consistent_chat and test_ptp_quickstart_advanced_multi_gpus which are referenced in waives.txt but were missing from the QA list. Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>

Conversation

xinhe-nv commented Apr 14, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Description

Test Coverage

PR Checklist

GitHub Bot Help

Uh oh!

xinhe-nv commented Apr 14, 2026

Uh oh!

coderabbitai bot commented Apr 14, 2026

Uh oh!

coderabbitai bot commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

xinhe-nv commented Apr 15, 2026

Uh oh!

tensorrt-cicd commented Apr 15, 2026

Uh oh!

tensorrt-cicd commented Apr 15, 2026

Uh oh!

xinhe-nv commented Apr 15, 2026

Uh oh!

tensorrt-cicd commented Apr 15, 2026

Uh oh!

tensorrt-cicd commented Apr 15, 2026

Uh oh!

xinhe-nv commented Apr 16, 2026

Uh oh!

tensorrt-cicd commented Apr 16, 2026

Uh oh!

tensorrt-cicd commented Apr 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

xinhe-nv commented Apr 14, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 14, 2026 •

edited

Loading