Skip to content

[None][test] Add sync_qa_tests Jenkins script and sync-qa-tests skill#13028

Open
xinhe-nv wants to merge 4 commits intoNVIDIA:mainfrom
xinhe-nv:rcca
Open

[None][test] Add sync_qa_tests Jenkins script and sync-qa-tests skill#13028
xinhe-nv wants to merge 4 commits intoNVIDIA:mainfrom
xinhe-nv:rcca

Conversation

@xinhe-nv
Copy link
Copy Markdown
Collaborator

@xinhe-nv xinhe-nv commented Apr 14, 2026

  • Add jenkins/scripts/sync_qa_tests.py (sync-core + regenerate-sanity)
  • Add .claude/skills/sync-qa-tests/SKILL.md
  • Update QA integration lists and coderabbit config

Made-with: Cursor

Summary by CodeRabbit

  • New Features

    • Introduced automated QA test list synchronization and validation tooling
  • Documentation

    • Added comprehensive process documentation for QA test list management and synchronization workflows
  • Chores

    • Enhanced code review configuration with test-specific analysis guidelines
    • Updated P0 functional sanity test coverage and multinode GPU performance test configurations

Description

Test Coverage

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

@xinhe-nv xinhe-nv force-pushed the rcca branch 2 times, most recently from 6c51318 to 4d142b1 Compare April 14, 2026 08:51
@xinhe-nv xinhe-nv marked this pull request as ready for review April 14, 2026 08:53
@xinhe-nv xinhe-nv requested review from a team as code owners April 14, 2026 08:53
@xinhe-nv
Copy link
Copy Markdown
Collaborator Author

@coderabbitai review the code

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 14, 2026

@xinhe-nv: Sure! I'll review the code changes in this PR right away.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 14, 2026

📝 Walkthrough

Walkthrough

This PR introduces QA test list management infrastructure for TensorRT-LLM. It adds documentation describing a synchronization process, a Python script that extracts and filters pytest node IDs from YAML sources by backend and path criteria, and updates curated test lists to reflect new model variants and selection strategies.

Changes

Cohort / File(s) Summary
QA Test Management Documentation
.claude/skills/sync-qa-tests/SKILL.md, .coderabbit.yaml
Introduces skill documentation for the QA test synchronization end-to-end process and updates CodeRabbit review configuration to provide QA-focused guidance for test-related changes.
QA Test List Sync Script
jenkins/scripts/sync_qa_tests.py
New Python utility implementing sync-core and regenerate-sanity subcommands to maintain pytest node-id lists by extracting from YAML sources, filtering by backend (pytorch/autodeploy) and path prefixes (accuracy/disaggregated), and applying P0 model-name selection and method-grouping caps; supports --dry-run and --repo-root flags.
QA Test List Content Updates
tests/integration/test_lists/qa/llm_function_core_sanity.txt, tests/integration/test_lists/qa/llm_perf_multinode.yml, tests/integration/test_lists/test-db/l0_a10.yml
Refreshes test selection to include new model variants (DeepSeek V3/R1, Qwen3, NemotronV3, KimiK25) with various quantization and scheduling modes; updates GB200 perf sanity union; removes obsolete Mistral entries.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 58.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning PR description is minimal; title, Description, and Test Coverage sections are empty. Only a bullet list of changes and 'Made-with' note are provided. Add a proper PR title following the template format. Fill in the Description section explaining the issue/motivation and solution. Document test coverage for the new jenkins script and sanity list updates.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main additions: a new Jenkins script (sync_qa_tests.py) and a skill documentation file, which represent the primary changes in this changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (7)
jenkins/scripts/sync_qa_tests.py (5)

2-2: Copyright year inconsistency.

Other files in this PR use 2025-2026 for the copyright year range (e.g., .coderabbit.yaml), but this file only shows 2026. Consider aligning for consistency.

📝 Suggested fix
-# SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@jenkins/scripts/sync_qa_tests.py` at line 2, Update the SPDX header in
sync_qa_tests.py so the copyright year range matches the rest of the PR (use
"2025-2026" instead of "2026"); locate the top-of-file SPDX comment line (the
string starting with "# SPDX-FileCopyrightText: Copyright (c) ...") and change
the year portion to the range "2025-2026" to align with files like
.coderabbit.yaml.

15-15: Unnecessary from __future__ import annotations.

Per the project's Python 3.10+ requirement, the from __future__ import annotations import is not needed—PEP 604 union syntax (str | None) and other modern type hints work natively.

Removing it is optional but reduces noise. Based on learnings: "In TensorRT-LLM (Python requires >=3.10), you can use Python 3.10+ features throughout the codebase, and you do not need to add from __future__ import annotations."

📝 Suggested fix
-from __future__ import annotations
-
 import argparse
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@jenkins/scripts/sync_qa_tests.py` at line 15, Remove the unnecessary future
import at the top of sync_qa_tests.py: delete the line "from __future__ import
annotations" since the project targets Python 3.10+ and PEP 604/modern type
hints are supported natively; ensure no other code changes are needed because
annotations can remain as-is.

146-149: Comparison may produce unexpected results when list was already sorted.

The condition all_lines == existing_lines compares a newly sorted list against existing_lines, which preserves the original file order. If the file was already sorted and had no missing entries, this comparison would correctly detect no changes. However, if the file was unsorted but complete, this would incorrectly trigger a rewrite just to sort.

This behavior seems intentional (ensuring the file is always sorted), but the "Nothing to change" message could be misleading since it only prints when both conditions are met.

Consider clarifying the intent or adjusting the message:

📝 Suggested clarification
     all_lines = sorted(set(existing_lines) | set(missing))
     if not missing and all_lines == existing_lines:
-        print("Nothing to change.")
+        print("Already up to date (no missing entries, already sorted).")
         return 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@jenkins/scripts/sync_qa_tests.py` around lines 146 - 149, The current check
uses all_lines = sorted(set(existing_lines) | set(missing)) and prints "Nothing
to change." only when not missing and all_lines == existing_lines, which
conflates “no missing entries” with “already sorted”; update logic or messaging
in sync_qa_tests.py so intent is clear: either (A) treat sorting as a change by
comparing sets only (e.g., compare set(existing_lines) vs set(all_lines)) and
keep the message as-is, or (B) preserve existing behavior but change the message
to indicate both “no missing entries and file already sorted” (referencing
variables all_lines, existing_lines, missing and the surrounding check) so the
output isn’t misleading. Ensure whichever option you pick updates the
conditional and the printed string accordingly.

305-305: Consider using parser.error() instead of AssertionError.

Since required=True is set on the subparsers, this line should never be reached. However, if it somehow is, parser.error() would provide a more user-friendly error message than AssertionError.

That said, this is defensive code for an unreachable path, so keeping AssertionError is acceptable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@jenkins/scripts/sync_qa_tests.py` at line 305, Replace the defensive raise
AssertionError(f"unknown command: {args.command}") with a user-friendly
parser.error call: locate the unreachable fallback where args.command is handled
and replace the AssertionError with parser.error(f"unknown command:
{args.command}") so argparse prints a proper usage message; ensure you use the
existing parser variable and preserve the same error text for consistency.

62-96: YAML parsing uses regex—consider robustness for edge cases.

The extract_accuracy_disaggregated_from_yml function parses YAML using regex line-by-line rather than a proper YAML parser. This works for the current test-db format but could break if:

  • YAML uses multi-line strings or flow syntax
  • Indentation varies unexpectedly
  • Comments appear in unusual positions

For the current structured test-db files, this approach is pragmatic and avoids adding a pyyaml dependency. If test-db format evolves, consider revisiting.

.claude/skills/sync-qa-tests/SKILL.md (2)

61-61: Minor: Awkward phrasing with "only" repetition.

The phrase "today the file is gen_only disagg only" reads awkwardly due to "only" appearing twice.

Consider rephrasing to: "today the file covers gen_only disagg entries exclusively"

📝 Suggested fix
-   Other prefixes inside the bracket (`disagg-e2e-`, `aggr-...`, `aggr-ctx_only-`, etc.) exist in some DB files; **do not** auto-merge non–gen_only disagg lines into `llm_perf_multinode.yml` unless perf/QA owners explicitly want them in this union (today the file is **gen_only** disagg only).
+   Other prefixes inside the bracket (`disagg-e2e-`, `aggr-...`, `aggr-ctx_only-`, etc.) exist in some DB files; **do not** auto-merge non–gen_only disagg lines into `llm_perf_multinode.yml` unless perf/QA owners explicitly want them in this union (today the file covers **gen_only** disagg entries exclusively).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/sync-qa-tests/SKILL.md at line 61, Replace the awkward phrase
"today the file is **gen_only** disagg only" in SKILL.md with clearer wording
such as "today the file covers **gen_only** disagg entries exclusively" to avoid
duplicate "only" and improve readability; locate the exact sentence containing
"Other prefixes... do not auto-merge non–gen_only disagg lines into
`llm_perf_multinode.yml` unless perf/QA owners explicitly want them in this
union (today the file is **gen_only** disagg only)" and update it to the
suggested rephrasing.

70-70: Minor: Remove leading space inside code span.

The inline code ` TIMEOUT (NNN)` has a leading space which triggers markdownlint MD038.

📝 Suggested fix
-   Preserve the same **` TIMEOUT (NNN)`** suffix as in test-db when present (commonly `120`). Keep list style consistent with existing entries (`  - perf/...`).
+   Preserve the same **`TIMEOUT (NNN)`** suffix as in test-db when present (commonly `120`). Keep list style consistent with existing entries (`  - perf/...`).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/sync-qa-tests/SKILL.md at line 70, The inline code span
contains a leading space ("` TIMEOUT (NNN)`") which triggers markdownlint MD038;
update the code span to remove the leading space so it reads "`TIMEOUT (NNN)`"
and ensure the surrounding list entry style remains consistent with existing
entries (e.g., "  - perf/...")—locate the offending text by searching for the
code span ` TIMEOUT (NNN)` in SKILL.md and replace it accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/integration/test_lists/qa/llm_perf_multinode.yml`:
- Around line 17-18: Update the QA node IDs to match the actual parametrization
naming: replace occurrences of "disagg-gen_only-gb200_<model>" with
"disagg_upload-gen_only-gb200_<model>" (e.g., for models deepseek-r1,
deepseek-v32, kimi-k2-thinking, gpt-oss-120b, qwen3-235b) so the QA entries
match the test parametrization keys used by the
l0_gb200_multi_nodes_perf_sanity* files; verify each listed node ID string now
uses "disagg_upload-gen_only-gb200_" exactly.

---

Nitpick comments:
In @.claude/skills/sync-qa-tests/SKILL.md:
- Line 61: Replace the awkward phrase "today the file is **gen_only** disagg
only" in SKILL.md with clearer wording such as "today the file covers
**gen_only** disagg entries exclusively" to avoid duplicate "only" and improve
readability; locate the exact sentence containing "Other prefixes... do not
auto-merge non–gen_only disagg lines into `llm_perf_multinode.yml` unless
perf/QA owners explicitly want them in this union (today the file is
**gen_only** disagg only)" and update it to the suggested rephrasing.
- Line 70: The inline code span contains a leading space ("` TIMEOUT (NNN)`")
which triggers markdownlint MD038; update the code span to remove the leading
space so it reads "`TIMEOUT (NNN)`" and ensure the surrounding list entry style
remains consistent with existing entries (e.g., "  - perf/...")—locate the
offending text by searching for the code span ` TIMEOUT (NNN)` in SKILL.md and
replace it accordingly.

In `@jenkins/scripts/sync_qa_tests.py`:
- Line 2: Update the SPDX header in sync_qa_tests.py so the copyright year range
matches the rest of the PR (use "2025-2026" instead of "2026"); locate the
top-of-file SPDX comment line (the string starting with "#
SPDX-FileCopyrightText: Copyright (c) ...") and change the year portion to the
range "2025-2026" to align with files like .coderabbit.yaml.
- Line 15: Remove the unnecessary future import at the top of sync_qa_tests.py:
delete the line "from __future__ import annotations" since the project targets
Python 3.10+ and PEP 604/modern type hints are supported natively; ensure no
other code changes are needed because annotations can remain as-is.
- Around line 146-149: The current check uses all_lines =
sorted(set(existing_lines) | set(missing)) and prints "Nothing to change." only
when not missing and all_lines == existing_lines, which conflates “no missing
entries” with “already sorted”; update logic or messaging in sync_qa_tests.py so
intent is clear: either (A) treat sorting as a change by comparing sets only
(e.g., compare set(existing_lines) vs set(all_lines)) and keep the message
as-is, or (B) preserve existing behavior but change the message to indicate both
“no missing entries and file already sorted” (referencing variables all_lines,
existing_lines, missing and the surrounding check) so the output isn’t
misleading. Ensure whichever option you pick updates the conditional and the
printed string accordingly.
- Line 305: Replace the defensive raise AssertionError(f"unknown command:
{args.command}") with a user-friendly parser.error call: locate the unreachable
fallback where args.command is handled and replace the AssertionError with
parser.error(f"unknown command: {args.command}") so argparse prints a proper
usage message; ensure you use the existing parser variable and preserve the same
error text for consistency.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 173b8db8-5944-4c20-b8d3-c6b54eb65e2c

📥 Commits

Reviewing files that changed from the base of the PR and between f2f9051 and 4d142b1.

📒 Files selected for processing (7)
  • .claude/skills/sync-qa-tests/SKILL.md
  • .coderabbit.yaml
  • jenkins/scripts/sync_qa_tests.py
  • tests/integration/test_lists/qa/llm_function_core.txt
  • tests/integration/test_lists/qa/llm_function_core_sanity.txt
  • tests/integration/test_lists/qa/llm_perf_multinode.yml
  • tests/integration/test_lists/test-db/l0_a10.yml
💤 Files with no reviewable changes (1)
  • tests/integration/test_lists/test-db/l0_a10.yml

Comment thread tests/integration/test_lists/qa/llm_perf_multinode.yml Outdated
@xinhe-nv xinhe-nv changed the title [None][chore] Add sync_qa_tests Jenkins script and sync-qa-tests skill [None][test] Add sync_qa_tests Jenkins script and sync-qa-tests skill Apr 14, 2026
@xinhe-nv
Copy link
Copy Markdown
Collaborator Author

/bot run --stage-list ""

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #43408 [ run ] triggered by Bot. Commit: 0271271 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #43408 [ run ] completed with state FAILURE. Commit: 0271271
/LLM/main/L0_MergeRequest_PR pipeline #33942 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

@xinhe-nv
Copy link
Copy Markdown
Collaborator Author

/bot run --stage-list ""

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #43470 [ run ] triggered by Bot. Commit: 150ce21 Link to invocation

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #43470 [ run ] completed with state FAILURE. Commit: 150ce21
/LLM/main/L0_MergeRequest_PR pipeline #33989 (Partly Tested) completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

Link to invocation

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
@xinhe-nv
Copy link
Copy Markdown
Collaborator Author

/bot run --stage-list ""

@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #43607 [ run ] triggered by Bot. Commit: 6a28f49 Link to invocation

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
@tensorrt-cicd
Copy link
Copy Markdown
Collaborator

PR_Github #43607 [ run ] completed with state SUCCESS. Commit: 6a28f49
/LLM/main/L0_MergeRequest_PR pipeline #34100 (Partly Tested) completed with status: 'SUCCESS'

CI Report

Link to invocation

Add test_openai_consistent_chat and test_ptp_quickstart_advanced_multi_gpus
which are referenced in waives.txt but were missing from the QA list.

Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants