Consolidate CI test jobs: merge GPU smoke test and add Python version matrix#8
Merged
ivanbasov merged 14 commits intoMar 6, 2026
Merged
Conversation
… matrix
- Remove separate smoke-test-gpu job (was serial after gpu-tests, increasing
pipeline time). Smoke training+inference now runs in the same gpu-tests job.
- Replace python-compat matrix (6 jobs, SKIP_TESTS=1) with two focused job
groups that actually run tests:
* gpu-tests: matrix over Python 3.11/3.12/3.13 on GPU runners — installs
train deps, runs full test suite (CPU+GPU), then smoke training+inference.
* inference-tests: matrix over Python 3.11/3.12/3.13 on CPU — installs
inference deps, runs tests with pre-trained models (GPU tests auto-skip).
Reduces total jobs from 11 to 9 while increasing actual test coverage.
Made-with: Cursor
The deadsnakes PPA pulls in tzdata as a dependency, which triggers an interactive timezone configuration prompt in the container. This caused all 3 GPU matrix jobs to hang for 45 minutes until timeout. Made-with: Cursor
Without the pull_request trigger, CI never fires on PRs — checks aren't even planned (e.g. PR #9 shows zero checks). GPU jobs are gated to push/merge_group events to avoid consuming self-hosted GPU runners on every PR update. Made-with: Cursor
GPU jobs complete in ~5-10 minutes and serve as a useful pre-merge check. Made-with: Cursor
The copy-pr-bot creates pull-request/N branches for each PR, which matched the push trigger and caused every CI job to run twice (once from pull_request, once from push). The pull_request trigger already covers PRs targeting main, so the push pattern is redundant. Made-with: Cursor
NVIDIA self-hosted runners block pull_request events outright. GPU CI must run via push events — either to main or to pull-request/[0-9]+ branches created by copy-pr-bot for PR testing. - Restore "pull-request/[0-9]+" in push trigger - Gate gpu-tests with if: github.event_name != 'pull_request' - CPU jobs (inference-tests, unit-tests, etc.) still run on pull_request Made-with: Cursor
…jobs Simplify triggers: all jobs (including GPU) run on pull_request, push to main, and merge_group. The pull-request/[0-9]+ branch convention is not used by contributors. Made-with: Cursor
- Combine unit-tests (py3.12) and inference-tests (py3.11/3.12/3.13) into a single unit-tests matrix job across all three Python versions. Both ran identical test suites with inference requirements. - Re-add if: github.event_name != 'pull_request' on gpu-tests since NVIDIA self-hosted runners block pull_request events entirely. GPU CI runs on push to main and merge_group. Made-with: Cursor
NVIDIA self-hosted runners block pull_request events, so GPU jobs in the main CI workflow always showed as a single "Skipped" entry with unresolved matrix names on every PR. Move GPU jobs to ci-gpu.yml (triggers: push to main, merge_group, workflow_dispatch). The main ci.yml keeps CPU jobs only (triggers: pull_request, push to main, merge_group, workflow_dispatch). Made-with: Cursor
Add pull-request/[0-9]+ to ci-gpu.yml push trigger so GPU tests run when copy-pr-bot creates the corresponding branch for a PR. Made-with: Cursor
f8d312e to
932a3e8
Compare
The container default shell is sh, which doesn't have the source builtin. Explicitly set shell: bash for the venv activation step. Made-with: Cursor
The smoke training step uses torch.compile which invokes the inductor backend, requiring a C compiler. The ubuntu:22.04 container doesn't ship with gcc. Made-with: Cursor
bmhowe23
reviewed
Mar 5, 2026
Collaborator
bmhowe23
left a comment
There was a problem hiding this comment.
This LGTM, but be advised we can also use nv-cpu-general for CPU runner jobs. In some sense, this is preferable because it uses Nvidia CPU runners.
bmhowe23
approved these changes
Mar 5, 2026
Collaborator
bmhowe23
left a comment
There was a problem hiding this comment.
Oh, I meant to approve, so resubmitting now.
Use nv-cpu-general runner group instead of GitHub-hosted ubuntu-latest. Also restore pull-request/[0-9]+ push trigger in case self-hosted CPU runners block pull_request events (same as GPU runners). Made-with: Cursor
NVIDIA self-hosted runners block pull_request events. All CI (CPU and GPU) now runs via copy-pr-bot push to pull-request/[0-9]+ branches. Made-with: Cursor
ivanbasov
added a commit
that referenced
this pull request
Apr 10, 2026
… matrix (#8) * Consolidate CI test jobs: merge GPU smoke test and add Python version matrix - Remove separate smoke-test-gpu job (was serial after gpu-tests, increasing pipeline time). Smoke training+inference now runs in the same gpu-tests job. - Replace python-compat matrix (6 jobs, SKIP_TESTS=1) with two focused job groups that actually run tests: * gpu-tests: matrix over Python 3.11/3.12/3.13 on GPU runners — installs train deps, runs full test suite (CPU+GPU), then smoke training+inference. * inference-tests: matrix over Python 3.11/3.12/3.13 on CPU — installs inference deps, runs tests with pre-trained models (GPU tests auto-skip). Reduces total jobs from 11 to 9 while increasing actual test coverage. Made-with: Cursor * Fix GPU CI: set DEBIAN_FRONTEND=noninteractive to prevent tzdata hang The deadsnakes PPA pulls in tzdata as a dependency, which triggers an interactive timezone configuration prompt in the container. This caused all 3 GPU matrix jobs to hang for 45 minutes until timeout. Made-with: Cursor * Add pull_request trigger and gate GPU jobs to push/merge_group only Without the pull_request trigger, CI never fires on PRs — checks aren't even planned (e.g. PR #9 shows zero checks). GPU jobs are gated to push/merge_group events to avoid consuming self-hosted GPU runners on every PR update. Made-with: Cursor * Remove event gate on GPU jobs so they run on PRs too GPU jobs complete in ~5-10 minutes and serve as a useful pre-merge check. Made-with: Cursor * Remove pull-request/[0-9]+ from push trigger to fix duplicate CI runs The copy-pr-bot creates pull-request/N branches for each PR, which matched the push trigger and caused every CI job to run twice (once from pull_request, once from push). The pull_request trigger already covers PRs targeting main, so the push pattern is redundant. Made-with: Cursor * Fix GPU CI: gate on event type, restore push trigger for copy-pr-bot NVIDIA self-hosted runners block pull_request events outright. GPU CI must run via push events — either to main or to pull-request/[0-9]+ branches created by copy-pr-bot for PR testing. - Restore "pull-request/[0-9]+" in push trigger - Gate gpu-tests with if: github.event_name != 'pull_request' - CPU jobs (inference-tests, unit-tests, etc.) still run on pull_request Made-with: Cursor * Remove pull-request/[0-9]+ push pattern and pull_request gate on GPU jobs Simplify triggers: all jobs (including GPU) run on pull_request, push to main, and merge_group. The pull-request/[0-9]+ branch convention is not used by contributors. Made-with: Cursor * Merge unit-tests + inference-tests, gate GPU jobs from pull_request - Combine unit-tests (py3.12) and inference-tests (py3.11/3.12/3.13) into a single unit-tests matrix job across all three Python versions. Both ran identical test suites with inference requirements. - Re-add if: github.event_name != 'pull_request' on gpu-tests since NVIDIA self-hosted runners block pull_request events entirely. GPU CI runs on push to main and merge_group. Made-with: Cursor * Split GPU tests into separate workflow to avoid skipped PR noise NVIDIA self-hosted runners block pull_request events, so GPU jobs in the main CI workflow always showed as a single "Skipped" entry with unresolved matrix names on every PR. Move GPU jobs to ci-gpu.yml (triggers: push to main, merge_group, workflow_dispatch). The main ci.yml keeps CPU jobs only (triggers: pull_request, push to main, merge_group, workflow_dispatch). Made-with: Cursor * Enable GPU CI on PRs via copy-pr-bot push trigger Add pull-request/[0-9]+ to ci-gpu.yml push trigger so GPU tests run when copy-pr-bot creates the corresponding branch for a PR. Made-with: Cursor * Fix smoke test step: use bash shell for source command The container default shell is sh, which doesn't have the source builtin. Explicitly set shell: bash for the venv activation step. Made-with: Cursor * Install gcc in GPU container for torch.compile/inductor The smoke training step uses torch.compile which invokes the inductor backend, requiring a C compiler. The ubuntu:22.04 container doesn't ship with gcc. Made-with: Cursor * Switch CPU jobs to NVIDIA self-hosted linux-amd64-cpu4 runners Use nv-cpu-general runner group instead of GitHub-hosted ubuntu-latest. Also restore pull-request/[0-9]+ push trigger in case self-hosted CPU runners block pull_request events (same as GPU runners). Made-with: Cursor * Remove pull_request trigger since all runners are NVIDIA self-hosted NVIDIA self-hosted runners block pull_request events. All CI (CPU and GPU) now runs via copy-pr-bot push to pull-request/[0-9]+ branches. Made-with: Cursor
ivanbasov
added a commit
that referenced
this pull request
Apr 10, 2026
… matrix (#8) * Consolidate CI test jobs: merge GPU smoke test and add Python version matrix - Remove separate smoke-test-gpu job (was serial after gpu-tests, increasing pipeline time). Smoke training+inference now runs in the same gpu-tests job. - Replace python-compat matrix (6 jobs, SKIP_TESTS=1) with two focused job groups that actually run tests: * gpu-tests: matrix over Python 3.11/3.12/3.13 on GPU runners — installs train deps, runs full test suite (CPU+GPU), then smoke training+inference. * inference-tests: matrix over Python 3.11/3.12/3.13 on CPU — installs inference deps, runs tests with pre-trained models (GPU tests auto-skip). Reduces total jobs from 11 to 9 while increasing actual test coverage. Made-with: Cursor * Fix GPU CI: set DEBIAN_FRONTEND=noninteractive to prevent tzdata hang The deadsnakes PPA pulls in tzdata as a dependency, which triggers an interactive timezone configuration prompt in the container. This caused all 3 GPU matrix jobs to hang for 45 minutes until timeout. Made-with: Cursor * Add pull_request trigger and gate GPU jobs to push/merge_group only Without the pull_request trigger, CI never fires on PRs — checks aren't even planned (e.g. PR #9 shows zero checks). GPU jobs are gated to push/merge_group events to avoid consuming self-hosted GPU runners on every PR update. Made-with: Cursor * Remove event gate on GPU jobs so they run on PRs too GPU jobs complete in ~5-10 minutes and serve as a useful pre-merge check. Made-with: Cursor * Remove pull-request/[0-9]+ from push trigger to fix duplicate CI runs The copy-pr-bot creates pull-request/N branches for each PR, which matched the push trigger and caused every CI job to run twice (once from pull_request, once from push). The pull_request trigger already covers PRs targeting main, so the push pattern is redundant. Made-with: Cursor * Fix GPU CI: gate on event type, restore push trigger for copy-pr-bot NVIDIA self-hosted runners block pull_request events outright. GPU CI must run via push events — either to main or to pull-request/[0-9]+ branches created by copy-pr-bot for PR testing. - Restore "pull-request/[0-9]+" in push trigger - Gate gpu-tests with if: github.event_name != 'pull_request' - CPU jobs (inference-tests, unit-tests, etc.) still run on pull_request Made-with: Cursor * Remove pull-request/[0-9]+ push pattern and pull_request gate on GPU jobs Simplify triggers: all jobs (including GPU) run on pull_request, push to main, and merge_group. The pull-request/[0-9]+ branch convention is not used by contributors. Made-with: Cursor * Merge unit-tests + inference-tests, gate GPU jobs from pull_request - Combine unit-tests (py3.12) and inference-tests (py3.11/3.12/3.13) into a single unit-tests matrix job across all three Python versions. Both ran identical test suites with inference requirements. - Re-add if: github.event_name != 'pull_request' on gpu-tests since NVIDIA self-hosted runners block pull_request events entirely. GPU CI runs on push to main and merge_group. Made-with: Cursor * Split GPU tests into separate workflow to avoid skipped PR noise NVIDIA self-hosted runners block pull_request events, so GPU jobs in the main CI workflow always showed as a single "Skipped" entry with unresolved matrix names on every PR. Move GPU jobs to ci-gpu.yml (triggers: push to main, merge_group, workflow_dispatch). The main ci.yml keeps CPU jobs only (triggers: pull_request, push to main, merge_group, workflow_dispatch). Made-with: Cursor * Enable GPU CI on PRs via copy-pr-bot push trigger Add pull-request/[0-9]+ to ci-gpu.yml push trigger so GPU tests run when copy-pr-bot creates the corresponding branch for a PR. Made-with: Cursor * Fix smoke test step: use bash shell for source command The container default shell is sh, which doesn't have the source builtin. Explicitly set shell: bash for the venv activation step. Made-with: Cursor * Install gcc in GPU container for torch.compile/inductor The smoke training step uses torch.compile which invokes the inductor backend, requiring a C compiler. The ubuntu:22.04 container doesn't ship with gcc. Made-with: Cursor * Switch CPU jobs to NVIDIA self-hosted linux-amd64-cpu4 runners Use nv-cpu-general runner group instead of GitHub-hosted ubuntu-latest. Also restore pull-request/[0-9]+ push trigger in case self-hosted CPU runners block pull_request events (same as GPU runners). Made-with: Cursor * Remove pull_request trigger since all runners are NVIDIA self-hosted NVIDIA self-hosted runners block pull_request events. All CI (CPU and GPU) now runs via copy-pr-bot push to pull-request/[0-9]+ branches. Made-with: Cursor
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Consolidates CI from 11 jobs down to 8, with better actual test coverage and cleaner PR experience.
Changes
smoke-test-gpuintogpu-tests: The smoke job previously waited forgpu-teststo finish (needs: gpu-tests) before starting its own setup cycle, adding serial overhead. Both now run in a single job.python-compat(6 jobs,SKIP_TESTS=1) with jobs that actually run tests:unit-tests(3 jobs, CPU): Matrix over Python 3.11/3.12/3.13. Installs inference deps and runs the full test suite with pre-trained models. GPU-specific tests auto-skip.gpu-tests(3 jobs, GPU): Matrix over Python 3.11/3.12/3.13. Installs train deps via deadsnakes PPA, runs full test suite (CPU + GPU), then smoke training + inference.ci-gpu.yml: NVIDIA self-hosted runners blockpull_requestevents entirely, which caused GPU matrix jobs to show as a single confusing "Skipped" entry on PRs. Separate workflow keeps PR checks clean.pull_requesttrigger toci.yml: CPU jobs now run on every PR targeting main (previously only ran on push to main).ci-gpu.ymltriggers on push topull-request/[0-9]+branches created by copy-pr-bot.Before (11 jobs, 1 workflow)
SKIP_TESTS=1— only checked importsSKIP_TESTS=1— only checked importsneeds:)After (8 jobs, 2 workflows)
ci.yml (runs on pull_request, push to main, merge_group):
ci-gpu.yml (runs on push to main / pull-request/[0-9]+, merge_group):
Test plan