fix(ci): decouple k8s tests from cluster job via docker image artifact #83

Merged
powderluv merged 1 commit into ROCm:main from shiv-tyagi:fix/k8s-ci-docker-image on Apr 14, 2026

@shiv-tyagi shiv-tyagi commented Apr 14, 2026

The k8s integration step assumed that binaries from the cluster job would be at ~/spur/bin/ on the same runner. There is no guarantee that the two jobs run back to back on the same runner: another job can run in between them, which causes intermittent failures. (I hit this recently.)

Fix: add a build-docker-image job that builds the image from deploy/Dockerfile and uploads it as an artifact; the k8s job then downloads and uses it. k8s_test.sh now requires SPUR_CI_IMAGE instead of host binaries.

This also means the docker image build is validated on every CI run.
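
For readers unfamiliar with this pattern, the handoff described above can be sketched roughly as follows. This is not the workflow from this PR: the tag, the archive name, the `DOCKER` and `TEST_SCRIPT` overrides, and the script location are all hypothetical, and the artifact upload/download steps themselves are left to the CI system. Only `deploy/Dockerfile`, `k8s_test.sh`, and `SPUR_CI_IMAGE` come from the description.

```shell
# Sketch only: names marked "hypothetical" are assumptions, not taken from the PR.
set -eu

IMAGE_TAG="${IMAGE_TAG:-spur-ci:local}"        # hypothetical image tag
ARCHIVE="${ARCHIVE:-spur-ci-image.tar}"        # hypothetical artifact file name
DOCKER="${DOCKER:-docker}"                     # overridable so the flow can be dry-run
TEST_SCRIPT="${TEST_SCRIPT:-./k8s_test.sh}"    # hypothetical location of k8s_test.sh

# Producer side (build-docker-image job): build from deploy/Dockerfile and
# export the image to a tarball that CI can upload as an artifact.
build_and_export() {
  "$DOCKER" build -f deploy/Dockerfile -t "$IMAGE_TAG" .
  "$DOCKER" save -o "$ARCHIVE" "$IMAGE_TAG"
}

# Consumer side (k8s job): after CI has downloaded the tarball, import the
# image and pass its tag to the test script through SPUR_CI_IMAGE.
import_and_test() {
  "$DOCKER" load -i "$ARCHIVE"
  SPUR_CI_IMAGE="$IMAGE_TAG" bash "$TEST_SCRIPT"
}
```

Because the image travels as an artifact, the consumer no longer depends on which runner the producer used or on what was left in that runner's home directory.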

The k8s integration step assumed binaries from the cluster job would
be at ~/spur/bin/ on the same runner. Two jobs running back to back on
the same runner is not guaranteed: another job can come in between, or
a job can land on a different runner, causing intermittent failures.

Fix: add a build-docker-image job that builds via deploy/Dockerfile,
uploads the image as an artifact, and the k8s job downloads and uses
it. k8s_test.sh now requires SPUR_CI_IMAGE instead of host binaries.

This also means the docker image build is validated on every CI run.

Made-with: Cursor
@shiv-tyagi shiv-tyagi marked this pull request as draft April 14, 2026 10:30
@shiv-tyagi shiv-tyagi force-pushed the fix/k8s-ci-docker-image branch from 510a994 to ee77264 Compare April 14, 2026 10:35
@shiv-tyagi shiv-tyagi marked this pull request as ready for review April 14, 2026 10:49
@shiv-tyagi
Member Author

https://github.com/ROCm/spur/actions/runs/24402030397

This failure is an example of the problem: the job ran against stale binaries.
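
One way a test script can rule out this kind of silent reuse is to fail fast when no image is supplied. The guard below is a minimal sketch under that assumption; the function name and messages are hypothetical and are not taken from k8s_test.sh. Only the `SPUR_CI_IMAGE` variable comes from this PR.

```shell
# Hypothetical guard: refuse to run unless the caller supplied an image,
# rather than quietly using whatever binaries are present on the host.
require_ci_image() {
  if [ -z "${SPUR_CI_IMAGE:-}" ]; then
    echo "error: SPUR_CI_IMAGE is not set; refusing to fall back to host binaries" >&2
    return 1
  fi
  echo "using image: ${SPUR_CI_IMAGE}"
}
```

Calling `require_ci_image || exit 1` near the top of a script turns a stale-binary run into an immediate, explicit failure.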

@shiv-tyagi shiv-tyagi requested a review from powderluv April 14, 2026 14:23
@powderluv powderluv merged commit 5db7e51 into ROCm:main Apr 14, 2026
6 checks passed
2 participants