CNTRLPLANE-3329: Add nightly CronJob to warm EFS-backed Go build cache#8496
Conversation
Add a Kubernetes CronJob that runs at 2 AM UTC daily to populate the shared Go build cache on the EFS-backed PV. The job clones main, compiles all packages and test binaries, then syncs the build cache to the PV for runner pods to consume as a read-only mount. Part-of: CNTRLPLANE-3329
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
@vismishr: This pull request references CNTRLPLANE-3329 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
Skipping CI for Draft Pull Request. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughA new Kubernetes CronJob Sequence Diagram(s)sequenceDiagram
participant Scheduler as CronJob Scheduler
participant KubeAPI as Kubernetes API/Controller
participant Pod as Job Pod (container)
participant Git as Git Repo (openshift/hypershift)
participant PVC as PersistentVolume (go-cache-pvc)
Scheduler->>KubeAPI: trigger job (02:00 daily)
KubeAPI->>Pod: create Job Pod (concurrency: Forbid, restartPolicy: Never)
Pod->>Git: shallow clone main -> /tmp/hypershift
Pod->>Pod: run go build ./... and attempt go test -c (populate GOCACHE)
Pod->>Pod: prune /cache/go-build files older than 7 days
Pod->>PVC: copy ${GOCACHE} -> /cache/go-build (mounted PVC)
Pod->>KubeAPI: exit and report status
🚥 Pre-merge checks | ✅ 12✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8496 +/- ##
=======================================
Coverage 39.95% 39.95%
=======================================
Files 751 751
Lines 92733 92733
=======================================
Hits 37048 37048
Misses 52998 52998
Partials 2687 2687
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
hack/github-actions-runner/cache-warming-cronjob.yaml (1)
23-23: ⚡ Quick winPin the runner image by digest instead of
latest.Using a mutable tag makes cache behavior non-reproducible across runs and can introduce unreviewed runtime changes. The codebase already establishes this pattern elsewhere (e.g.,
hack/github-actions-runner/monitoring/grafana.yamlpins by digest, and many production images inhypershift-operator/use digest pinning). Consider updating toquay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner@sha256:<digest>.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hack/github-actions-runner/cache-warming-cronjob.yaml` at line 23, Replace the mutable image tag "quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner:latest" with a digest-pinned reference using the `@sha256`:<digest> form (e.g. quay.io/.../hypershift-gh-actions-runner@sha256:<digest>); fetch the current published sha256 for that repo (via quay/skopeo/registry API) and substitute it so the CronJob uses an immutable image reference consistent with other manifests.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hack/github-actions-runner/cache-warming-cronjob.yaml`:
- Around line 40-43: Replace the destructive "rm -rf /cache/go-build/*" + "cp -a
\"${GOCACHE}\"/* /cache/go-build/" sequence with a safe atomic swap: copy into a
temporary staging directory (e.g. /cache/go-build.tmp), verify the copy
succeeded, then rename/move the temp dir into place (atomic mv to replace
/cache/go-build) and clean up on failure; ensure the CI cronjob uses the temp
dir and atomic rename so readers never see an empty/partial /cache/go-build
during the update.
---
Nitpick comments:
In `@hack/github-actions-runner/cache-warming-cronjob.yaml`:
- Line 23: Replace the mutable image tag
"quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner:latest"
with a digest-pinned reference using the `@sha256`:<digest> form (e.g.
quay.io/.../hypershift-gh-actions-runner@sha256:<digest>); fetch the current
published sha256 for that repo (via quay/skopeo/registry API) and substitute it
so the CronJob uses an immutable image reference consistent with other
manifests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: e2b9174f-9b0a-4e9f-aacb-a3a7ab0da805
📒 Files selected for processing (1)
hack/github-actions-runner/cache-warming-cronjob.yaml
Replace rm -rf of the entire cache directory with find -mtime +7 to prune entries older than 7 days. This prevents runners from seeing an empty or partial cache if they start during the nightly sync window. Stale entries are cleaned incrementally while fresh entries are overwritten in place by cp -a, keeping the cache always populated.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hack/github-actions-runner/cache-warming-cronjob.yaml`:
- Around line 19-23: The Pod spec for the cache-warmer job lacks hardened
securityContext; update the Pod-level spec and the container "cache-warmer"
securityContext to enforce non-root and least privilege: set pod-level
runAsNonRoot: true and runAsUser (e.g., 1000), and in the container
securityContext set allowPrivilegeEscalation: false, privileged: false,
readOnlyRootFilesystem: true, runAsNonRoot: true, drop all capabilities
(capabilities.drop: ["ALL"]) and enable seccompProfile with type RuntimeDefault
(or LocalhostProfile if required by cluster). Apply these changes to the cronjob
block that defines the container named "cache-warmer" (and replicate to the
other similar block referenced lines 49-66).
- Line 23: The cronjob is using a mutable image tag
"quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner:latest";
resolve the image digest (for example via `skopeo inspect
docker://quay.io/...:latest` or `podman pull` + `podman inspect --format
'{{index .Digest}}'`), then replace the tag with the immutable digest form
`quay.io/redhat-user-workloads/crt-redhat-acm-tenant@sha256:<DIGEST>` in the
image field so the cron execution is reproducible.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: c578c230-3d3e-4b1e-836c-6c30491409f3
📒 Files selected for processing (1)
hack/github-actions-runner/cache-warming-cronjob.yaml
| restartPolicy: Never | ||
| containers: | ||
| - name: cache-warmer | ||
| image: quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner:latest |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check if the file exists and examine its structure
find . -name "cache-warming-cronjob.yaml" -type fRepository: openshift/hypershift
Length of output: 120
🏁 Script executed:
# If found, examine the file around line 23
if [ -f "hack/github-actions-runner/cache-warming-cronjob.yaml" ]; then
echo "=== File exists. Examining contents ===" && \
cat -n "hack/github-actions-runner/cache-warming-cronjob.yaml" | head -40
else
echo "File not found at specified path"
fiRepository: openshift/hypershift
Length of output: 1659
Avoid mutable :latest for cron execution reproducibility.
Pin this image to a digest to prevent unreviewed runtime drift in nightly jobs. Resolve the actual digest value and update the image reference.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@hack/github-actions-runner/cache-warming-cronjob.yaml` at line 23, The
cronjob is using a mutable image tag
"quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner:latest";
resolve the image digest (for example via `skopeo inspect
docker://quay.io/...:latest` or `podman pull` + `podman inspect --format
'{{index .Digest}}'`), then replace the tag with the immutable digest form
`quay.io/redhat-user-workloads/crt-redhat-acm-tenant@sha256:<DIGEST>` in the
image field so the cron execution is reproducible.
There was a problem hiding this comment.
Intentionally using :latest here — the CronJob needs to stay in sync with the ARC runner image so the Go build cache is compiled with the same Go version and toolchain the runners use. Pinning to a digest would require a manual update every time the runner image is rebuilt.
There was a problem hiding this comment.
@vismishr, that's a fair point — since the build cache must be ABI-compatible with the exact toolchain the runners use, keeping both on :latest is a sensible way to guarantee that alignment without manual digest chasing on every runner rebuild. Understood, I'll stand down on this one.
✏️ Learnings added
Learnt from: vismishr
Repo: openshift/hypershift PR: 8496
File: hack/github-actions-runner/cache-warming-cronjob.yaml:23-23
Timestamp: 2026-05-12T19:47:18.581Z
Learning: In `hack/github-actions-runner/cache-warming-cronjob.yaml` (openshift/hypershift), the CronJob `go-cache-warmer` intentionally uses the mutable `:latest` tag for `quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner:latest`. This is by design so the Go build cache is always compiled with the same Go version and toolchain as the ARC runner pods, which also track `:latest`. Pinning to a digest is not desired because it would require a manual update on every runner image rebuild.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
♻️ Duplicate comments (2)
hack/github-actions-runner/cache-warming-cronjob.yaml (2)
23-23:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPin image to immutable digest for reproducibility.
The
:latesttag remains mutable, so the nightly CronJob can pull unreviewed runtime changes. For stable, reproducible execution, resolve and pin to the image digest.Run the following script to resolve the current digest:
#!/bin/bash # Resolve the digest for the latest runner image skopeo inspect --format '{{.Digest}}' \ docker://quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner:latestThen update the image field to:
image: quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner@sha256:<DIGEST>🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hack/github-actions-runner/cache-warming-cronjob.yaml` at line 23, The CronJob image is using the mutable tag "quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner:latest"; resolve the current immutable digest (e.g., via skopeo inspect) and update the image field to the digest form using the same image name but with `@sha256`:<DIGEST> instead of :latest so the nightly CronJob pins to a reproducible image; change the image line that currently contains "quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner:latest" to the corresponding "@sha256:<DIGEST>" value.
18-22:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHarden pod and container security context.
The pod and container still lack explicit security constraints. Without
runAsNonRoot,allowPrivilegeEscalation: false, seccomp profile, and capability drops, the workload runs with weaker baseline security.As per coding guidelines, static analysis flags CKV_K8S_20 (allowPrivilegeEscalation) and CKV_K8S_23 (root containers).
🔒 Recommended hardening patch
template: spec: + securityContext: + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault restartPolicy: Never containers: - name: cache-warmer image: quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner:latest + securityContext: + allowPrivilegeEscalation: false + runAsNonRoot: true + capabilities: + drop: + - ALL command:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hack/github-actions-runner/cache-warming-cronjob.yaml` around lines 18 - 22, The pod spec under template.spec for the cache-warmer workload is missing hardened security settings; update the PodTemplate to include a podSecurityContext (e.g., runAsNonRoot: true, runAsUser: <non-root uid>, fsGroup: <gid>) and add a securityContext on the container named "cache-warmer" to set runAsNonRoot: true (and runAsUser matching the pod if needed), allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, capabilities: drop: ["ALL"], and a seccompProfile (type: RuntimeDefault) to satisfy CKV_K8S_20/CKV_K8S_23 policies and prevent the container from running as root or escalating privileges.
🧹 Nitpick comments (1)
hack/github-actions-runner/cache-warming-cronjob.yaml (1)
44-45: ⚡ Quick winConsider atomic cache update for consistency.
Good: the previous
rm -rf /cache/go-build/*has been removed, eliminating the risk of leaving the cache empty on copy failure.However, readers may still observe a mix of old and new files during the multi-file copy. For stricter consistency, stage the new cache into a temporary directory and atomically rename it into place.
♻️ Optional atomic swap pattern
echo "=== Syncing build cache to PV ===" + rm -rf /cache/go-build.new /cache/go-build.old + mkdir -p /cache/go-build.new + cp -a "${GOCACHE}"/. /cache/go-build.new/ + if [ -d /cache/go-build ]; then + mv /cache/go-build /cache/go-build.old + fi + mv /cache/go-build.new /cache/go-build + rm -rf /cache/go-build.old - cp -a "${GOCACHE}"/* /cache/go-build/🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hack/github-actions-runner/cache-warming-cronjob.yaml` around lines 44 - 45, Staging the new Go build cache into place atomically will avoid readers seeing mixed old/new files: when syncing from "${GOCACHE}" to /cache/go-build, create a temporary directory (e.g., via mktemp -d), copy into that temp dir using the same cp -a invocation, then atomically replace the target by renaming the temp dir into /cache/go-build (use mv -T or mv then appropriate cleanup) so the swap is instantaneous; update the cache-warming script around the cp -a "${GOCACHE}"/* /cache/go-build/ step to implement this temp-dir -> mv swap and ensure proper cleanup on error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@hack/github-actions-runner/cache-warming-cronjob.yaml`:
- Line 23: The CronJob image is using the mutable tag
"quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner:latest";
resolve the current immutable digest (e.g., via skopeo inspect) and update the
image field to the digest form using the same image name but with
`@sha256`:<DIGEST> instead of :latest so the nightly CronJob pins to a
reproducible image; change the image line that currently contains
"quay.io/redhat-user-workloads/crt-redhat-acm-tenant/hypershift-gh-actions-runner:latest"
to the corresponding "@sha256:<DIGEST>" value.
- Around line 18-22: The pod spec under template.spec for the cache-warmer
workload is missing hardened security settings; update the PodTemplate to
include a podSecurityContext (e.g., runAsNonRoot: true, runAsUser: <non-root
uid>, fsGroup: <gid>) and add a securityContext on the container named
"cache-warmer" to set runAsNonRoot: true (and runAsUser matching the pod if
needed), allowPrivilegeEscalation: false, readOnlyRootFilesystem: true,
capabilities: drop: ["ALL"], and a seccompProfile (type: RuntimeDefault) to
satisfy CKV_K8S_20/CKV_K8S_23 policies and prevent the container from running as
root or escalating privileges.
---
Nitpick comments:
In `@hack/github-actions-runner/cache-warming-cronjob.yaml`:
- Around line 44-45: Staging the new Go build cache into place atomically will
avoid readers seeing mixed old/new files: when syncing from "${GOCACHE}" to
/cache/go-build, create a temporary directory (e.g., via mktemp -d), copy into
that temp dir using the same cp -a invocation, then atomically replace the
target by renaming the temp dir into /cache/go-build (use mv -T or mv then
appropriate cleanup) so the swap is instantaneous; update the cache-warming
script around the cp -a "${GOCACHE}"/* /cache/go-build/ step to implement this
temp-dir -> mv swap and ensure proper cleanup on error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 36dfb90f-6826-4cb1-ab4c-542e261d2431
📒 Files selected for processing (1)
hack/github-actions-runner/cache-warming-cronjob.yaml
Add pod-level and container-level security contexts to harden the CronJob: runAsNonRoot, seccomp RuntimeDefault, drop all capabilities, and disallow privilege escalation.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@hack/github-actions-runner/cache-warming-cronjob.yaml`:
- Line 51: The find command can remove the mounted /cache/go-build directory
itself when empty and cause the job to fail under set -e; update the command
that currently reads "find /cache/go-build -type d -empty -delete" to restrict
deletion to children only by adding -mindepth 1 (e.g. "find /cache/go-build
-mindepth 1 -type d -empty -delete") so the mount root is never removed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: d8b109ed-2661-46cf-9f8c-45de5957b481
📒 Files selected for processing (1)
hack/github-actions-runner/cache-warming-cronjob.yaml
Add -mindepth 1 to prevent find from attempting to delete the /cache/go-build mount point itself when it is empty.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
|
Scheduling tests matching the |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bryan-cox, vismishr The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Test Resultse2e-aws
e2e-aks
|
|
/retest |
1 similar comment
|
/retest |
|
The PR only changes a YAML file for a cache-warming CronJob — completely unrelated to the Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryAll 8 actual e2e tests passed successfully (4 public, 2 private, 2 OAuth LB — all SUCCESS in 31 seconds). The job failure is caused exclusively by a CI infrastructure issue in the post phase: the Root CauseThe root cause is a transient CI infrastructure image pull failure — specifically, the Detailed timeline:
The This failure is not caused by PR #8496, which only adds a cache-warming CronJob YAML file ( Recommendations
Evidence
|
|
/verified bypass Hack change only |
|
@bryan-cox: The DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@vismishr: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
1eddaf8
into
openshift:main
What this PR does / why we need it:
The EFS-backed PV mounted on ARC runner pods is empty until something
populates it. This PR adds a Kubernetes CronJob that runs nightly at
2 AM UTC to compile all Go packages and test binaries from
main, thensyncs the resulting build cache to the shared PV.
Once populated, every CI run copies from this pre-warmed cache instead
of compiling from scratch, eliminating ~2-3.5 min of overhead per shard.
How it works:
openshift/hypershiftmain branch (shallow, depth 1)go build ./...andgo test -c ./...to compile everythingCronJob details:
0 2 * * *(2 AM UTC daily)Forbid(no overlapping runs)Which issue(s) this PR fixes:
Fixes https://redhat.atlassian.net/browse/CNTRLPLANE-3329
Special notes for your reviewer:
PRs (CNTRLPLANE-3329: Mount EFS-backed Go build cache PV on ARC runner pods #8493, CNTRLPLANE-3329: Replace actions/cache with EFS-backed build cache in unit tests #8494, CNTRLPLANE-3329: Extend EFS-backed build cache to lint, verify, and envtest workflows #8495). In fact, it should be applied first so the
cache is warm before runners start using it.
oc apply -f hack/github-actions-runner/cache-warming-cronjob.yamloc create job --from=cronjob/go-cache-warmer go-cache-warmer-manual -n arc-runnersChecklist:
Summary by CodeRabbit