feat(ci): add ho-release-gate pipeline for nightly promotion#8602
feat(ci): add ho-release-gate pipeline for nightly promotion#8602Nirshal wants to merge 10 commits into
Conversation
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughDefines the ho-release-gate Tekton Pipeline with two params: snapshot-name and ho-image. extract-image validates ho-image and emits ho-image and snapshot-name as task results. run-e2e consumes the emitted ho-image, writes a placeholder result of "passed" and a fixed prow-job-url. In finally, create-release runs only when run-e2e's result is "passed" and uses oc create to make a Konflux Release for the provided snapshot. Sequence Diagram(s)sequenceDiagram
participant ComponentA
participant ComponentB
ComponentA->>ComponentB: observable interaction
🚥 Pre-merge checks | ✅ 11✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Skipping CI for Draft Pull Request. |
|
Please specify an area label DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
.tekton/pipelines/ho-release-gate.yaml (2)
91-102: ⚡ Quick winConsider adding a timeout to the polling loop.
The commented implementation polls indefinitely until success/failure. If the Prow job gets stuck or the API becomes unreachable, this could cause the pipeline to hang forever.
When implementing the actual gangway integration, add a maximum retry count or deadline:
♻️ Suggested pattern for timeout
+ MAX_ATTEMPTS=180 # 3 hours at 60s intervals + ATTEMPT=0 # Poll for completion: while true; do + ATTEMPT=$((ATTEMPT + 1)) + if [[ ${ATTEMPT} -gt ${MAX_ATTEMPTS} ]]; then + echo "ERROR: Timeout waiting for Prow job completion" + echo -n "failed" > $(results.result.path) + break + fi STATUS=$(curl -s "${GANGWAY_URL}/v1/executions/${JOB_URL}" \🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.tekton/pipelines/ho-release-gate.yaml around lines 91 - 102, The commented polling loop for checking Gangway job status (using GANGWAY_URL, JOB_URL, GANGWAY_TOKEN and writing to results.result.path) can hang indefinitely; update the loop to enforce a timeout by adding either a max retry counter or a deadline variable (e.g., MAX_RETRIES or GANGWAY_POLL_DEADLINE_SECONDS) and break with a failure result when exceeded; ensure the loop increments the counter or checks the deadline each iteration, logs a clear timeout error, and writes "failed" to results.result.path if the timeout is reached.
29-29: 💤 Low valueConsider pinning container image versions for reproducibility.
Multiple tasks use
:latesttags (lines 29, 62, 124, 164). For a release gate pipeline, unexpected image updates could cause inconsistent behavior or breakages. Pin to specific digests or version tags before removing the draft status.♻️ Example with pinned versions
- image: registry.redhat.io/openshift4/ose-cli:latest + image: registry.redhat.io/openshift4/ose-cli:v4.15Or use digest for stronger guarantees:
image: registry.redhat.io/openshift4/ose-cli@sha256:<digest>🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.tekton/pipelines/ho-release-gate.yaml at line 29, Replace occurrences of the image field using the :latest tag (e.g., "registry.redhat.io/openshift4/ose-cli:latest") with explicit, pinned version tags or immutable digests (e.g., "`@sha256`:...") to ensure reproducible builds; update every task that references the same image (the other occurrences of the same "image: registry.redhat.io/openshift4/ose-cli:latest" in this pipeline) to the chosen tag/digest and verify compatibility before merging.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.tekton/pipelines/ho-release-gate.yaml:
- Around line 112-153: The pipeline currently only handles explicit
$(tasks.run-e2e.results.result) values "passed" or "failed" so errors/skips
produce no notification; add a catch-all task (e.g., notify-error) or extend
notify-slack to inspect $(tasks.run-e2e.status) so non-Succeeded statuses
trigger a notification. Specifically, add a finally task (name: notify-error)
using when: input: $(tasks.run-e2e.status) operator: notin values: ["Succeeded"]
(and/or guard with $(tasks.run-e2e.results.result) notin ["passed","failed"])
that sends an alert, or update the existing notify-slack when clause to include
$(tasks.run-e2e.status) operator: in values: ["Failed"] so task
errors/timeouts/omissions are reported.
- Around line 165-171: The Slack JSON payload is built by interpolating params
directly in the shell script (the script block that posts to
"${SLACK_WEBHOOK_URL}"), which risks JSON injection if params like
$(params.ho-image), $(params.snapshot-name) or $(params.prow-job-url) contain
quotes or newlines; fix by constructing the JSON safely with a JSON tool (e.g.,
use jq -n --arg snapshot "$(params.snapshot-name)" --arg image
"$(params.ho-image)" --arg prow "$(params.prow-job-url)" '{text: "HyperShift
nightly promotion FAILED\nSnapshot: \($snapshot)\nImage: \($image)\nProw job:
\($prow)\nPipeline: $(context.pipelineRun.name)"}' ) so each param is properly
escaped and then pipe that output to curl instead of embedding parameters
directly in the here-doc.
---
Nitpick comments:
In @.tekton/pipelines/ho-release-gate.yaml:
- Around line 91-102: The commented polling loop for checking Gangway job status
(using GANGWAY_URL, JOB_URL, GANGWAY_TOKEN and writing to results.result.path)
can hang indefinitely; update the loop to enforce a timeout by adding either a
max retry counter or a deadline variable (e.g., MAX_RETRIES or
GANGWAY_POLL_DEADLINE_SECONDS) and break with a failure result when exceeded;
ensure the loop increments the counter or checks the deadline each iteration,
logs a clear timeout error, and writes "failed" to results.result.path if the
timeout is reached.
- Line 29: Replace occurrences of the image field using the :latest tag (e.g.,
"registry.redhat.io/openshift4/ose-cli:latest") with explicit, pinned version
tags or immutable digests (e.g., "`@sha256`:...") to ensure reproducible builds;
update every task that references the same image (the other occurrences of the
same "image: registry.redhat.io/openshift4/ose-cli:latest" in this pipeline) to
the chosen tag/digest and verify compatibility before merging.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: bd039c9f-b76d-4301-b332-b5f6622f4d01
📒 Files selected for processing (1)
.tekton/pipelines/ho-release-gate.yaml
| finally: | ||
| - name: create-release | ||
| when: | ||
| - input: $(tasks.run-e2e.results.result) | ||
| operator: in | ||
| values: ["passed"] | ||
| taskSpec: | ||
| params: | ||
| - name: snapshot-name | ||
| type: string | ||
| steps: | ||
| - name: create | ||
| image: registry.redhat.io/openshift4/ose-cli:latest | ||
| script: | | ||
| #!/bin/bash | ||
| set -euo pipefail | ||
|
|
||
| SNAPSHOT="$(params.snapshot-name)" | ||
| echo "Tests passed. Creating Release for snapshot: ${SNAPSHOT}" | ||
|
|
||
| oc create -f - <<EOF | ||
| apiVersion: appstudio.redhat.com/v1alpha1 | ||
| kind: Release | ||
| metadata: | ||
| generateName: hypershift-operator-ho-release-gate-aro-hcp- | ||
| namespace: crt-redhat-acm-tenant | ||
| spec: | ||
| snapshot: ${SNAPSHOT} | ||
| releasePlan: hypershift-operator-ho-release-gate-aro-hcp | ||
| gracePeriodDays: 7 | ||
| EOF | ||
|
|
||
| echo "Release created successfully" | ||
| params: | ||
| - name: snapshot-name | ||
| value: $(tasks.extract-image.results.snapshot-name) | ||
|
|
||
| - name: notify-slack | ||
| when: | ||
| - input: $(tasks.run-e2e.results.result) | ||
| operator: in | ||
| values: ["failed"] |
There was a problem hiding this comment.
No notification when run-e2e task errors or is skipped.
The when expressions only match explicit "passed" or "failed" results. If run-e2e errors before writing to $(results.result.path) (e.g., script crash, timeout, OOM), the result will be empty and neither create-release nor notify-slack will execute. The pipeline will complete silently with no indication of failure.
Consider adding a catch-all notification task or using $(tasks.run-e2e.status) to detect task failures:
🛡️ Suggested approach: Add error notification task
- name: notify-error
when:
- input: $(tasks.run-e2e.status)
operator: notin
values: ["Succeeded"]
- input: $(tasks.run-e2e.results.result)
operator: notin
values: ["passed", "failed"]
taskSpec:
steps:
- name: notify
image: registry.redhat.io/openshift4/ose-cli:latest
script: |
#!/bin/bash
# Send notification about task error/skip
echo "run-e2e task did not complete normally"
# Add Slack notification hereAlternatively, modify notify-slack to also trigger on task failure status:
when:
- input: $(tasks.run-e2e.status)
operator: in
values: ["Failed"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.tekton/pipelines/ho-release-gate.yaml around lines 112 - 153, The pipeline
currently only handles explicit $(tasks.run-e2e.results.result) values "passed"
or "failed" so errors/skips produce no notification; add a catch-all task (e.g.,
notify-error) or extend notify-slack to inspect $(tasks.run-e2e.status) so
non-Succeeded statuses trigger a notification. Specifically, add a finally task
(name: notify-error) using when: input: $(tasks.run-e2e.status) operator: notin
values: ["Succeeded"] (and/or guard with $(tasks.run-e2e.results.result) notin
["passed","failed"]) that sends an alert, or update the existing notify-slack
when clause to include $(tasks.run-e2e.status) operator: in values: ["Failed"]
so task errors/timeouts/omissions are reported.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #8602 +/- ##
==========================================
+ Coverage 40.61% 41.43% +0.81%
==========================================
Files 755 756 +1
Lines 93227 93647 +420
==========================================
+ Hits 37864 38802 +938
+ Misses 52640 52124 -516
+ Partials 2723 2721 -2 see 52 files with indirect coverage changes
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: Nirshal The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.tekton/pipelines/ho-release-gate.yaml (1)
230-234:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHarden Slack webhook call with fail-fast and timeout controls.
The webhook POST can currently fail silently (non-2xx) or hang without bounds. Add curl failure/timeout/retry flags so pipeline outcome reflects notification delivery failures.
Proposed patch
- name: send-notification image: curlimages/curl:latest script: | #!/bin/sh - curl -X POST -H 'Content-type: application/json' \ + curl --fail --show-error --silent \ + --connect-timeout 10 --max-time 30 \ + --retry 3 --retry-delay 2 \ + -X POST -H 'Content-type: application/json' \ --data "{ \"text\": \"HyperShift nightly promotion FAILED\nSnapshot: $(params.snapshot-name)\nImage: $(params.ho-image)\nProw job: $(params.prow-job-url)\nPipeline: $(context.pipelineRun.name)\" }" \ "${SLACK_WEBHOOK_URL}"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.tekton/pipelines/ho-release-gate.yaml around lines 230 - 234, Update the curl invocation that posts to "${SLACK_WEBHOOK_URL}" (the Slack webhook step) to use robust failure, timeout, and retry flags so non-2xx responses and hangs produce a non-zero exit: add --fail --show-error --connect-timeout 5 --max-time 10 --retry 3 --retry-delay 2 --retry-connrefused to the existing curl command that posts the JSON payload (the block using params.snapshot-name, params.ho-image, params.prow-job-url and context.pipelineRun.name) so the pipeline reflects notification delivery failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In @.tekton/pipelines/ho-release-gate.yaml:
- Around line 230-234: Update the curl invocation that posts to
"${SLACK_WEBHOOK_URL}" (the Slack webhook step) to use robust failure, timeout,
and retry flags so non-2xx responses and hangs produce a non-zero exit: add
--fail --show-error --connect-timeout 5 --max-time 10 --retry 3 --retry-delay 2
--retry-connrefused to the existing curl command that posts the JSON payload
(the block using params.snapshot-name, params.ho-image, params.prow-job-url and
context.pipelineRun.name) so the pipeline reflects notification delivery
failures.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 0389e56d-fe3e-4fce-a4a9-7608f8e44e1b
📒 Files selected for processing (1)
.tekton/pipelines/ho-release-gate.yaml
e79809d to
58a3243
Compare
|
Actionable comments posted: 0 |
Adds a Tekton pipeline definition for the HyperShift Operator release gating pipeline (CNTRLPLANE-3434). The pipeline validates HO images via e2e tests before promoting them to a verified Quay repository. Pipeline flow: extract-image → run-e2e → push-image (skopeo) / notify-slack ARO HCP is the pilot platform. Current POC state: - run-e2e: simulated pass (gangway integration TBD) - push-image: direct skopeo copy to POC Quay repo - create-release (Konflux Release object) kept commented for final version Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
58a3243 to
bfb44b0
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
.tekton/pipelines/ho-release-gate.yaml (1)
112-118:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd a non-success notification path in
finally.Only the pass path is handled today. If
run-e2ereturnsfailedor errors before publishing results, there is no explicit failure notification task in this pipeline. Add a failure/error finalizer path keyed off task status/results.💡 Minimal fix pattern
finally: - name: create-release when: - input: $(tasks.run-e2e.results.result) operator: in values: ["passed"] @@ - name: snapshot-name value: $(tasks.extract-image.results.snapshot-name) + + - name: notify-failure + when: + - input: $(tasks.run-e2e.status) + operator: notin + values: ["Succeeded"] + taskSpec: + steps: + - name: notify + image: registry.redhat.io/openshift4/ose-cli:latest + script: | + #!/bin/bash + set -euo pipefail + echo "run-e2e did not complete successfully; add Slack notification here"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.tekton/pipelines/ho-release-gate.yaml around lines 112 - 118, The pipeline's finally currently only handles the success path for the create-release finalizer (referencing the create-release entry and the run-e2e task via $(tasks.run-e2e.results.result)); add a complementary finalizer (e.g., name: notify-failure) that triggers when run-e2e did not pass by using a when clause such as input: $(tasks.run-e2e.results.result) operator: notin values: ["passed"] and also add a guard on $(tasks.run-e2e.status) to catch missing results/errors (e.g., operator: in values: ["Failed","Error"] or similar), and implement the notification taskSpec for failure/error handling; ensure both create-release and notify-failure entries live under finally so failures are explicitly handled.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.tekton/pipelines/ho-release-gate.yaml:
- Around line 36-39: The pipeline currently validates params.ho-image but does
not check params.snapshot-name, so add an early non-empty validation for
snapshot-name in the same extract-image validation block: detect if
"$(params.snapshot-name)" is empty, emit a clear error like "ERROR:
snapshot-name parameter is empty" and exit 1 to fail fast; locate the validation
near the existing ho-image check in the extract-image task/script and mirror the
same pattern to ensure bad input fails before e2e runs.
---
Duplicate comments:
In @.tekton/pipelines/ho-release-gate.yaml:
- Around line 112-118: The pipeline's finally currently only handles the success
path for the create-release finalizer (referencing the create-release entry and
the run-e2e task via $(tasks.run-e2e.results.result)); add a complementary
finalizer (e.g., name: notify-failure) that triggers when run-e2e did not pass
by using a when clause such as input: $(tasks.run-e2e.results.result) operator:
notin values: ["passed"] and also add a guard on $(tasks.run-e2e.status) to
catch missing results/errors (e.g., operator: in values: ["Failed","Error"] or
similar), and implement the notification taskSpec for failure/error handling;
ensure both create-release and notify-failure entries live under finally so
failures are explicitly handled.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: abc02483-34a2-44b5-865b-99e67bdfd6e4
📒 Files selected for processing (1)
.tekton/pipelines/ho-release-gate.yaml
| if [[ -z "$(params.ho-image)" ]]; then | ||
| echo "ERROR: ho-image parameter is empty" | ||
| exit 1 | ||
| fi |
There was a problem hiding this comment.
Validate snapshot-name early in extract-image.
Line 36 validates ho-image, but snapshot-name is passed through unchecked and only consumed much later at release creation. Add a non-empty check here so bad input fails fast before e2e execution.
💡 Minimal fix
echo "Snapshot: $(params.snapshot-name)"
echo "HO image: $(params.ho-image)"
+ if [[ -z "$(params.snapshot-name)" ]]; then
+ echo "ERROR: snapshot-name parameter is empty"
+ exit 1
+ fi
+
if [[ -z "$(params.ho-image)" ]]; then
echo "ERROR: ho-image parameter is empty"
exit 1
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [[ -z "$(params.ho-image)" ]]; then | |
| echo "ERROR: ho-image parameter is empty" | |
| exit 1 | |
| fi | |
| if [[ -z "$(params.snapshot-name)" ]]; then | |
| echo "ERROR: snapshot-name parameter is empty" | |
| exit 1 | |
| fi | |
| if [[ -z "$(params.ho-image)" ]]; then | |
| echo "ERROR: ho-image parameter is empty" | |
| exit 1 | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.tekton/pipelines/ho-release-gate.yaml around lines 36 - 39, The pipeline
currently validates params.ho-image but does not check params.snapshot-name, so
add an early non-empty validation for snapshot-name in the same extract-image
validation block: detect if "$(params.snapshot-name)" is empty, emit a clear
error like "ERROR: snapshot-name parameter is empty" and exit 1 to fail fast;
locate the validation near the existing ho-image check in the extract-image
task/script and mirror the same pattern to ensure bad input fails before e2e
runs.
Replace POC placeholder with real gangway API calls: - Trigger periodic Prow job via POST to gangway API - Poll for job completion (SUCCESS/FAILURE/ABORTED/ERROR) - Return pass/fail result for release gating decision HO image override is not yet implemented (pending ci-operator dependency override investigation). Currently triggers the job without image override to validate the trigger/poll flow.
- Increase poll interval from 2min to 10min (jobs take 1-2h) - Filter tests to TestCreateCluster via MULTISTAGE_PARAM_OVERRIDE - Log HTTP status codes on trigger and poll requests - Distinguish test failure from polling errors (passed/failed/error) - Detect auth token expiration during polling
Use ci-operator's OVERRIDE_IMAGE_* convention to inject the Konflux-built HO image into the Prow job, replacing the default pipeline-built image.
Add OVERRIDE_IMAGE_HYPERSHIFT_TESTS to inject the test binary image, currently pointing to registry.ci.openshift.org/hypershift/hypershift-tests:latest as placeholder until commit-specific resolution is implemented.
Without explicit timeout, the task inherits the cluster default (1-2h) which is insufficient for Prow e2e jobs that take 1.5-2.5h plus gangway polling overhead.
Use periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-aks instead of e2e-aws-ovn — AKS tests are directly relevant to the ARO HCP pilot platform.
- extract-image: log commit SHA from Snapshot for traceability - run-e2e: structured logging with timestamps, poll count, HTTP codes, override details, and completion times on success/failure - create-release: log Release name, Snapshot, and ReleasePlan
- Validate trigger response is valid JSON before parsing - Validate poll response is valid JSON before parsing - Log raw response on HTTP errors, JSON parse failures - Log full JSON response on SUCCESS/FAILURE for traceability
|
I now have the complete root cause. The gitlint rule Test Failure Analysis CompleteJob Information
Test Failure AnalysisErrorSummaryThe Root CauseThe commit There is no blank line followed by a body paragraph. Gitlint's B6 rule (enabled by default) mandates that every commit message include a body section. The project's The Recommendations
Evidence
|
Summary
Pipeline flow
Current state (POC)
push-snapshot-to-quay)ReleasePlanAdmissionStatus
Draft — POC placeholders in place:
run-e2e: simulated pass (gangway integration TBD)create-release: functional, creates Release referencing tenant ReleasePlannotify-slack: webhook Secret not deployed yetRelated
Summary by CodeRabbit