feat(ci): add ho-release-gate pipeline for nightly promotion#8602

Draft
Nirshal wants to merge 10 commits into openshift:main from Nirshal:ho-release-gate-pipeline

@Nirshal Nirshal commented May 27, 2026

Summary

  • Adds the Tekton pipeline definition for the HyperShift Operator release gating pipeline
  • Part of CNTRLPLANE-3434 / OCPSTRAT-3250
  • Pipeline validates HO images via e2e tests before promoting them to a verified Quay repo
  • ARO HCP is the pilot platform

Pipeline flow

CronJob (nightly 3:15 UTC) → Resolve latest Snapshot → PipelineRun:
  extract-image → run-e2e → Pass? → create-release (Konflux Release → tenant pipeline)
                             Fail? → notify-slack

Current state (POC)

| Item | POC | Final |
|---|---|---|
| Release mechanism | Tenant pipeline (push-snapshot-to-quay) | Managed pipeline via ReleasePlanAdmission |
| E2e tests | Simulated pass (placeholder) | Prow gangway API |
| Target repo | POC Quay repo | Official verified repo |
| Pipeline source | Fork branch | Upstream main |

Status

Draft — POC placeholders in place:

  • run-e2e: simulated pass (gangway integration TBD)
  • create-release: functional, creates Release referencing tenant ReleasePlan
  • notify-slack: webhook Secret not deployed yet

Related

Summary by CodeRabbit

  • New Features
    • Added an automated release-gating pipeline that validates provided image and snapshot up front, runs end-to-end checks, and creates a nightly Release record when tests pass.
  • Bug Fixes / Behavior Changes
    • Removed Slack notification behavior from this workflow; no notifications are sent on test failure.

@openshift-merge-bot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode


coderabbitai Bot commented May 27, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Defines the ho-release-gate Tekton Pipeline with two params: snapshot-name and ho-image. extract-image validates ho-image and emits ho-image and snapshot-name as task results. run-e2e consumes the emitted ho-image, writes a placeholder result of "passed" and a fixed prow-job-url. In finally, create-release runs only when run-e2e's result is "passed" and uses oc create to make a Konflux Release for the provided snapshot.

🚥 Pre-merge checks | ✅ 11
✅ Passed checks (11 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: adding a Tekton pipeline for HyperShift Operator release gating in a nightly promotion workflow. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR modifies only .tekton/pipelines/ho-release-gate.yaml (Tekton Pipeline YAML), not Go test files. The check for stable Ginkgo test names does not apply to infrastructure configuration files. |
| Test Structure And Quality | ✅ Passed | This PR only adds a Tekton pipeline YAML file (.tekton/pipelines/ho-release-gate.yaml); no Ginkgo tests or Go test code are included, making the test code quality check not applicable. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR adds a Tekton Pipeline definition, not deployment manifests or operator code. Contains no scheduling constraints, so topology-aware check is not applicable. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | PR adds only a Tekton pipeline YAML definition (.tekton/pipelines/ho-release-gate.yaml), not Ginkgo e2e tests. Custom check applies only to new Ginkgo tests, so it is not applicable here. |
| No-Weak-Crypto | ✅ Passed | The PR adds only a Tekton Pipeline YAML file with no weak cryptographic algorithms, custom crypto implementations, or insecure secret comparisons in active code. |
| Container-Privileges | ✅ Passed | The ho-release-gate.yaml pipeline contains no privileged container configurations: no privileged: true, hostPID/Network/IPC, SYS_ADMIN, allowPrivilegeEscalation, or root user specs. |
| No-Sensitive-Data-In-Logs | ✅ Passed | No sensitive data (passwords, tokens, API keys, PII, session IDs, or hostnames) is logged in active code paths; all echo statements output non-sensitive identifiers and status messages. |


openshift-ci Bot added the do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress) and do-not-merge/needs-area labels May 27, 2026

openshift-ci Bot commented May 27, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all


openshift-ci Bot commented May 27, 2026

Please specify an area label



@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
.tekton/pipelines/ho-release-gate.yaml (2)

91-102: ⚡ Quick win

Consider adding a timeout to the polling loop.

The commented implementation polls indefinitely until success/failure. If the Prow job gets stuck or the API becomes unreachable, this could cause the pipeline to hang forever.

When implementing the actual gangway integration, add a maximum retry count or deadline:

♻️ Suggested pattern for timeout
+          MAX_ATTEMPTS=180  # 3 hours at 60s intervals
+          ATTEMPT=0
           # Poll for completion:
           while true; do
+            ATTEMPT=$((ATTEMPT + 1))
+            if [[ ${ATTEMPT} -gt ${MAX_ATTEMPTS} ]]; then
+              echo "ERROR: Timeout waiting for Prow job completion"
+              echo -n "failed" > $(results.result.path)
+              break
+            fi
             STATUS=$(curl -s "${GANGWAY_URL}/v1/executions/${JOB_URL}" \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.tekton/pipelines/ho-release-gate.yaml around lines 91 - 102, The commented
polling loop for checking Gangway job status (using GANGWAY_URL, JOB_URL,
GANGWAY_TOKEN and writing to results.result.path) can hang indefinitely; update
the loop to enforce a timeout by adding either a max retry counter or a deadline
variable (e.g., MAX_RETRIES or GANGWAY_POLL_DEADLINE_SECONDS) and break with a
failure result when exceeded; ensure the loop increments the counter or checks
the deadline each iteration, logs a clear timeout error, and writes "failed" to
results.result.path if the timeout is reached.
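
A deadline-based variant of the timeout described above might look like the following minimal sketch. It uses bash's built-in SECONDS counter instead of an attempt counter. The function names poll_with_deadline and check_status are hypothetical; check_status stands in for the real Gangway status request, and the caller would redirect the printed outcome to the Tekton result file.

```shell
#!/usr/bin/env bash
# Deadline-based polling sketch (illustrative, not the pipeline's actual code).

# Hypothetical stub: the real step would query the Gangway API and print
# the job status. Here it always reports SUCCESS so the sketch is runnable.
check_status() {
  echo "SUCCESS"
}

# Usage: poll_with_deadline <deadline_seconds> <interval_seconds>
# Prints "passed" or "failed" on stdout; a timeout counts as "failed".
poll_with_deadline() {
  local deadline_seconds=$1 interval_seconds=$2
  local start=$SECONDS status
  while true; do
    status=$(check_status) || status="UNKNOWN"
    case "$status" in
      SUCCESS) echo "passed"; return 0 ;;
      FAILURE) echo "failed"; return 0 ;;
    esac
    # Give up once the deadline has elapsed, so the loop cannot hang forever.
    if (( SECONDS - start >= deadline_seconds )); then
      echo "ERROR: timed out after ${deadline_seconds}s waiting for the job" >&2
      echo "failed"
      return 0
    fi
    sleep "$interval_seconds"
  done
}

poll_with_deadline 10 1
```

A wall-clock deadline stays accurate when individual requests are slow, whereas an attempt counter only bounds the number of iterations.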

29-29: 💤 Low value

Consider pinning container image versions for reproducibility.

Multiple tasks use :latest tags (lines 29, 62, 124, 164). For a release gate pipeline, unexpected image updates could cause inconsistent behavior or breakages. Pin to specific digests or version tags before removing the draft status.

♻️ Example with pinned versions
-        image: registry.redhat.io/openshift4/ose-cli:latest
+        image: registry.redhat.io/openshift4/ose-cli:v4.15

Or use digest for stronger guarantees:

image: registry.redhat.io/openshift4/ose-cli@sha256:<digest>
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.tekton/pipelines/ho-release-gate.yaml at line 29, Replace occurrences of
the image field using the :latest tag (e.g.,
"registry.redhat.io/openshift4/ose-cli:latest") with explicit, pinned version
tags or immutable digests (e.g., "`@sha256`:...") to ensure reproducible builds;
update every task that references the same image (the other occurrences of the
same "image: registry.redhat.io/openshift4/ose-cli:latest" in this pipeline) to
the chosen tag/digest and verify compatibility before merging.
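
Before pinning, it can help to list every reference that still uses a mutable tag. The sketch below is one possible way to do that with grep; the function name find_latest_tags is hypothetical, and the sample file content is made up for illustration.

```shell
#!/usr/bin/env bash
# Print "line:content" for every image reference that uses the mutable
# :latest tag. Digest-pinned references (name@sha256:...) are not matched.
find_latest_tags() {
  grep -nE 'image:[[:space:]]*[^[:space:]@]+:latest[[:space:]]*$' "$1"
}

# Demo on a throwaway file with illustrative content.
sample=$(mktemp)
cat > "$sample" <<'EOF'
steps:
- name: create
  image: registry.redhat.io/openshift4/ose-cli:latest
- name: pinned
  image: registry.redhat.io/openshift4/ose-cli:v4.15
EOF
find_latest_tags "$sample"
rm -f "$sample"
```

Run against the real file it would be invoked as find_latest_tags .tekton/pipelines/ho-release-gate.yaml; grep exits non-zero when nothing matches, which makes the function usable as a simple lint check.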
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.tekton/pipelines/ho-release-gate.yaml:
- Around line 112-153: The pipeline currently only handles explicit
$(tasks.run-e2e.results.result) values "passed" or "failed" so errors/skips
produce no notification; add a catch-all task (e.g., notify-error) or extend
notify-slack to inspect $(tasks.run-e2e.status) so non-Succeeded statuses
trigger a notification. Specifically, add a finally task (name: notify-error)
using when: input: $(tasks.run-e2e.status) operator: notin values: ["Succeeded"]
(and/or guard with $(tasks.run-e2e.results.result) notin ["passed","failed"])
that sends an alert, or update the existing notify-slack when clause to include
$(tasks.run-e2e.status) operator: in values: ["Failed"] so task
errors/timeouts/omissions are reported.
- Around line 165-171: The Slack JSON payload is built by interpolating params
directly in the shell script (the script block that posts to
"${SLACK_WEBHOOK_URL}"), which risks JSON injection if params like
$(params.ho-image), $(params.snapshot-name) or $(params.prow-job-url) contain
quotes or newlines; fix by constructing the JSON safely with a JSON tool (e.g.,
use jq -n --arg snapshot "$(params.snapshot-name)" --arg image
"$(params.ho-image)" --arg prow "$(params.prow-job-url)" '{text: "HyperShift
nightly promotion FAILED\nSnapshot: \($snapshot)\nImage: \($image)\nProw job:
\($prow)\nPipeline: $(context.pipelineRun.name)"}' ) so each param is properly
escaped and then pipe that output to curl instead of embedding parameters
directly in the here-doc.
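
The comment above recommends jq. For step images that do not ship jq, a pure-bash escaper is a possible substitute, sketched here under that assumption. The function name json_escape and the example image reference are hypothetical, and the function only covers backslash, double quote, newline, carriage return, and tab, not every control character.

```shell
#!/usr/bin/env bash
# Escape a string for use inside a JSON string literal (partial coverage:
# backslash, double quote, newline, carriage return, tab).
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\r'/\\r}    # carriage return
  s=${s//$'\t'/\\t}    # tab
  printf '%s' "$s"
}

# Illustrative message; the image reference is a made-up value.
text=$'HyperShift nightly promotion FAILED\nImage: quay.io/example/ho:"test"'
printf '{"text":"%s"}\n' "$(json_escape "$text")"
```

The resulting payload can then be passed to curl with --data, so quotes or newlines in a parameter value can no longer break out of the JSON string.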

📥 Commits

Reviewing files that changed from the base of the PR and between 09c7701 and 56fc5a7.

📒 Files selected for processing (1)
  • .tekton/pipelines/ho-release-gate.yaml

Comment thread .tekton/pipelines/ho-release-gate.yaml Outdated
Comment on lines +112 to +153
  finally:
  - name: create-release
    when:
    - input: $(tasks.run-e2e.results.result)
      operator: in
      values: ["passed"]
    taskSpec:
      params:
      - name: snapshot-name
        type: string
      steps:
      - name: create
        image: registry.redhat.io/openshift4/ose-cli:latest
        script: |
          #!/bin/bash
          set -euo pipefail

          SNAPSHOT="$(params.snapshot-name)"
          echo "Tests passed. Creating Release for snapshot: ${SNAPSHOT}"

          oc create -f - <<EOF
          apiVersion: appstudio.redhat.com/v1alpha1
          kind: Release
          metadata:
            generateName: hypershift-operator-ho-release-gate-aro-hcp-
            namespace: crt-redhat-acm-tenant
          spec:
            snapshot: ${SNAPSHOT}
            releasePlan: hypershift-operator-ho-release-gate-aro-hcp
            gracePeriodDays: 7
          EOF

          echo "Release created successfully"
    params:
    - name: snapshot-name
      value: $(tasks.extract-image.results.snapshot-name)

  - name: notify-slack
    when:
    - input: $(tasks.run-e2e.results.result)
      operator: in
      values: ["failed"]

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

No notification when run-e2e task errors or is skipped.

The when expressions only match explicit "passed" or "failed" results. If run-e2e errors before writing to $(results.result.path) (e.g., script crash, timeout, OOM), the result will be empty and neither create-release nor notify-slack will execute. The pipeline will complete silently with no indication of failure.

Consider adding a catch-all notification task or using $(tasks.run-e2e.status) to detect task failures:

🛡️ Suggested approach: Add error notification task
  - name: notify-error
    when:
    - input: $(tasks.run-e2e.status)
      operator: notin
      values: ["Succeeded"]
    - input: $(tasks.run-e2e.results.result)
      operator: notin
      values: ["passed", "failed"]
    taskSpec:
      steps:
      - name: notify
        image: registry.redhat.io/openshift4/ose-cli:latest
        script: |
          #!/bin/bash
          # Send notification about task error/skip
          echo "run-e2e task did not complete normally"
          # Add Slack notification here

Alternatively, modify notify-slack to also trigger on task failure status:

    when:
    - input: $(tasks.run-e2e.status)
      operator: in
      values: ["Failed"]
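
The coverage this comment asks for can be stated as a small decision function: every combination of task status and result should map to exactly one finally action. The sketch below expresses that in plain shell for illustration only; it is not how Tekton evaluates when expressions, and the function name finally_action is hypothetical.

```shell
#!/usr/bin/env bash
# Map run-e2e's task status and result to the finally task that should run.
#   status: task status as Tekton reports it (Succeeded, Failed, None)
#   result: value written by run-e2e, possibly empty
finally_action() {
  local status=$1 result=${2:-}
  if [[ "$status" == "Succeeded" && "$result" == "passed" ]]; then
    echo "create-release"
  elif [[ "$status" == "Succeeded" && "$result" == "failed" ]]; then
    echo "notify-slack"
  else
    # Crash, timeout, skip, or an unexpected result value.
    echo "notify-error"
  fi
}

finally_action Succeeded passed
finally_action Failed ""
```

The final branch is the catch-all: anything that is not a clean pass or a clean test failure still produces a notification.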

Comment thread .tekton/pipelines/ho-release-gate.yaml Outdated

codecov Bot commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.43%. Comparing base (2f52041) to head (fd29322).
⚠️ Report is 146 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8602      +/-   ##
==========================================
+ Coverage   40.61%   41.43%   +0.81%     
==========================================
  Files         755      756       +1     
  Lines       93227    93647     +420     
==========================================
+ Hits        37864    38802     +938     
+ Misses      52640    52124     -516     
+ Partials     2723     2721       -2     

see 52 files with indirect coverage changes

| Flag | Coverage Δ |
|---|---|
| cmd-support | 34.87% <ø> (+0.17%) ⬆️ |
| cpo-hostedcontrolplane | 43.50% <ø> (+1.72%) ⬆️ |
| cpo-other | 42.74% <ø> (+1.67%) ⬆️ |
| hypershift-operator | 51.57% <ø> (+0.81%) ⬆️ |
| other | 31.64% <ø> (+0.05%) ⬆️ |



openshift-ci Bot commented May 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Nirshal
Once this PR has been reviewed and has the lgtm label, please assign cblecker for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
.tekton/pipelines/ho-release-gate.yaml (1)

230-234: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden Slack webhook call with fail-fast and timeout controls.

The webhook POST can currently fail silently (non-2xx) or hang without bounds. Add curl failure/timeout/retry flags so pipeline outcome reflects notification delivery failures.

Proposed patch
       - name: send-notification
         image: curlimages/curl:latest
         script: |
           #!/bin/sh
-          curl -X POST -H 'Content-type: application/json' \
+          curl --fail --show-error --silent \
+            --connect-timeout 10 --max-time 30 \
+            --retry 3 --retry-delay 2 \
+            -X POST -H 'Content-type: application/json' \
             --data "{
               \"text\": \"HyperShift nightly promotion FAILED\nSnapshot: $(params.snapshot-name)\nImage: $(params.ho-image)\nProw job: $(params.prow-job-url)\nPipeline: $(context.pipelineRun.name)\"
             }" \
             "${SLACK_WEBHOOK_URL}"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.tekton/pipelines/ho-release-gate.yaml around lines 230 - 234, Update the
curl invocation that posts to "${SLACK_WEBHOOK_URL}" (the Slack webhook step) to
use robust failure, timeout, and retry flags so non-2xx responses and hangs
produce a non-zero exit: add --fail --show-error --connect-timeout 5 --max-time
10 --retry 3 --retry-delay 2 --retry-connrefused to the existing curl command
that posts the JSON payload (the block using params.snapshot-name,
params.ho-image, params.prow-job-url and context.pipelineRun.name) so the
pipeline reflects notification delivery failures.

📥 Commits

Reviewing files that changed from the base of the PR and between 56fc5a7 and d079b12.

📒 Files selected for processing (1)
  • .tekton/pipelines/ho-release-gate.yaml

Nirshal force-pushed the ho-release-gate-pipeline branch 2 times, most recently from e79809d to 58a3243 on May 28, 2026 15:45
Nirshal changed the title from "WIP: Add ho-release-gate pipeline for nightly promotion" to "feat(ci): add ho-release-gate pipeline for nightly promotion" on May 28, 2026

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

Adds a Tekton pipeline definition for the HyperShift Operator release
gating pipeline (CNTRLPLANE-3434). The pipeline validates HO images
via e2e tests before promoting them to a verified Quay repository.

Pipeline flow: extract-image → run-e2e → push-image (skopeo) / notify-slack
ARO HCP is the pilot platform.

Current POC state:
- run-e2e: simulated pass (gangway integration TBD)
- push-image: direct skopeo copy to POC Quay repo
- create-release (Konflux Release object) kept commented for final version

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Nirshal force-pushed the ho-release-gate-pipeline branch from 58a3243 to bfb44b0 on May 28, 2026 16:31

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
.tekton/pipelines/ho-release-gate.yaml (1)

112-118: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a non-success notification path in finally.

Only the pass path is handled today. If run-e2e returns failed or errors before publishing results, there is no explicit failure notification task in this pipeline. Add a failure/error finalizer path keyed off task status/results.

💡 Minimal fix pattern
   finally:
   - name: create-release
     when:
     - input: $(tasks.run-e2e.results.result)
       operator: in
       values: ["passed"]
@@
     - name: snapshot-name
       value: $(tasks.extract-image.results.snapshot-name)
+
+  - name: notify-failure
+    when:
+    - input: $(tasks.run-e2e.status)
+      operator: notin
+      values: ["Succeeded"]
+    taskSpec:
+      steps:
+      - name: notify
+        image: registry.redhat.io/openshift4/ose-cli:latest
+        script: |
+          #!/bin/bash
+          set -euo pipefail
+          echo "run-e2e did not complete successfully; add Slack notification here"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.tekton/pipelines/ho-release-gate.yaml around lines 112 - 118, The
pipeline's finally currently only handles the success path for the
create-release finalizer (referencing the create-release entry and the run-e2e
task via $(tasks.run-e2e.results.result)); add a complementary finalizer (e.g.,
name: notify-failure) that triggers when run-e2e did not pass by using a when
clause such as input: $(tasks.run-e2e.results.result) operator: notin values:
["passed"] and also add a guard on $(tasks.run-e2e.status) to catch missing
results/errors (e.g., operator: in values: ["Failed","Error"] or similar), and
implement the notification taskSpec for failure/error handling; ensure both
create-release and notify-failure entries live under finally so failures are
explicitly handled.

📥 Commits

Reviewing files that changed from the base of the PR and between 58a3243 and bfb44b0.

📒 Files selected for processing (1)
  • .tekton/pipelines/ho-release-gate.yaml

Comment on lines +36 to +39
          if [[ -z "$(params.ho-image)" ]]; then
            echo "ERROR: ho-image parameter is empty"
            exit 1
          fi

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate snapshot-name early in extract-image.

Line 36 validates ho-image, but snapshot-name is passed through unchecked and only consumed much later at release creation. Add a non-empty check here so bad input fails fast before e2e execution.

💡 Minimal fix
           echo "Snapshot: $(params.snapshot-name)"
           echo "HO image: $(params.ho-image)"
 
+          if [[ -z "$(params.snapshot-name)" ]]; then
+            echo "ERROR: snapshot-name parameter is empty"
+            exit 1
+          fi
+
           if [[ -z "$(params.ho-image)" ]]; then
             echo "ERROR: ho-image parameter is empty"
             exit 1
           fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.tekton/pipelines/ho-release-gate.yaml around lines 36 - 39, The pipeline
currently validates params.ho-image but does not check params.snapshot-name, so
add an early non-empty validation for snapshot-name in the same extract-image
validation block: detect if "$(params.snapshot-name)" is empty, emit a clear
error like "ERROR: snapshot-name parameter is empty" and exit 1 to fail fast;
locate the validation near the existing ho-image check in the extract-image
task/script and mirror the same pattern to ensure bad input fails before e2e
runs.
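
If more parameters are added later, the repeated if blocks could be folded into one helper. The following is a minimal sketch of that idea; the function names require_nonempty and validate_params and the sample values are hypothetical. Unlike the fail-fast checks in the suggestion, it reports every empty parameter before failing.

```shell
#!/usr/bin/env bash
# Fail with a clear message when a named parameter is empty.
require_nonempty() {
  local name=$1 value=${2:-}
  if [[ -z "$value" ]]; then
    echo "ERROR: ${name} parameter is empty" >&2
    return 1
  fi
}

# Check any number of name/value pairs and report all empty ones.
validate_params() {
  local failed=0
  while (( $# >= 2 )); do
    require_nonempty "$1" "$2" || failed=1
    shift 2
  done
  return "$failed"
}

# Illustrative values only.
validate_params snapshot-name "snap-123" ho-image "quay.io/example/ho:abc" \
  && echo "parameters OK"
```

In the Tekton script the values would come from the substituted $(params.snapshot-name) and $(params.ho-image) strings, and a non-zero return would be followed by exit 1.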

Nirshal added 9 commits June 3, 2026 17:24
Replace POC placeholder with real gangway API calls:
- Trigger periodic Prow job via POST to gangway API
- Poll for job completion (SUCCESS/FAILURE/ABORTED/ERROR)
- Return pass/fail result for release gating decision

HO image override is not yet implemented (pending ci-operator
dependency override investigation). Currently triggers the job
without image override to validate the trigger/poll flow.

- Increase poll interval from 2min to 10min (jobs take 1-2h)
- Filter tests to TestCreateCluster via MULTISTAGE_PARAM_OVERRIDE
- Log HTTP status codes on trigger and poll requests
- Distinguish test failure from polling errors (passed/failed/error)
- Detect auth token expiration during polling

Use ci-operator's OVERRIDE_IMAGE_* convention to inject the Konflux-built
HO image into the Prow job, replacing the default pipeline-built image.

Add OVERRIDE_IMAGE_HYPERSHIFT_TESTS to inject the test binary image,
currently pointing to registry.ci.openshift.org/hypershift/hypershift-tests:latest
as placeholder until commit-specific resolution is implemented.

Without explicit timeout, the task inherits the cluster default (1-2h)
which is insufficient for Prow e2e jobs that take 1.5-2.5h plus
gangway polling overhead.

Use periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-aks
instead of e2e-aws-ovn — AKS tests are directly relevant to the
ARO HCP pilot platform.

- extract-image: log commit SHA from Snapshot for traceability
- run-e2e: structured logging with timestamps, poll count, HTTP codes,
  override details, and completion times on success/failure
- create-release: log Release name, Snapshot, and ReleasePlan

- Validate trigger response is valid JSON before parsing
- Validate poll response is valid JSON before parsing
- Log raw response on HTTP errors, JSON parse failures
- Log full JSON response on SUCCESS/FAILURE for traceability
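
The three-way outcome mentioned in these commit messages (passed/failed/error) could be derived from each poll response roughly as sketched below. The exact mapping is an assumption: the commit messages name the states SUCCESS, FAILURE, ABORTED, and ERROR but do not say how ABORTED and ERROR are classified, and the function name classify_poll is hypothetical.

```shell
#!/usr/bin/env bash
# Map an HTTP status code plus a job status string to a gating outcome.
# Prints one of: passed | failed | error | pending
classify_poll() {
  local http_code=$1 job_status=${2:-}
  # Any non-200 response is treated as a polling error here (assumption);
  # 401/403 would typically indicate an expired or rejected token.
  if [[ "$http_code" != "200" ]]; then
    echo "error"
    return
  fi
  case "$job_status" in
    SUCCESS)       echo "passed" ;;
    FAILURE)       echo "failed" ;;
    ABORTED|ERROR) echo "error" ;;    # assumed: not a genuine test failure
    *)             echo "pending" ;;  # still running, keep polling
  esac
}

classify_poll 200 SUCCESS
classify_poll 401 ""
```

Keeping "error" separate from "failed" lets the gate distinguish a broken test run from a broken pipeline, which matters when deciding whether to retry or to alert.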
@hypershift-jira-solve-ci

The gitlint rule B6 (Body message is missing) is failing because commit cea8fcda93 has only a title line (fix(ci): use oc jsonpath instead of jq in extract-image task) with no body. The .gitlint config enforces conventional commits but doesn't explicitly disable the body requirement rule, so gitlint's default B6 rule flags the missing body.

Test Failure Analysis Complete

Job Information

Test Failure Analysis

Error

Commit cea8fcda93:
3: B6 Body message is missing

make: *** [Makefile:614: run-gitlint] Error 1

Summary

The gitlint check failed on commit cea8fcda93 (fix(ci): use oc jsonpath instead of jq in extract-image task) because the commit message contains only a title line with no body text. Gitlint's built-in rule B6 ("Body message is missing") requires every commit to include a body paragraph after the title. Since the .gitlint configuration does not disable this rule, the check enforces it and exits with an error.

Root Cause

The commit cea8fcda933b has a single-line commit message:

fix(ci): use oc jsonpath instead of jq in extract-image task

There is no blank line followed by a body paragraph. Gitlint's B6 rule (enabled by default) mandates that every commit message include a body section. The project's .gitlint configuration at the repo root enables conventional-commit title validation and sets max line lengths, but it does not add ignore=B6 under [general] or disable the body requirement in any way. Therefore, any commit without a body will fail this CI check.

The run-gitlint Makefile target lints all commits in the PR range (PULL_BASE_SHA..PULL_PULL_SHA), and since this specific commit (one of 10 in the PR) violates B6, the entire job fails.

Recommendations
  1. Immediate fix — Add a body to commit cea8fcda93: Amend or interactively rebase to add a body paragraph to the commit message. For example:

    fix(ci): use oc jsonpath instead of jq in extract-image task
    
    Replace jq-based JSON parsing with oc's built-in jsonpath support
    in the extract-image Tekton task to remove the jq dependency.
    
  2. Alternative — Squash commits: If the PR will be squash-merged, consider squashing all 10 commits into a single well-formed conventional commit with a proper body. This would resolve B6 for all commits at once.

  3. Long-term — Consider disabling B6 for the project: If the team considers single-line commit messages acceptable for small changes, add ignore=B6 to the [general] section of .gitlint. This is a team-level policy decision.
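
To catch this locally before pushing, a rough approximation of the B6 check can be scripted. The sketch below is not gitlint's implementation; it only tests whether any non-blank line follows the title, and the function name has_body is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed when a commit message has at least one non-blank line after the title.
has_body() {
  local message=$1
  # Drop the title (line 1), then look for any line with a non-space character.
  printf '%s\n' "$message" | tail -n +2 | grep -q '[^[:space:]]'
}

if has_body "fix(ci): use oc jsonpath instead of jq in extract-image task"; then
  echo "body present"
else
  echo "body missing"
fi
```

For a real commit, the message can be supplied with has_body "$(git log -1 --format=%B <sha>)", where <sha> is the commit to inspect.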

Evidence

| Evidence | Detail |
|---|---|
| Failing commit | cea8fcda933b9e92fa66063c065f0cc533538819 |
| Commit message | fix(ci): use oc jsonpath instead of jq in extract-image task (title only, no body) |
| Gitlint rule violated | B6: Body message is missing (line 3 of commit message) |
| Gitlint config | .gitlint at repo root: enables contrib-title-conventional-commits, does not disable B6 |
| Makefile target | run-gitlint: runs gitlint --commits $(PULL_BASE_SHA)..$(PULL_PULL_SHA) |
| Commit range linted | adfbcddc2e7d..fd29322e9b35 (10 commits in PR #8602) |
| Exit code | 2 (gitlint found rule violations) |
| CI workflow | .github/workflows/gitlint-reusable.yaml at refs/heads/main |
