OCPSTRAT-3250: Konflux release gating pipeline for HyperShift Operator by bryan-cox · Pull Request #2016 · openshift/enhancements

bryan-cox · 2026-05-19T17:26:32Z

Summary

Adds a new enhancement proposal for a nightly Konflux-based release gating pipeline that validates HyperShift Operator images against e2e test suites before promoting them to verified repositories.
The pipeline operates alongside the existing auto-release to ACMD, adding a parallel, per-platform promotion path (ARO HCP pilot, ROSA HCP and GCP HCP future).
Includes full Konflux resource definitions (CronJob, ReleasePlan, IntegrationTestScenario, RBAC, Release), e2e test pipeline structure, error handling, strategy alignment, and related Jira issue tracking.

OCPSTRAT-3250 / CNTRLPLANE-3434

Test plan

markdownlint passes cleanly
All required enhancement template headings present
YAML frontmatter validates

🤖 Generated with Claude Code

openshift-ci-robot · 2026-05-19T17:30:54Z

@bryan-cox: This pull request references OCPSTRAT-3250 which is a valid jira issue.


bryan-cox · 2026-05-19T17:33:13Z

+
+## Proposal
+
+A parallel, gated promotion path is added alongside the existing Konflux auto-release. A nightly pipeline resolves the latest HO Snapshot, runs e2e tests against the corresponding image, and promotes it to a verified repository only if tests pass. Each managed service platform receives its own independent promotion path with its own test suite and verified repository.


Bringing over @deads2k comment from Slack:

Add a parallel, gated promotion path alongside the existing auto-release — a nightly pipeline tests the latest HO image and only promotes it to a verified repository if e2e tests pass.

"I like this, the final solution should definitely include it"

bryan-cox · 2026-05-19T17:35:24Z

+
+## Proposal
+
+A parallel, gated promotion path is added alongside the existing Konflux auto-release. A nightly pipeline resolves the latest HO Snapshot, runs e2e tests against the corresponding image, and promotes it to a verified repository only if tests pass. Each managed service platform receives its own independent promotion path with its own test suite and verified repository.


Bringing over @deads2k comment from Slack:

Each managed service platform (ARO HCP, ROSA HCP, GCP HCP) gets its own independent promotion path, so a failure on one platform does not block others.

"I'm ok with this, but I don't see that as a hard requirement. If y'all want to take the perspective that you want them unified, I'm ok with that too."

Leaving as-is for now — we prefer independent paths but acknowledging it's not a hard requirement.

bryan-cox · 2026-05-19T17:44:43Z

+| Phase 2 | Full CPO version matrix (every supported 4.y.z and 4.y.0) |
+| Phase 3 | Platform-specific e2e (ARO HCP Azure ARM, platform QE co-authored tests) |
+
+Subsequent phases add broader version coverage and platform-specific tests co-authored with platform QE teams.


Bringing over @deads2k comment from Slack:

Subsequent phases add broader version coverage and platform-specific tests co-authored with platform QE teams.

"I'm not certain this release controller is actually coupled to specific platforms. I see this release controller as encapsulating and automating hypershift's promise to platforms of phase 1 and phase 2 as you've laid them out. Keeping it at that level, plus informing per-platform would leave accountability and responsibility for failing promotion extremely clear."

Updated the phase table to add ownership. Phases 1 and 2 are marked as required for completion, owned by HCP team. Phase 3 is reframed as informing jobs owned by platform teams — the release controller's responsibility ends with demonstrating it's possible to create such a job.

bryan-cox · 2026-05-19T17:47:20Z

+| ----- | -------- |
+| Phase 1 (MVP) | Cluster lifecycle, NodePool scaling, one upgrade path |
+| Phase 2 | Full CPO version matrix (every supported 4.y.z and 4.y.0) |
+| Phase 3 | Platform-specific e2e (ARO HCP Azure ARM, platform QE co-authored tests) |


Bringing over @deads2k comment from Slack:

David: Let's play out your phase3-platform specific jobs. Who would watch and how would we decide about responsibility?
David: well maybe back up to phase 2. Do you agree that with phase 1 and phase 2 only, it's very clear that HCP owns "we haven't promoted a release, we must fix"?
David: and that when we introduce phase 3, that becomes muddier, "it hasn't passed phase 3, but it's ARO-HCP's fault" (similar to our frequent failures with ROSA release-blocking jobs)?
Bryan: re:phase 1 & 2 - yeah that seems reasonable to me.
Bryan: phase 3 - Agree it's not as clear. I think it would be a joint or shared responsibility between the teams to figure out why the tests are failing and how to resolve that.
David: can we make that explicit for phase 1 and phase 2, indicate that they are critical for considering this complete. and add the concept of informing jobs that would include phase 3, with the responsibility lying with platform teams for creating and watching their signal. The release controller responsibility ends with demonstrating it is possible to create such a job.

Addressed in the same update as above. Phase 1-2 ownership is explicit. Phase 3 notes that HCP team can help debug and fix failing tests in coordination with platform teams.
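
The split agreed in this thread (Phase 1 and 2 jobs gate promotion, Phase 3 jobs only inform) can be sketched as a small decision function. This is an illustrative sketch, not the pipeline's actual logic: the `JobResult` type, the job names, and the rule that at least one blocking job must have run are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobResult:
    name: str
    passed: bool
    blocking: bool  # True for Phase 1-2 jobs, False for informing (Phase 3) jobs


def should_promote(results: list[JobResult]) -> bool:
    """Promote only if at least one blocking job ran and every blocking job passed."""
    blocking = [r for r in results if r.blocking]
    return bool(blocking) and all(r.passed for r in blocking)


def informing_failures(results: list[JobResult]) -> list[str]:
    """Names of failed informing jobs; these are reported but never gate promotion."""
    return [r.name for r in results if not r.blocking and not r.passed]


# Hypothetical job names, for illustration only.
results = [
    JobResult("cluster-lifecycle", passed=True, blocking=True),
    JobResult("cpo-version-matrix", passed=True, blocking=True),
    JobResult("aro-hcp-platform-e2e", passed=False, blocking=False),
]
print(should_promote(results))      # True
print(informing_failures(results))  # ['aro-hcp-platform-e2e']
```

Keeping informing failures in a separate list mirrors the ownership point made above: a failed platform job is surfaced to the platform team without turning into an HCP-owned promotion blocker.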

bryan-cox · 2026-05-19T18:08:46Z

While we are implementing this effort for ARO HCP first, we are expecting to onboard ROSA HCP and GCP in the future. I wanted to make sure y'all were aware of this enhancement; please feel free to unsubscribe if you wish - @deads2k @joshbranham @cblecker

enxebre · 2026-05-25T11:05:02Z

+
+## Summary
+
+This enhancement introduces a nightly, platform-independent gating system that validates HyperShift Operator (HO) images against end-to-end test suites before promoting them to verified repositories. The pipeline operates alongside the existing Konflux auto-release mechanism, adding a parallel promotion path that only advances images which have passed real-world e2e validation.


can we define "real-world e2e validation"? is this specific consumer owned e2e test suites / gates?

Updated — clarified that "e2e validation" means test suites agreed upon between the HyperShift and managed service (HCM) teams. Tests may vary by platform.

AI-assisted response via Claude Code

enxebre · 2026-05-25T11:06:34Z

+3. Keep the existing auto-release to ACMD completely unchanged; the new pipeline is purely additive.
+4. Enable independent promotion paths per platform so that one platform's failure does not block others.
+5. Make the pipeline extensible to new platforms with only new Konflux resource definitions and no pipeline code changes.
+


is there any goal for per platform speed / granularity to ship? Why was 24h chosen?

Is there any goal / non goal for alerting and/or troubleshooting failed pipelines?

24h (nightly) was chosen to balance validation confidence with cloud infrastructure cost — each run provisions real clusters with cloud credentials. Per-commit gating is addressed in the Alternatives section: it's cost-prohibitive and would slow the development feedback loop. Per-platform cadence can differ if a platform team wants more frequent runs — each platform can have its own CronJob schedule (noted in the Platform Extensibility section).

AI-assisted response via Claude Code

Alerting and troubleshooting are covered in a few places: the Error Handling table (every failure type triggers a Slack alert), the Stale Promotion Alert section (alerts if no successful promotion in N days, default 3), and the Support Procedures section (detection commands + remediation steps including manual re-trigger). These are tracked in CNTRLPLANE-3451 (stale alerting) and CNTRLPLANE-3450 (manual re-trigger). Let me know if you'd like more detail or if something specific is missing.

AI-assisted response via Claude Code

can we capture these responses in the proposal?

Done. Added a "Design Rationale" subsection after Non-Goals covering the nightly cadence choice and alerting/troubleshooting coverage, as discussed in earlier thread comments.

AI-assisted response via Claude Code
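
The stale promotion alert mentioned in this thread (no successful promotion in N days, default 3) reduces to a timestamp comparison. A minimal sketch, assuming the last successful promotion time is already known; the function name and the strict "older than" comparison are assumptions, not taken from the proposal.

```python
from datetime import datetime, timedelta, timezone


def is_promotion_stale(last_success: datetime, now: datetime, max_age_days: int = 3) -> bool:
    """Return True when the last successful promotion is older than max_age_days."""
    return now - last_success > timedelta(days=max_age_days)


now = datetime(2026, 5, 28, tzinfo=timezone.utc)
print(is_promotion_stale(datetime(2026, 5, 26, tzinfo=timezone.utc), now))  # False
print(is_promotion_stale(datetime(2026, 5, 24, tzinfo=timezone.utc), now))  # True
```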

enxebre · 2026-05-25T11:09:23Z

+
+## Proposal
+
+Add a parallel, gated promotion path alongside the existing auto-release. A nightly pipeline tests the latest HO image against platform-specific e2e suites and only promotes tested images to a verified repository. Each platform's promotion is independent — a failure on one does not block others.


A nightly pipeline tests the latest HO image

can we clarify what this "latest HO image" is, e.g. who builds it and how?

Clarified — "latest HO image" is the most recent image produced by Konflux's push build pipeline, triggered on every merge to main. Updated the Proposal paragraph to make this explicit.

AI-assisted response via Claude Code

enxebre · 2026-05-25T11:11:37Z

+1. **Trigger:** A Kubernetes CronJob in the `crt-redhat-acm-tenant` namespace runs nightly.
+2. **Resolve:** The CronJob queries Konflux Snapshots labeled with the push build's PipelineRun name, selects the most recent, and extracts the HO container image reference.
+3. **Launch:** The CronJob creates a Tekton `PipelineRun` referencing the e2e test pipeline (`.tekton/pipelines/ho-release-gate.yaml`), passing the snapshot name and HO image as parameters.
+4. **Test:** The pipeline launches Prow jobs that deploy the resolved HO image and run HyperShift e2e tests against it. Konflux orchestrates the run and consumes pass/fail results and links.


can we articulate how this happens per platform?

Updated step 4 to articulate the per-platform mechanism: each platform has its own IntegrationTestScenario defining the test suite and target infrastructure (e.g., Azure for ARO HCP, AWS for ROSA HCP). Konflux orchestrates the run and consumes pass/fail results.

AI-assisted response via Claude Code

enxebre · 2026-05-25T11:16:09Z

+
+#### ReleasePlan (per-platform)
+
+A per-platform resource. The YAML below shows the ARO HCP pilot instance. Future platforms (ROSA HCP, GCP HCP) will each get their own ReleasePlan. All platforms push to the same verified repository, tagged differently per managed service. Auto-release is disabled (`auto-release: 'false'`), meaning images only reach the verified repo through explicit Release objects created after tests pass.


who creates this?

Done. Added explicit ownership — these resources are created by the HCP team in the `crt-redhat-acm-tenant` namespace.

AI-assisted response via Claude Code
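
The quoted ReleasePlan text says images reach the verified repo only through explicit Release objects created after tests pass. A rough sketch of what such an object could look like, built as a plain Python dict. The `appstudio.redhat.com/v1alpha1` API version and the `spec.snapshot` / `spec.releasePlan` fields reflect the Konflux Release resource as I understand it and should be checked against the cluster's CRDs; the snapshot and ReleasePlan names are hypothetical.

```python
def build_release(snapshot: str, release_plan: str, namespace: str) -> dict:
    """Return a Release manifest tying a tested Snapshot to a per-platform ReleasePlan."""
    return {
        "apiVersion": "appstudio.redhat.com/v1alpha1",  # assumed API version
        "kind": "Release",
        "metadata": {"generateName": f"{release_plan}-", "namespace": namespace},
        "spec": {"snapshot": snapshot, "releasePlan": release_plan},
    }


# Hypothetical resource names; only the namespace comes from the proposal.
release = build_release(
    snapshot="hypershift-operator-main-abc12",
    release_plan="ho-aro-hcp-verified",
    namespace="crt-redhat-acm-tenant",
)
print(release["spec"])
```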

enxebre · 2026-05-25T11:16:16Z

+
+#### IntegrationTestScenario (per-platform)
+
+A per-platform resource. This wires the e2e test Tekton pipeline as a gate on Snapshots. It references a pipeline definition stored in the HyperShift repository, allowing the test pipeline to evolve alongside the code it validates.


who creates this?

Done. Added explicit ownership — these resources are created by the HCP team in the `crt-redhat-acm-tenant` namespace.

AI-assisted response via Claude Code

Introduces a nightly, platform-independent gating system that validates HyperShift Operator images against e2e test suites before promoting them to verified repositories. The pipeline operates alongside the existing Konflux auto-release, adding a parallel promotion path per managed service platform (ARO HCP pilot, ROSA HCP and GCP HCP future).

OCPSTRAT-3250

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Nirshal · 2026-05-27T15:54:12Z

+  - name: application
+    description: HyperShift e2e tests for ARO HCP promotion gating
+```
+


Question about the interaction between the CronJob and this IntegrationTestScenario.

Looking at the existing ITS resources in crt-redhat-acm-tenant (e.g. hypershift-operator-main-enterprise-contract), they all use contexts: [{name: application}] and are triggered automatically by Konflux on every new Snapshot.

This ITS also uses contexts: [{name: application}] — wouldn't this cause Konflux to run the e2e test pipeline on every push build (i.e. every new Snapshot), rather than only on the nightly cadence the CronJob provides?

The CronJob already resolves the latest Snapshot and creates a PipelineRun directly via git resolver, bypassing the ITS entirely. So these two mechanisms seem to overlap.

Could you clarify how these are meant to interact? Specifically:

Is the ITS needed for Konflux to consider a Snapshot "valid" before allowing a Release to be created from it?

Or is the CronJob the sole trigger, and the ITS can be dropped?

Good catch — you're right that these overlap. The documented Konflux periodic test pattern (https://konflux-ci.dev/docs/testing/integration/periodic-integration-tests/) uses contexts: [{name: disabled}] on the ITS so it doesn't trigger on every push, and the CronJob triggers it by labeling the latest snapshot with test.appstudio.openshift.io/run=<scenario-name>. Updated both the ITS (now uses disabled context) and CronJob (now labels snapshots instead of creating PipelineRuns directly) to follow this pattern. Also updated the RBAC, workflow diagrams, and step descriptions to match.

AI-assisted response via Claude Code
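
The periodic pattern described in this reply (pick the latest Snapshot, then label it with `test.appstudio.openshift.io/run=<scenario-name>`) can be sketched with plain data structures. This is only an illustration of the selection and patch-building steps: it does not talk to a cluster, and the Snapshot names and the scenario name `ho-aro-hcp-e2e` are hypothetical.

```python
from datetime import datetime

RUN_LABEL = "test.appstudio.openshift.io/run"  # label key quoted in the thread


def parse_ts(value: str) -> datetime:
    """Parse a Kubernetes-style UTC timestamp such as 2026-05-27T01:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def latest_snapshot(snapshots: list[dict]) -> dict:
    """Pick the Snapshot with the newest metadata.creationTimestamp."""
    if not snapshots:
        raise ValueError("no Snapshots found")
    return max(snapshots, key=lambda s: parse_ts(s["metadata"]["creationTimestamp"]))


def run_label_patch(scenario_name: str) -> dict:
    """Merge-patch body that requests a run of the named scenario on a Snapshot."""
    return {"metadata": {"labels": {RUN_LABEL: scenario_name}}}


# Hypothetical Snapshots, shaped like the metadata of listed objects.
snapshots = [
    {"metadata": {"name": "ho-snap-a", "creationTimestamp": "2026-05-26T01:00:00Z"}},
    {"metadata": {"name": "ho-snap-b", "creationTimestamp": "2026-05-27T01:00:00Z"}},
]
target = latest_snapshot(snapshots)
print(target["metadata"]["name"])  # ho-snap-b
print(run_label_patch("ho-aro-hcp-e2e"))
```

In a real CronJob the list would come from the Kubernetes API and the patch would be applied to the selected Snapshot; both calls are left out here to keep the sketch self-contained.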

enxebre · 2026-05-28T12:01:19Z

+
+**Konflux build pipeline** is the existing push build pipeline that creates Snapshots for every merged commit.
+
+**e2e test pipeline** is a Tekton Pipeline defined at `.tekton/pipelines/ho-release-gate.yaml` in the HyperShift repository.


should this file be named after the consumer, e.g. ho-aro-release-gate.yaml? will we have one per platform?
can we include the yaml example for ARO?

The MVP uses a single pipeline file since ARO HCP is the only platform. The Konflux ITS spec.params field supports passing custom pipeline parameters, so a shared pipeline with per-platform params is viable if the task structure stays the same across platforms. If platforms need different task sequences or infrastructure setup, per-platform files (e.g., ho-aro-release-gate.yaml) would be the right call. Updated the doc to note both options with the decision deferred until a second platform is onboarded.

AI-assisted response via Claude Code
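
The two options left open here (one shared pipeline file versus per-platform files) can coexist behind a simple lookup with a shared default. A hedged sketch: the platform keys and the directory used for `ho-aro-release-gate.yaml` are assumptions, since the thread only gives that file name as an example.

```python
from typing import Optional

SHARED_PIPELINE = ".tekton/pipelines/ho-release-gate.yaml"  # path from the proposal


def pipeline_path(platform: str, overrides: Optional[dict] = None) -> str:
    """Return the per-platform pipeline file if one is registered, else the shared one."""
    return (overrides or {}).get(platform, SHARED_PIPELINE)


# Assumed location for a per-platform file; only the file name appears in the thread.
overrides = {"aro-hcp": ".tekton/pipelines/ho-aro-release-gate.yaml"}
print(pipeline_path("rosa-hcp", overrides))  # .tekton/pipelines/ho-release-gate.yaml
print(pipeline_path("aro-hcp", overrides))   # .tekton/pipelines/ho-aro-release-gate.yaml
```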

enxebre · 2026-05-28T12:01:31Z

+2. **Resolve:** The CronJob queries Konflux Snapshots labeled with the push build's PipelineRun name, selects the most recent, and extracts the HO container image reference.
+3. **Launch:** The CronJob creates a Tekton `PipelineRun` referencing the e2e test pipeline (`.tekton/pipelines/ho-release-gate.yaml`), passing the snapshot name and HO image as parameters.
+4. **Test:** The pipeline launches Prow jobs that deploy the resolved HO image and run e2e tests against it. Each platform defines its own `IntegrationTestScenario` that specifies the test suite and infrastructure — for example, ARO HCP tests run against Azure-provisioned clusters, while ROSA HCP tests would use AWS. Konflux orchestrates the run and consumes pass/fail results and links.
+5. **Promote:** On pass, the pipeline's `finally` block creates a Konflux Release object referencing the tested Snapshot and a platform-specific ReleasePlan. Konflux's release pipeline pushes the image to the verified repository.


what's the "verified repository"? Is there one per consumer? should this be in glossary?

Added "Verified Repository" to the Glossary — single shared quay.io repo with per-platform image tags (e.g., aro-hcp-<digest>, rosa-hcp-<digest>).

AI-assisted response via Claude Code
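
The per-platform tag scheme (`aro-hcp-<digest>`, `rosa-hcp-<digest>`) could be derived from an image digest roughly as follows. The thread does not say how `<digest>` is rendered, so stripping the `sha256:` prefix and truncating to 12 hex characters is purely an assumption for this sketch, as is the function name.

```python
import re

_DIGEST_RE = re.compile(r"^sha256:([0-9a-f]{64})$")


def verified_tag(platform: str, digest: str, length: int = 12) -> str:
    """Build '<platform>-<short digest>' from an image digest like 'sha256:<64 hex>'."""
    match = _DIGEST_RE.match(digest)
    if match is None:
        raise ValueError(f"unexpected digest format: {digest!r}")
    # Truncation length is an assumption; a tag cannot contain the ':' of a raw digest.
    return f"{platform}-{match.group(1)[:length]}"


digest = "sha256:" + "ab" * 32  # placeholder digest, not a real image
print(verified_tag("aro-hcp", digest))   # aro-hcp-abababababab
print(verified_tag("rosa-hcp", digest))  # rosa-hcp-abababababab
```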

enxebre · 2026-05-28T12:01:45Z

+
+#### ReleasePlan (per-platform)
+
+A per-platform resource created by the HCP team in the `crt-redhat-acm-tenant` namespace. The YAML below shows the ARO HCP pilot instance. Future platforms (ROSA HCP, GCP HCP) will each get their own ReleasePlan. All platforms push to the same verified repository, tagged differently per managed service. Auto-release is disabled (`auto-release: 'false'`), meaning images only reach the verified repo through explicit Release objects created after tests pass.


are all these resources created manually? will this be gitoped somehow?

The pipeline definition lives in the HyperShift repo at .tekton/pipelines/, referenced by ITS via git resolver. Konflux namespace resources (ITS, ReleasePlan, CronJob, RBAC) are defined in contrib/konflux/ in the HyperShift repo and applied to the crt-redhat-acm-tenant namespace, following the same pattern used for existing Konflux config. Changes go through the standard PR review process.

AI-assisted response via Claude Code

enxebre · 2026-05-28T12:02:05Z

+    - name: revision
+      value: main
+    - name: pathInRepo
+      value: .tekton/pipelines/ho-release-gate.yaml


should this yaml have a consumer specific name?

This is the same question addressed in the pipeline naming thread — for the MVP with only ARO HCP, a single ho-release-gate.yaml is used. When additional platforms are onboarded, this may become per-consumer (e.g., ho-aro-release-gate.yaml) if test suites differ enough, or stay shared with platform-specific params via ITS spec.params. Decision deferred until a second platform is added.

AI-assisted response via Claude Code

enxebre · 2026-05-28T12:05:09Z

+A per-platform resource created by the HCP team in the `crt-redhat-acm-tenant` namespace. This wires the e2e test Tekton pipeline as a gate on Snapshots. It references a pipeline definition stored in the HyperShift repository, allowing the test pipeline to evolve alongside the code it validates.
+
+```yaml
+apiVersion: appstudio.redhat.com/v1beta2


it'd be nice to have a diagram showing how the cronjob, IntegrationTestScenario, ReleasePlan, ReleasePlanAdmission... CRs interact

Done. Added a CR interaction diagram in the Implementation Details section showing how CronJob, Snapshot, IntegrationTestScenario, PipelineRun, Release, ReleasePlan, and ReleasePlanAdmission relate to each other.

AI-assisted response via Claude Code

enxebre · 2026-05-28T12:06:53Z

+The nightly cadence means there is up to a 24-hour delay between a merge and its appearance in a verified repository. This is acceptable for production consumption but may require teams to continue using ACMD for rapid iteration.
+
+## Alternatives (Not Implemented)
+


do we want to include considerations to move HO into OLM? maybe beyond scope

Agreed this is beyond scope for this enhancement — the gating pipeline is delivery-mechanism-agnostic and would work regardless of whether HO is delivered via OLM or the current direct image push. If HO moves to OLM in the future, the promotion step would change (OLM bundle vs raw image push) but the test-then-promote pattern stays the same.

AI-assisted response via Claude Code

enxebre · 2026-05-28T12:07:41Z

+
+4. **Platform e2e test integration:** Bryan is working with the ARO HCP team to integrate their platform-specific e2e tests into the HyperShift repo, following the same pattern used for HyperShift's existing presubmit e2e tests.
+
+5. **Regression analysis:** deads2k raised that this release, decoupled from OCP releases, needs its own regression analysis in component readiness — comparing current HO against a sliding baseline to track the trajectory of the project. This needs further discussion to determine what that mechanism looks like and how it integrates with existing component readiness tooling.


is there ticket/anyone from ship team aware of this?

Not yet — we haven't coordinated with the SHIP team on this. The regression analysis mechanism for a release decoupled from OCP is still undefined. Adding a note here to track the need for SHIP team engagement.

AI-assisted response via Claude Code

enxebre · 2026-05-28T12:08:01Z

dropped some more questions, lgtm

- Add Design Rationale section capturing nightly cadence and alerting rationale from PR discussion threads (enxebre)
- Clarify per-platform pipeline naming strategy with TBD for shared vs separate files when second platform onboards (enxebre)
- Add Verified Repository to glossary as single shared quay.io repo with per-platform image tags (enxebre)
- Document resource management: pipeline in .tekton/pipelines/, Konflux namespace resources in contrib/konflux/ (enxebre)
- Fix CronJob/ITS interaction to follow Konflux periodic test pattern: ITS uses disabled context, CronJob labels snapshots instead of creating PipelineRuns directly (Nirshal)
- Update RBAC, diagrams, and workflow steps to match new pattern

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Shows how CronJob, Snapshot, IntegrationTestScenario, PipelineRun, Release, ReleasePlan, and ReleasePlanAdmission interact during the nightly gating flow.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

openshift-ci · 2026-05-28T13:35:10Z

@bryan-cox: all tests passed!


enxebre · 2026-05-28T13:39:24Z

/approve

openshift-ci · 2026-05-28T13:40:49Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~enhancements/hypershift/OWNERS~~ [enxebre]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

JoelSpeed

One thing I'm hoping we solve as part of this EP is the ability to have confidence in the supported matrix of CPO to guest and CPO to management version skews. There is only a very light mention of cross version testing here, was that something you were considering in/out of scope?

JoelSpeed · 2026-05-26T17:14:56Z

+
+## Summary
+
+This enhancement introduces a nightly, platform-independent gating system that validates HyperShift Operator (HO) images against end-to-end test suites before promoting them to verified repositories. The pipeline operates alongside the existing Konflux auto-release mechanism, adding a parallel promotion path that only advances images which have passed e2e test suites agreed upon between the HyperShift and managed service (HCM) teams. Tests may vary by platform.


Promotion requires all tests pass across all platforms? Or are there separate promotion destinations such that we might see promotion succeed on ARO but not ROSA?

There will be different promotion paths for each managed service since each one needs a different set of tests to pass. If ARO HCP tests fail but GCP and ROSA tests pass, they should still get a tagged HO for their managed services respectively.
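
The independence described in this reply amounts to evaluating each platform's result on its own. A minimal sketch, assuming one pass/fail value per platform per nightly run; the platform keys are illustrative.

```python
def platforms_to_promote(results: dict[str, bool]) -> list[str]:
    """Return platforms whose own suite passed; other platforms' failures are ignored."""
    return sorted(p for p, passed in results.items() if passed)


# ARO HCP fails, yet ROSA HCP and GCP HCP still get a promoted image.
nightly = {"aro-hcp": False, "rosa-hcp": True, "gcp-hcp": True}
print(platforms_to_promote(nightly))  # ['gcp-hcp', 'rosa-hcp']
```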

JoelSpeed · 2026-05-29T12:48:21Z

+
+#### Design Rationale
+
+**Nightly cadence (24h):** Each pipeline run provisions real cloud infrastructure with platform-specific credentials (e.g., Azure for ARO HCP). A nightly cadence balances validation confidence with cloud infrastructure cost. Per-commit gating is cost-prohibitive and would slow the development feedback loop (see Alternatives). Per-platform cadence can differ — each platform can have its own CronJob schedule.


OpenShift CI and nightly builds happen every 6h, have you considered making this more frequent than once per day? Is there enough change in a day to warrant more than once per day?

Once a day might actually be too much. Some managed services update the HO more often than others, but I do not think any of them are in a place to do more than one update within a 24h period.

JoelSpeed · 2026-05-29T12:51:06Z

+
+## Proposal
+
+Add a parallel, gated promotion path alongside the existing auto-release. A nightly pipeline resolves the most recent HO image built by Konflux's push build pipeline (triggered on every merge to `main`) and tests it against platform-specific e2e suites. Only tested images are promoted to a verified repository. Each platform's promotion is independent — a failure on one does not block others.


Are there any retest mechanisms here should it fail? Or is it then a case of wait until the next day?

Having this per platform makes the concept of a "green nightly" more elusive, is tracking the failures and escalation something you plan when there are consecutive failures?

For now it's wait until the next day; a retest mechanism is something we plan to follow up on later. It was not seen as a must-have for an MVP.

JoelSpeed · 2026-05-29T13:04:01Z

+| Phase | Coverage | Ownership |
+| ----- | -------- | --------- |
+| Phase 1 (MVP) | Cluster lifecycle, NodePool scaling, one upgrade path | HCP team — required for completion |
+| Phase 2 | Full CPO version matrix (every supported 4.y.z and 4.y.0) | HCP team — required for completion |


What does this actually mean? Is this "run CPO on lots of 4.Y management clusters" or "CPO can create lots of 4.Y workload clusters"

There is some CPO testing being done outside this effort but those tests will be included in the promotion process of the image. @clebs could point you to that effort.

openshift-ci Bot requested review from csrwng and enxebre May 19, 2026 17:26

bryan-cox force-pushed the OCPSTRAT-3250 branch from 68c16fa to a12fcf2 Compare May 19, 2026 17:30

bryan-cox changed the title ~~Enhancement: Konflux release gating pipeline for HyperShift Operator~~ OCPSTRAT-3250: Enhancement: Konflux release gating pipeline for HyperShift Operator May 19, 2026

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 19, 2026

bryan-cox changed the title ~~OCPSTRAT-3250: Enhancement: Konflux release gating pipeline for HyperShift Operator~~ OCPSTRAT-3250: Konflux release gating pipeline for HyperShift Operator May 19, 2026

bryan-cox commented May 19, 2026

bryan-cox force-pushed the OCPSTRAT-3250 branch from a12fcf2 to 610136a Compare May 19, 2026 17:42

bryan-cox commented May 19, 2026

Comment thread enhancements/hypershift/konflux-release-gating-pipeline.md

bryan-cox force-pushed the OCPSTRAT-3250 branch 2 times, most recently from e8dd6e1 to 68fbb5f Compare May 19, 2026 18:05

bryan-cox force-pushed the OCPSTRAT-3250 branch 2 times, most recently from 5f6b081 to a135a21 Compare May 20, 2026 14:39

enxebre reviewed May 25, 2026

Comment thread enhancements/hypershift/konflux-release-gating-pipeline.md

enxebre reviewed May 25, 2026

bryan-cox force-pushed the OCPSTRAT-3250 branch from a135a21 to 7ea39a2 Compare May 26, 2026 15:17

Nirshal reviewed May 27, 2026

enxebre reviewed May 28, 2026

Comment thread enhancements/hypershift/konflux-release-gating-pipeline.md

enxebre reviewed May 28, 2026

bryan-cox and others added 2 commits May 28, 2026 09:10

Add CR interaction diagram to Implementation Details

9ede1c7

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 28, 2026

Nirshal mentioned this pull request May 28, 2026

feat(ci): add ho-release-gate pipeline for nightly promotion openshift/hypershift#8602

Draft

JoelSpeed reviewed May 29, 2026

