[WIP] CNTRLPLANE-3719: Calico/Cilium jobs for release-5.0 #81133
mgencur wants to merge 4 commits into …
Port three calico conformance tests from 4.21 to the 4.22 periodics config: e2e-aws-conformance-calico, e2e-aws-conformance-calico-private, and e2e-kubevirt-metal-conformance-calico. LVM operator bumped to stable-4.22; ODF kept at stable-4.21.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…o hypershift step registry
Move cucushift-installer-rehearse-aws-ipi-ovn-hypershift-private-guest-{calico,cilium}
workflows from cucushift/ to hypershift/aws/conformance-{calico,cilium}-private/.
Consolidate TEST_SKIPS into the workflow definitions and remove them from
individual CI configs (4.19-4.22 periodics). Delete the now-unused cucushift
workflow files.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…OCP 5.0
Copy calico and cilium AWS conformance tests from 4.22 periodics to 5.0:
- e2e-aws-conformance-cilium-private
- e2e-aws-conformance-calico
- e2e-aws-conformance-calico-private
KubeVirt-metal variants will be added separately.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mgencur: This pull request references CNTRLPLANE-3719 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set.
Walkthrough
The PR adds new private HyperShift conformance workflows for Cilium and Calico, retargets periodic jobs to those workflows, updates the shared CNI and rollout-checking scripts, and adjusts the KubeVirt baremetalds conformance settings and skip lists.
Changes
HyperShift conformance workflow migration
Sequence Diagram(s)
sequenceDiagram
participant HypershiftAwsConformanceCiliumPrivate as "hypershift-aws-conformance-cilium-private"
participant HypershiftAwsCreateChain as "hypershift-aws-create-chain"
participant HypershiftConformance as "hypershift-conformance"
participant ReportPortal
HypershiftAwsConformanceCiliumPrivate->>HypershiftAwsCreateChain: pre provisioning and enablement steps
HypershiftAwsConformanceCiliumPrivate->>HypershiftConformance: run conformance tests
HypershiftAwsConformanceCiliumPrivate->>ReportPortal: send post-run results
Estimated review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 14 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (14 passed)
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: mgencur
Needs approval from an approver in each of these files.
/pj-rehearse periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-aws-conformance-calico periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-aws-conformance-calico-private periodic-ci-openshift-hypershift-release-5.0-periodics-e2e-aws-conformance-cilium-private

@mgencur: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.
[REHEARSALNOTIFIER]
A total of 702 jobs have been affected by this change. The bot's listing of affected jobs is non-exhaustive and limited to 25 jobs.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/cucushift/hypershift-extended/calico/health-check/cucushift-hypershift-extended-calico-health-check-commands.sh`:
- Line 22: The cni-config annotation command is not idempotent, so retries can
fail if the annotation already exists. Update the oc annotate invocation in the
calico health-check command script to use --overwrite so repeated runs succeed
cleanly under set -e. Use the existing oc annotate configmap cni-config -n
calico-system unsupported.operator.tigera.io/ignore=true command as the target
to adjust.
- Around line 24-26: The `cniVersion` patch in the calico health-check script is
matching an escaped JSON string, but `oc get cm cni-config -o yaml` exposes the
embedded CNI config as raw text, so the substitution never applies. Update the
patching step in the health-check command script to replace the raw `cniVersion`
value directly, and add a check that confirms `0.4.0` is present in the
ConfigMap output before the `calico-node` restart step runs.
In
`@ci-operator/step-registry/cucushift/hypershift-extended/enable-qe/pull-secret/cucushift-hypershift-extended-enable-qe-pull-secret-commands.sh`:
- Around line 144-148: The MachineDeployment rollout check waits for the
transient RollingOut=True state before waiting for RollingOut=False, which can
fail on fast or skipped transitions under set -e. Update the logic in the loop
that uses oc wait on machinedeployment so it only waits for the terminal
RollingOut=False condition, and remove the initial RollingOut=True wait while
keeping the existing md/MD_NAMESPACE/CLUSTER_NAME lookup flow intact.
In
`@ci-operator/step-registry/hypershift/kubevirt/baremetalds/conformance-cilium/hypershift-kubevirt-baremetalds-conformance-cilium-workflow.yaml`:
- Around line 58-62: The TEST_SKIPS alternation for the KubeVirt
conformance-cilium workflow is too broad because the migration skip lost the
“and live migrate hosted control plane” suffix. Update the skip list in the
workflow definition so the original full KubeVirt migration match is preserved,
and add the new Service-related skips as separate alternation entries without
changing the existing migration scope.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 96785943-fa5c-4036-aab7-dc3bf08db878
⛔ Files ignored due to path filters (2)
- ci-operator/jobs/openshift/hypershift/openshift-hypershift-release-4.22-periodics.yaml is excluded by !ci-operator/jobs/**
- ci-operator/jobs/openshift/hypershift/openshift-hypershift-release-5.0-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (20)
- ci-operator/config/openshift/hypershift/openshift-hypershift-release-4.19__periodics.yaml
- ci-operator/config/openshift/hypershift/openshift-hypershift-release-4.20__periodics.yaml
- ci-operator/config/openshift/hypershift/openshift-hypershift-release-4.21__periodics.yaml
- ci-operator/config/openshift/hypershift/openshift-hypershift-release-4.22__periodics.yaml
- ci-operator/config/openshift/hypershift/openshift-hypershift-release-5.0__periodics.yaml
- ci-operator/step-registry/cucushift/hypershift-extended/calico/health-check/cucushift-hypershift-extended-calico-health-check-commands.sh
- ci-operator/step-registry/cucushift/hypershift-extended/enable-qe/pull-secret/cucushift-hypershift-extended-enable-qe-pull-secret-commands.sh
- ci-operator/step-registry/cucushift/installer/rehearse/aws/ipi/ovn/hypershift/private/guest/calico/cucushift-installer-rehearse-aws-ipi-ovn-hypershift-private-guest-calico-workflow.yaml
- ci-operator/step-registry/cucushift/installer/rehearse/aws/ipi/ovn/hypershift/private/guest/cilium/cucushift-installer-rehearse-aws-ipi-ovn-hypershift-private-guest-cilium-workflow.yaml
- ci-operator/step-registry/hypershift/aws/conformance-calico-private/OWNERS
- ci-operator/step-registry/hypershift/aws/conformance-calico-private/hypershift-aws-conformance-calico-private-workflow.metadata.json
- ci-operator/step-registry/hypershift/aws/conformance-calico-private/hypershift-aws-conformance-calico-private-workflow.yaml
- ci-operator/step-registry/hypershift/aws/conformance-calico/hypershift-aws-conformance-calico-workflow.yaml
- ci-operator/step-registry/hypershift/aws/conformance-cilium-private/OWNERS
- ci-operator/step-registry/hypershift/aws/conformance-cilium-private/hypershift-aws-conformance-cilium-private-workflow.metadata.json
- ci-operator/step-registry/hypershift/aws/conformance-cilium-private/hypershift-aws-conformance-cilium-private-workflow.yaml
- ci-operator/step-registry/hypershift/aws/conformance-cilium/hypershift-aws-conformance-cilium-workflow.yaml
- ci-operator/step-registry/hypershift/aws/create/hypershift-aws-create-chain.yaml
- ci-operator/step-registry/hypershift/kubevirt/baremetalds/conformance-calico/hypershift-kubevirt-baremetalds-conformance-calico-workflow.yaml
- ci-operator/step-registry/hypershift/kubevirt/baremetalds/conformance-cilium/hypershift-kubevirt-baremetalds-conformance-cilium-workflow.yaml
💤 Files with no reviewable changes (2)
- ci-operator/step-registry/cucushift/installer/rehearse/aws/ipi/ovn/hypershift/private/guest/cilium/cucushift-installer-rehearse-aws-ipi-ovn-hypershift-private-guest-cilium-workflow.yaml
- ci-operator/step-registry/cucushift/installer/rehearse/aws/ipi/ovn/hypershift/private/guest/calico/cucushift-installer-rehearse-aws-ipi-ovn-hypershift-private-guest-calico-workflow.yaml
oc -n calico-system rollout status ds/calico-node --timeout=15m || true

# Annotate the configmap to prevent the operator from reverting the patch.
oc annotate configmap cni-config -n calico-system unsupported.operator.tigera.io/ignore=true
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Add --overwrite to keep the annotate idempotent on retries.
Without --overwrite, a re-run (or any prior presence of the annotation) makes oc annotate error, and with set -e the whole step aborts.
🛡️ Proposed fix
-oc annotate configmap cni-config -n calico-system unsupported.operator.tigera.io/ignore=true
+oc annotate configmap cni-config -n calico-system unsupported.operator.tigera.io/ignore=true --overwrite
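For illustration only, a minimal Bash sketch of why idempotency matters here. The `retry` helper and the `annotate_cni_config_ignore` wrapper are hypothetical names, not part of the step script; the sketch assumes `oc` is on `PATH` and logged in to the hosted cluster. With `--overwrite`, every attempt of the annotate can succeed even if an earlier attempt already set the annotation.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not in the step script): run a command up to
# `attempts` times, sleeping `delay` seconds between failed attempts.
retry() {
  local attempts="$1" delay="$2" i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0
    fi
    echo "attempt ${i}/${attempts} failed: $*" >&2
    if ((i < attempts)); then
      sleep "${delay}"
    fi
  done
  return 1
}

# The proposed fix wrapped in a function: --overwrite lets a repeated run
# succeed even when the annotation was set by an earlier attempt.
annotate_cni_config_ignore() {
  oc annotate configmap cni-config -n calico-system \
    unsupported.operator.tigera.io/ignore=true --overwrite
}
```

Usage would be `retry 3 10 annotate_cni_config_ignore`. Without `--overwrite`, attempts after the first successful one would fail once the annotation exists, which is the failure mode described above.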
oc -n calico-system get cm cni-config -o yaml | \
sed 's/\\"cniVersion\\": \\"0.3.1\\"/\\"cniVersion\\": \\"0.4.0\\"/' | \
oc apply -f -
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Inspect how cniVersion is rendered in the cni-config ConfigMap definition/usage in-repo.
rg -nP --type=sh -C3 'cniVersion|cni-config' ci-operator/step-registry/cucushift/hypershift-extended/calico/
Repository: openshift/release
Length of output: 3130
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file=ci-operator/step-registry/cucushift/hypershift-extended/calico/health-check/cucushift-hypershift-extended-calico-health-check-commands.sh
cat -n "$file" | sed -n '1,120p'
printf '\n--- related matches ---\n'
rg -n --context 2 'unsupported.operator.tigera.io/ignore|cniVersion|cni-config|calico-node' ci-operator/step-registry/cucushift/hypershift-extended/calico/
Repository: openshift/release
Length of output: 5277
Fix the cniVersion patching step
oc get cm -o yaml emits the embedded CNI config as raw JSON text, so the sed 's/\\"cniVersion\\": \\"0.3.1\\"/\\"cniVersion\\": \\"0.4.0\\"/' expression, which requires a literal backslash before every double quote, won't match. Patch the raw cniVersion value directly and assert that 0.4.0 is present before restarting calico-node.
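A minimal sketch of the suggested fix, assuming (as the review states) that the CNI config appears as unescaped JSON text inside the YAML output. The helper names `bump_cni_version`, `has_cni_version_040`, and `patch_cni_config` are hypothetical; the two text helpers can be exercised without a cluster.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch only: helper names are hypothetical, not from the step script.

# Rewrite "cniVersion": "0.3.1" to "0.4.0" in raw (unescaped) JSON text,
# preserving any whitespace around the colon.
bump_cni_version() {
  sed -E 's/"cniVersion"([[:space:]]*):([[:space:]]*)"0\.3\.1"/"cniVersion"\1:\2"0.4.0"/'
}

# Succeed only if stdin contains "cniVersion": "0.4.0". grep reads all of
# its input (no -q), so an upstream writer never sees an early-closed pipe.
has_cni_version_040() {
  grep -E '"cniVersion"[[:space:]]*:[[:space:]]*"0\.4\.0"' >/dev/null
}

# Patch the ConfigMap, then verify the new value before calico-node is
# restarted; returns non-zero (failing the step under set -e) otherwise.
patch_cni_config() {
  oc -n calico-system get cm cni-config -o yaml | bump_cni_version | oc apply -f -
  local cm
  cm="$(oc -n calico-system get cm cni-config -o yaml)"
  if ! has_cni_version_040 <<<"${cm}"; then
    echo "cni-config still lacks cniVersion 0.4.0 after patching" >&2
    return 1
  fi
}
```

In the step script, `patch_cni_config` would run in place of the original three-line pipeline, ahead of the `calico-node` restart, so a patch that did not land stops the step instead of restarting pods against the old version.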
for md in $(oc get machinedeployments -n "${MD_NAMESPACE}" -l "cluster.x-k8s.io/cluster-name=${CLUSTER_NAME}" -o jsonpath='{.items[*].metadata.name}'); do
  oc wait machinedeployment "${md}" -n "${MD_NAMESPACE}" --for=condition=RollingOut=True --timeout=5m
  echo "Waiting for MachineDeployment ${md} to finish rolling out..."
  oc wait machinedeployment "${md}" -n "${MD_NAMESPACE}" --for=condition=RollingOut=False --timeout=45m
done
🩺 Stability & Availability | 🟠 Major
🧩 Analysis chain
🌐 Web query:
What is the RollingOut condition on Cluster API MachineDeployment and in which apiVersion (v1beta1 vs v1beta2) is it available?
💡 Result:
In Cluster API, the RollingOut condition on a MachineDeployment is a status condition that indicates whether the deployment is currently in the process of rolling out new machines [1][2]. This condition is part of the v1beta2 API version [1][3]. It is not available in the v1beta1 API version [1][4]. Key details: - Availability: The RollingOut condition was introduced as part of the v1beta2 API improvements [1][5]. In later versions of Cluster API, these v1beta2 conditions were promoted to the top level of the resource status [4]. - Meaning: The condition is set to True when there is at least one machine managed by the MachineDeployment that is not yet up-to-date with the desired specification [1]. It is set to False when all machines are up-to-date [1]. - Reasons: Common reasons associated with this condition include MachineDeploymentRollingOutV1Beta2Reason (when rolling out) and MachineDeploymentNotRollingOutV1Beta2Reason (when fully updated) [1]. While v1beta1 remains a supported API version for many objects in Cluster API, the advanced status conditions like RollingOut are a defining feature of the v1beta2 contract [1][4].
Citations:
- 1: kubernetes-sigs/cluster-api@6865cec
- 2: https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20240916-improve-status-in-CAPI-resources.md
- 3: https://kubespec.dev/cluster-api/cluster.x-k8s.io/v1beta2/MachineDeployment
- 4: ⚠️ Promote v1beta2 conditions kubernetes-sigs/cluster-api#12066
- 5: 📖 Small improvements to the v1beta2 status proposal kubernetes-sigs/cluster-api#11460
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file='ci-operator/step-registry/cucushift/hypershift-extended/enable-qe/pull-secret/cucushift-hypershift-extended-enable-qe-pull-secret-commands.sh'
# Map the file and inspect the relevant section with line numbers.
wc -l "$file"
sed -n '120,170p' "$file"
Repository: openshift/release
Length of output: 3612
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Inspect the script for surrounding logic and any shell options that affect failure behavior.
grep -nE 'set -e|set -u|set -o pipefail|RollingOut|machinedeployment|oc wait|pullSecret|patch' \
'ci-operator/step-registry/cucushift/hypershift-extended/enable-qe/pull-secret/cucushift-hypershift-extended-enable-qe-pull-secret-commands.sh'
Repository: openshift/release
Length of output: 1528
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check how Cluster API MachineDeployment conditions are used elsewhere in this repo.
rg -n --no-heading 'RollingOut|MachineDeploymentNotRollingOut|MachineDeploymentRollingOut' ci-operator core-services cluster services projects
Repository: openshift/release
Length of output: 701
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Read the file from the current repository version to understand the exact wait logic.
nl -ba 'ci-operator/step-registry/cucushift/hypershift-extended/enable-qe/pull-secret/cucushift-hypershift-extended-enable-qe-pull-secret-commands.sh' | sed -n '136,156p'
Repository: openshift/release
Length of output: 195
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file='ci-operator/step-registry/cucushift/hypershift-extended/enable-qe/pull-secret/cucushift-hypershift-extended-enable-qe-pull-secret-commands.sh'
nl -ba "$file" | sed -n '130,160p'
Repository: openshift/release
Length of output: 195
Remove the RollingOut=True wait
Waiting for RollingOut=True before RollingOut=False can miss a rollout that completes before the first wait starts, or one that never exposes the True state; the 5m wait for True then times out and, under set -e, fails a healthy cluster. Wait only for the terminal RollingOut=False condition. Note that RollingOut is a v1beta2 MachineDeployment condition.
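A minimal sketch of the loop with only the terminal wait, keeping the original `MD_NAMESPACE`/`CLUSTER_NAME` lookup. Wrapping it in a function (`wait_for_machinedeployments`, a hypothetical name) and capturing the name list before looping are choices made here: a failing `oc get` inside a `for` word list is not reliably caught by `set -e`, whereas a plain assignment is.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch only: the function name is hypothetical. MD_NAMESPACE and
# CLUSTER_NAME are expected to be set by the surrounding step, as in the
# original loop.
wait_for_machinedeployments() {
  local mds md
  # Capture the list first so a failing `oc get` is caught by set -e.
  mds="$(oc get machinedeployments -n "${MD_NAMESPACE}" \
    -l "cluster.x-k8s.io/cluster-name=${CLUSTER_NAME}" \
    -o jsonpath='{.items[*].metadata.name}')"
  for md in ${mds}; do
    echo "Waiting for MachineDeployment ${md} to finish rolling out..."
    # Only the terminal condition: no transient RollingOut=True wait.
    oc wait machinedeployment "${md}" -n "${MD_NAMESPACE}" \
      --for=condition=RollingOut=False --timeout=45m
  done
}
```

One trade-off to keep in mind: if the controller has not yet reacted to the pull-secret change, RollingOut may still read False and the wait returns immediately. Comparing `.status.observedGeneration` with `.metadata.generation` before waiting is one possible guard; it is not part of the review's suggestion.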
migration when running openshift cluster on KubeVirt virtual machines\|
Services should fallback to local terminating endpoints
when there are no ready endpoints with externalTrafficPolicy=Local\| Services
should be rejected when no endpoints exist\| Services should be rejected for
evicted pods
🎯 Functional Correctness | 🟡 Minor
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Inspecting the edited TEST_SKIPS entry and sibling workflow patterns..."
rg -n -C2 'migration when running openshift cluster on KubeVirt virtual machines' \
ci-operator/step-registry/hypershift/kubevirt/baremetalds
echo
echo "Looking for the full live-migrate literal elsewhere in the repo..."
rg -n -C2 'live migrate hosted control plane' ci-operator/step-registry
Repository: openshift/release
Length of output: 5859
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '52,66p' ci-operator/step-registry/hypershift/kubevirt/baremetalds/conformance-cilium/hypershift-kubevirt-baremetalds-conformance-cilium-workflow.yaml
echo
python3 - <<'PY'
from pathlib import Path
p = Path("ci-operator/step-registry/hypershift/kubevirt/baremetalds/conformance-cilium/hypershift-kubevirt-baremetalds-conformance-cilium-workflow.yaml")
for i, line in enumerate(p.read_text().splitlines(), 1):
    if 55 <= i <= 62:
        print(f"{i}: {line!r}")
PY
Repository: openshift/release
Length of output: 1656
Restore the full KubeVirt migration skip
The TEST_SKIPS alternation was shortened to "migration when running openshift cluster on KubeVirt virtual machines", which widens the match beyond the original live-migrate case. Keep the "and live migrate hosted control plane" suffix here and add the new Service skips as separate alternation entries.
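To make the scope difference concrete, a small sketch. It assumes that `TEST_SKIPS` is applied with `grep -v` as a GNU basic regular expression, where `\|` separates alternatives (the usual pattern in OpenShift CI e2e steps; not verified for the `hypershift-conformance` step). The `apply_skips` helper and any test names fed to it are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Sketch only: apply_skips is hypothetical. Print the test names (one per
# line on stdin) that survive TEST_SKIPS. Assumption: TEST_SKIPS is a GNU
# grep basic regular expression, so `\|` separates alternatives and each
# alternative matches anywhere in a test name.
apply_skips() {
  local skips="$1"
  # grep -v exits non-zero when every line is skipped; treat that as
  # "nothing left to run" rather than an error.
  grep -v -- "${skips}" || true
}
```

Fed a list containing the live-migration test and some other test whose name also contains "migration when running openshift cluster on KubeVirt virtual machines", the full entry drops only the live-migration test, while the shortened prefix drops both. That is the widening the review points out.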
@mgencur: The following tests failed, say
https://redhat.atlassian.net/browse/CNTRLPLANE-3719
This PR currently adds only the following configs:
The other configs should be migrated after https://redhat.atlassian.net/browse/CNTRLPLANE-3732 because they need to be placed in a different CI config that doesn't exist yet:
Summary by CodeRabbit
This update moves several HyperShift conformance periodic jobs onto the newer workflow/step-registry layout and expands CI coverage for AWS and KubeVirt bare-metal conformance.
What changed
hypershift-aws-conformance-* workflows.
Practical impact
These changes formalize the HyperShift CI path for Calico and Cilium conformance on AWS, including private-cluster variants, while preparing the remaining KubeVirt-metal and agent-connected jobs for a later migration once the required CI config exists.