CNTRLPLANE-2262: Enable Azure scale-from-zero in CI#79770

Draft
jhjaggars wants to merge 1 commit into
openshift:mainfrom
jhjaggars:azure-scale-from-zero

Conversation

Contributor

@jhjaggars jhjaggars commented May 27, 2026

Summary

  • Add --scale-from-zero-provider azure to the hypershift install step for Azure self-managed clusters
  • Add TestNodePoolAutoscalingScaleFromZero to the e2e-azure-self-managed CI test regex
  • Temporarily override hypershift-operator and hypershift-tests images with pre-built images from quay.io/jjaggars/ to validate the full scale-from-zero flow via pj-rehearse

Details

The install script constructs a scale-from-zero credentials file by merging the existing Azure SP credentials with subscriptionId (from az account show) and location (from HYPERSHIFT_AZURE_LOCATION env var).
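The merge described above can be sketched roughly as follows. This is an illustration only, not the PR's script: the helper name `merge_sfz_creds`, the sample credential fields, and the placeholder subscription ID are all made up for the example. Only the shape of the jq expression and the `centralus` fallback for `HYPERSHIFT_AZURE_LOCATION` mirror what the PR describes.

```shell
#!/bin/bash
set -euo pipefail

# merge_sfz_creds BASE_JSON SUBSCRIPTION_ID [LOCATION]
# Hypothetical helper: prints BASE_JSON with subscriptionId and location added.
merge_sfz_creds() {
  local base_json="$1" sub="$2" loc="${3:-centralus}"
  # `. + {...}` adds the two keys to the existing object,
  # overwriting them if they were already present.
  jq --arg sub "${sub}" --arg loc "${loc}" \
    '. + {subscriptionId: $sub, location: $loc}' "${base_json}"
}

# Example usage with made-up values standing in for the mounted SP credentials.
base_creds="$(mktemp)"
merged_creds="$(mktemp)"
printf '%s\n' \
  '{"clientId":"example-client","clientSecret":"example-secret","tenantId":"example-tenant"}' \
  > "${base_creds}"

# Placeholder ID; the real script reads it from `az account show`.
merge_sfz_creds "${base_creds}" "00000000-0000-0000-0000-000000000000" \
  "${HYPERSHIFT_AZURE_LOCATION:-centralus}" > "${merged_creds}"

# Print only the key names so that no secret values reach the log.
jq -r 'keys | join(",")' "${merged_creds}"
rm -f "${base_creds}" "${merged_creds}"
```

Listing only the key names is a cheap way to confirm the merged file has the expected fields without echoing the secret itself.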

The image overrides are temporary for rehearsal testing only. Before merge, the images section will be reverted to use Dockerfile and Dockerfile.e2e from the hypershift repo.

Dependencies

Companion PR: openshift/hypershift#8337

Test plan

  • Rehearse with /pj-rehearse e2e-azure-self-managed
  • Verify TestNodePoolAutoscalingScaleFromZero passes on Azure
  • Revert image overrides before merge

🤖 Generated with Claude Code

Summary by CodeRabbit

This PR enables Azure scale-from-zero autoscaling support in the CI infrastructure for the hypershift project. It makes two key sets of changes:

CI Configuration Changes:
The PR modifies the hypershift CI pipeline for Azure self-managed clusters to:

  • Pull pre-built temporary container images (quay.io/jjaggars/hypershift-operator:azure-sfz and quay.io/jjaggars/hypershift-tests:azure-sfz) instead of building from source, allowing validation of scale-from-zero functionality through pj-rehearse before the upstream hypershift code is available
  • Add the TestNodePoolAutoscalingScaleFromZero test case to the e2e-azure-self-managed test regex, ensuring this test runs during CI

Installation Script Enhancement:
The hypershift installation script now conditionally enables scale-from-zero for Azure self-managed clusters by:

  • Detecting when self-managed Azure credentials are available
  • Querying the Azure subscription ID and constructing a merged credentials file with the subscription ID and location (defaulting to centralus)
  • Passing --scale-from-zero-provider azure and corresponding credentials to the hypershift install command

These changes allow the CI infrastructure to validate the scale-from-zero autoscaling feature for Azure clusters. The temporary image overrides are intended to be reverted before merge, once the companion PR (openshift/hypershift#8337) is integrated.

Add scale-from-zero configuration to the Azure self-managed install
path and include TestNodePoolAutoscalingScaleFromZero in the
e2e-azure-self-managed test regex.

Temporarily override hypershift-operator and hypershift-tests images
with pre-built images from quay.io/jjaggars/ to test the full
scale-from-zero flow via pj-rehearse before merging the companion
hypershift PR #8337.
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label May 27, 2026
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 27, 2026

@jhjaggars: This pull request references CNTRLPLANE-2262 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "5.0.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 27, 2026
@coderabbitai
Contributor

coderabbitai Bot commented May 27, 2026

Walkthrough

This PR enables Azure scale-from-zero autoscaling for self-managed Hypershift deployments. CI image builds are reconfigured to use prebuilt images from quay.io/jjaggars/ with azure-sfz tags, the e2e test selection is extended to include scale-from-zero tests, and the installation script adds conditional logic to configure Azure scale-from-zero credentials.

Changes

Azure Scale-from-Zero Support

| Layer / File(s) | Summary |
| --- | --- |
| Image builds and e2e test selection (ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml) | Image build definitions switch to dockerfile_literal entries pulling hypershift-operator and hypershift-tests from quay.io/jjaggars/ with azure-sfz tags. Test selection regex is extended to include TestNodePoolAutoscalingScaleFromZero for Azure self-managed e2e runs. |
| Installation script scale-from-zero enablement (ci-operator/step-registry/hypershift/install/hypershift-install-commands.sh) | Conditional Azure block retrieves subscription ID via az account show, augments credentials JSON with subscriptionId and location (defaulting to centralus), and appends --scale-from-zero-provider azure and --scale-from-zero-creds flags to EXTRA_ARGS for self-managed deployments. |

🎯 2 (Simple) | ⏱️ ~12 minutes


rehearsals-ack


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| No-Sensitive-Data-In-Logs | ❌ Error | Script uses set -x which logs commands executed. Azure credentials file path is added to arguments passed to HCP_CLI install, exposing this path in debug logs. | Use set +x before invoking HCP_CLI install command to suppress debug output when passing credentials file paths as arguments. |

✅ Passed checks (14 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: enabling Azure scale-from-zero functionality in the CI configuration. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PR contains no Ginkgo test definitions. Only test name reference is TestNodePoolAutoscalingScaleFromZero in CI config, which is static, descriptive, and contains no dynamic elements. |
| Test Structure And Quality | ✅ Passed | PR modifies only CI configuration (YAML) and installation shell script files; no Ginkgo test code is present or modified, so test structure quality check is not applicable. |
| Microshift Test Compatibility | ✅ Passed | The PR modifies only CI configuration (YAML) and install scripts (bash), not adding new Ginkgo test code. The check for MicroShift test compatibility is not applicable here. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No new Ginkgo tests added; PR modifies CI configuration to run existing hypershift test via new Azure scale-from-zero workflow. Test code resides in separate hypershift repository. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | PR modifies CI configuration and shell scripts, not deployment manifests or operator code. No Kubernetes resources with scheduling constraints are defined. |
| Ote Binary Stdout Contract | ✅ Passed | The PR modifies only YAML CI config and bash scripts, not Go test binaries or OTE code, so the OTE Stdout Contract check is not applicable. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | This PR does not add new Ginkgo e2e tests. It only modifies CI configuration YAML and installation scripts, which are not subject to this IPv6/disconnected network check. |
| No-Weak-Crypto | ✅ Passed | The PR changes contain no weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons. |
| Container-Privileges | ✅ Passed | No privileged container configurations found. PR changes are CI configuration and shell scripts only, not K8s/container manifests with security contexts. |
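
For the No-Sensitive-Data-In-Logs finding, one possible shape of the suggested `set +x` fix is sketched below. It is an assumption-laden illustration rather than the PR's code: `install_stub` stands in for the real `HCP_CLI install` call and `demo_install` is a hypothetical wrapper that exists only to make the example self-contained. Tracing has to be switched off before the sensitive command line is expanded, so the toggle sits inline rather than inside a helper that receives the arguments.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical stand-in for the real install invocation; it does nothing.
install_stub() { :; }

demo_install() {
  local creds_path="$1"
  local xtrace_was_on=0

  set -x                        # simulate a script that runs with xtrace on
  : "non-sensitive setup step"  # this line is traced as usual

  # Remember whether xtrace is on, then turn it off before the command
  # line carrying the credentials path is expanded and logged.
  case "$-" in *x*) xtrace_was_on=1 ;; esac
  { set +x; } 2>/dev/null
  install_stub --scale-from-zero-provider azure --scale-from-zero-creds "${creds_path}"
  if [ "${xtrace_was_on}" -eq 1 ]; then set -x; fi

  : "tracing resumes here"      # traced again
  { set +x; } 2>/dev/null       # leave xtrace off for the rest of the demo
}

# Example usage: the trace on stderr should not mention the path below.
demo_install "/tmp/example-sfz-creds.json"
```

Wrapping `set +x` in `{ ...; } 2>/dev/null` only hides the trace line for the toggle itself; a plain `set +x` works too and merely leaves one harmless extra line in the log.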
Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci
Contributor

openshift-ci Bot commented May 27, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-merge-bot
Contributor

[REHEARSALNOTIFIER]
@jhjaggars: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

| Test name | Repo | Type | Reason |
| --- | --- | --- | --- |
| pull-ci-openshift-cluster-node-tuning-operator-main-e2e-hypershift-pao | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-main-e2e-pao-updating-profile-hypershift | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-5.1-e2e-hypershift-pao | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-5.1-e2e-pao-updating-profile-hypershift | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-5.0-e2e-hypershift-pao | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-5.0-e2e-pao-updating-profile-hypershift | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.23-e2e-hypershift-pao | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.23-e2e-pao-updating-profile-hypershift | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.22-e2e-hypershift-pao | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.22-e2e-pao-updating-profile-hypershift | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.21-e2e-hypershift-pao | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.21-e2e-pao-updating-profile-hypershift | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.20-e2e-hypershift-pao | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.20-e2e-pao-updating-profile-hypershift | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.19-e2e-hypershift-pao | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.19-e2e-pao-updating-profile-hypershift | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.18-e2e-hypershift-pao | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.18-e2e-pao-updating-profile-hypershift | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.17-e2e-hypershift-pao | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-node-tuning-operator-release-4.16-e2e-hypershift-pao | openshift/cluster-node-tuning-operator | presubmit | Registry content changed |
| pull-ci-openshift-cluster-api-provider-azure-main-hypershift-e2e-aks | openshift/cluster-api-provider-azure | presubmit | Registry content changed |
| pull-ci-openshift-cluster-api-provider-azure-release-5.1-hypershift-e2e-aks | openshift/cluster-api-provider-azure | presubmit | Registry content changed |
| pull-ci-openshift-cluster-api-provider-azure-release-5.0-hypershift-e2e-aks | openshift/cluster-api-provider-azure | presubmit | Registry content changed |
| pull-ci-openshift-cluster-api-provider-azure-release-4.23-hypershift-e2e-aks | openshift/cluster-api-provider-azure | presubmit | Registry content changed |
| pull-ci-openshift-cluster-api-provider-azure-release-4.22-hypershift-e2e-aks | openshift/cluster-api-provider-azure | presubmit | Registry content changed |

A total of 689 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here
Prior to this PR being merged, you will need to either run and acknowledge or opt to skip these rehearsals.

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@jhjaggars
Contributor Author

/pj-rehearse e2e-azure-self-managed

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@ci-operator/step-registry/hypershift/install/hypershift-install-commands.sh`:
- Around line 125-132: The current lookup for SUBSCRIPTION_ID swallows errors
via "|| true" so scale-from-zero can be silently disabled; change the logic in
the block that sets SUBSCRIPTION_ID/SCALE_FROM_ZERO_CREDS/EXTRA_ARGS to fail
fast: remove the suppression so the az account show call returns a non-zero
status on error, and add an explicit check that if
/etc/hypershift-ci-jobs-self-managed-azure/credentials.json exists (or when
scale-from-zero is expected) and SUBSCRIPTION_ID is empty, emit an error to
stderr and exit non-zero; otherwise continue to create SCALE_FROM_ZERO_CREDS
(using HYPERSHIFT_AZURE_LOCATION) and append the
--scale-from-zero-provider/--scale-from-zero-creds flags to EXTRA_ARGS as
before.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: c53907b8-3f56-4d65-a9ec-de99fc19a924

📥 Commits

Reviewing files that changed from the base of the PR and between f69ff5f and dedc98b.

⛔ Files ignored due to path filters (3)
  • ci-operator/jobs/openshift/hypershift/openshift-hypershift-main-periodics.yaml is excluded by !ci-operator/jobs/**
  • ci-operator/jobs/openshift/hypershift/openshift-hypershift-main-postsubmits.yaml is excluded by !ci-operator/jobs/**
  • ci-operator/jobs/openshift/hypershift/openshift-hypershift-main-presubmits.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (2)
  • ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml
  • ci-operator/step-registry/hypershift/install/hypershift-install-commands.sh

Comment on lines +125 to +132
SUBSCRIPTION_ID=$(az account show --query id -o tsv 2>/dev/null || true)
if [ -n "${SUBSCRIPTION_ID}" ]; then
  SCALE_FROM_ZERO_CREDS=$(mktemp)
  jq --arg sub "${SUBSCRIPTION_ID}" --arg loc "${HYPERSHIFT_AZURE_LOCATION:-centralus}" \
    '. + {subscriptionId: $sub, location: $loc}' \
    /etc/hypershift-ci-jobs-self-managed-azure/credentials.json > "${SCALE_FROM_ZERO_CREDS}"
  EXTRA_ARGS="${EXTRA_ARGS} --scale-from-zero-provider azure --scale-from-zero-creds ${SCALE_FROM_ZERO_CREDS}"
fi
Contributor

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="$(fd -a 'hypershift-install-commands.sh$' ci-operator/step-registry | head -n1)"
echo "Inspecting ${file}"
nl -ba "${file}" | sed -n '121,135p'
rg -n -C2 'az account show|\\|\\| true|scale-from-zero-provider azure' "${file}"

Repository: openshift/release

Length of output: 228


🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="ci-operator/step-registry/hypershift/install/hypershift-install-commands.sh"
if [ ! -f "$file" ]; then
  echo "Missing $file" >&2
  exit 1
fi

echo "== Lines 121-135 =="
sed -n '121,135p' "$file" | cat -n

echo "== Relevant rg matches =="
rg -n -C2 'az account show|^\s*.*\|\|\s*true\s*$|scale-from-zero-provider azure' "$file"

Repository: openshift/release

Length of output: 1631


Fail fast (or explicitly error) when Azure subscription ID lookup fails for scale-from-zero

az account show ... 2>/dev/null || true suppresses subscription lookup errors, and the SFZ flags (--scale-from-zero-provider azure --scale-from-zero-creds ...) are only appended when SUBSCRIPTION_ID is non-empty—so a failure silently disables scale-from-zero even when credentials.json exists. Make this failure explicit (e.g., remove the || true/stderr suppression and/or emit an error + exit when SFZ is expected but the subscription id can’t be resolved).
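
A fail-fast variant along the lines suggested here might look like the sketch below. It is not the PR's code: `resolve_subscription_id` is a hypothetical helper, and the demonstration at the end uses `echo` and `false` as stand-ins for `az account show` so the example runs without the Azure CLI.

```shell
#!/bin/bash
set -euo pipefail

# resolve_subscription_id LOOKUP_CMD [ARGS...]
# Hypothetical helper: runs the lookup command and fails loudly if it
# exits non-zero or prints nothing, instead of swallowing the error.
resolve_subscription_id() {
  local id
  if ! id="$("$@")"; then
    echo "ERROR: subscription ID lookup failed: $*" >&2
    return 1
  fi
  if [ -z "${id}" ]; then
    echo "ERROR: subscription ID lookup returned an empty value" >&2
    return 1
  fi
  printf '%s\n' "${id}"
}

# In the install script this could be wired in roughly as follows:
#   if [ -f /etc/hypershift-ci-jobs-self-managed-azure/credentials.json ]; then
#     SUBSCRIPTION_ID="$(resolve_subscription_id az account show --query id -o tsv)" || exit 1
#     ... build SCALE_FROM_ZERO_CREDS and append the flags as before ...
#   fi

# Self-contained demonstration with stand-in lookup commands.
resolve_subscription_id echo "00000000-0000-0000-0000-000000000000"
if ! resolve_subscription_id false 2>/dev/null; then
  echo "lookup failure was detected"
fi
```

Keeping stderr from the lookup visible, rather than redirecting it to /dev/null, also leaves the underlying `az` error in the job log when the step fails.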

@openshift-ci
Contributor

openshift-ci Bot commented May 27, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jhjaggars
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot
Contributor

@jhjaggars: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@openshift-merge-bot
Contributor

@openshift-ci[bot]: your /pj-rehearse request was not processed because the request waited in queue for longer than 5 minutes. Please retry in a few minutes.

@openshift-merge-bot
Contributor

@jhjaggars: job(s): e2e-azure-self-managed either don't exist or were not found to be affected, and cannot be rehearsed
