
OCPBUGS-81505: Fix update-hosts.sh crash when no hosts registered yet #684

Closed
rwsu wants to merge 1 commit into openshift:master from rwsu:OCPBUGS-81505



@rwsu rwsu commented Apr 1, 2026

When update-hosts.service starts, it may call the infra-envs hosts API before any hosts have registered with assisted-service. In this race condition, assisted-service returns an error JSON object instead of an empty array, causing 'jq -r .[].id' to fail with "Cannot index string with string 'id'". With set -e, this kills the script before it can patch the install ignition on any host.
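The failure mode described above can be reproduced locally with `jq` alone. This is a minimal sketch that assumes a hypothetical error body whose fields are all strings (the real assisted-service error object may have different fields). It also shows one alternative guard, a filter that emits host IDs only when the response really is an array. That guard is not the change this PR makes; the PR instead gates the call on cluster status.

```bash
#!/usr/bin/env bash
# Sketch: reproduce the '.[].id' failure and show a defensive jq filter.
set -euo pipefail

# Hypothetical error object; the real assisted-service error fields may differ.
error_body='{"code":"404","reason":"no hosts found"}'
hosts_body='[{"id":"host-a"},{"id":"host-b"}]'

# Original pattern: '.[]' iterates the object's string values, and '.id' on a
# string fails, so jq exits non-zero. Wrapped in 'if' so set -e does not abort.
if ! printf '%s' "$error_body" | jq -r '.[].id' >/dev/null 2>&1; then
  echo "jq '.[].id' failed on a non-array body"
fi

# Defensive variant: only iterate when the response is actually an array.
host_ids() {
  jq -r 'if type == "array" then .[].id else empty end'
}

printf '%s' "$error_body" | host_ids   # prints nothing, exits 0
printf '%s' "$hosts_body" | host_ids   # prints host-a and host-b
```

Guarding on `type == "array"` keeps the script alive under `set -e`, but it silently treats an error response as "no hosts", so a caller would still need to retry until hosts appear.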

Fix by skipping the hosts API call until the cluster reaches 'ready' status, which guarantees all hosts have registered and been validated. Patching also continues through 'preparing-for-installation' to ensure all hosts are updated before disk installation begins.
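A minimal sketch of the status gate described above, assuming a polling loop. `read_cluster_status` and the canned `statuses` sequence are hypothetical stand-ins for however the real script reads the cluster status from assisted-service; only the two allowed statuses, `ready` and `preparing-for-installation`, come from the PR.

```bash
#!/usr/bin/env bash
# Sketch: skip the hosts query until the cluster status says all hosts have
# registered. Status values are canned; a real script would query the API.
set -euo pipefail

# Hypothetical canned sequence standing in for successive status reads.
statuses=(insufficient insufficient ready)
poll=0
cluster_status=""

# Hypothetical helper: sets cluster_status from the canned sequence (no
# subshell, so the poll counter persists between calls).
read_cluster_status() {
  cluster_status=${statuses[poll]}
  if (( poll < ${#statuses[@]} - 1 )); then
    poll=$((poll + 1))
  fi
}

# Statuses in which the PR allows patching: every host is registered, and
# disk installation has not started yet.
should_patch_hosts() {
  [[ $1 == "ready" || $1 == "preparing-for-installation" ]]
}

attempts=0
while true; do
  read_cluster_status
  attempts=$((attempts + 1))
  if ! should_patch_hosts "$cluster_status"; then
    echo "status '$cluster_status': skipping hosts query"
    # A real service would sleep or back off here before polling again.
    continue
  fi
  echo "status '$cluster_status': querying hosts and patching ignition"
  break
done
```

Gating on status rather than on the response shape means the hosts API is never called in the window where it returns an error object, so the existing `jq -r '.[].id'` call can stay unchanged.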

Assisted-by: Claude Sonnet 4.6 (1M context) noreply@anthropic.com

@openshift-ci-robot openshift-ci-robot added jira/severity-critical Referenced Jira bug's severity is critical for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Apr 1, 2026
@openshift-ci-robot

@rwsu: This pull request references Jira Issue OCPBUGS-81505, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci openshift-ci bot requested review from gamli75 and oourfali April 1, 2026 02:17

openshift-ci bot commented Apr 1, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rwsu


@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 1, 2026

openshift-ci bot commented Apr 1, 2026

@rwsu: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Required | Rerun command |
|---|---|---|---|
| ci/prow/e2e-agent-compact-ipv4-iso-no-registry | e9a925f | false | `/test e2e-agent-compact-ipv4-iso-no-registry` |


```bash
# Wait until cluster is ready (all hosts registered and validated) before patching.
# This avoids querying the hosts API before any hosts have registered, which causes
# assisted-service to return an error object instead of an empty array.
if [[ $cluster_status != "ready" && $cluster_status != "preparing-for-installation" ]]; then
```

@rwsu the service (to restart the registry) needs to run at the very beginning, whereas update-hosts was meant to configure the host after the reboot. (Not strictly related, but as a longer-term strategy update-hosts should not be used at all for the live ISO, so in any case it's better to stay away from it.)

@rwsu (author) replied:

After speaking with @andfasano, the right solution is to move the reconfigure scripts into the bootstrap ignition. We should also test with the HA topology, since the problem occurs on worker nodes, which do not have the IRI registry running on them.


rwsu commented Apr 15, 2026

Closing. Replaced by #687

@rwsu rwsu closed this Apr 15, 2026
@openshift-ci-robot

@rwsu: This pull request references Jira Issue OCPBUGS-81505. The bug has been updated to no longer refer to the pull request using the external bug tracker.



3 participants