Worker nodes never join GCP CAPI-based OCP 4.22 clusters. Workers are provisioned (GCP VMs running) but stay stuck in an ignition fetch loop indefinitely, receiving HTTP 500 from the Machine Config Server (MCS).
Root Cause
In GCP CAPI installs, the installer places the bootstrap node in the same unmanaged instance group as master-0 (zone-a/b). Both are backends for the GCP Internal Load Balancer (ILB) on ports 6443 and 22623.
The GCP ILB uses connection-based session affinity (CONNECTION mode): once a worker's TCP connection is established to a backend, it stays pinned to that backend for the lifetime of the connection.
During the critical window when workers first boot and fetch ignition config from api-int:22623, the bootstrap node is the healthiest/first-responding backend. All worker connections get pinned to bootstrap.
Bootstrap MCS is designed to refuse worker ignition requests — it only serves master configs:
refusing to serve bootstrap configuration to pool "worker"
Workers receive HTTP 500 forever. The in-cluster MCS running on master-0 receives zero worker requests despite being healthy and ready.
Evidence
Worker serial console: 900+ ignition GET attempts, all returning Internal Server Error
Bootstrap MCS logs: explicit refusals of the form refusing to serve bootstrap configuration to pool "worker"
In-cluster MCS logs: zero worker requests despite running for 40+ minutes
Zero worker CSRs in the cluster
gcloud compute instance-groups unmanaged list-instances confirms that bootstrap and master-0 are in the same group
Workaround
Remove bootstrap from the master instance group after masters are ready:
All 3 workers completed ignition within minutes and joined the cluster after applying this.
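The exact command used for the workaround is not preserved in this report. As a rough sketch only, removing an instance from an unmanaged instance group with gcloud could look like the following. The group, instance, zone, and project names are hypothetical placeholders, and the helper functions (run, list_group, remove_from_group) are illustrative, not part of the installer. The script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch under assumptions: every name below is a placeholder, not a value from the report.
set -eu

# Print the command by default; execute it only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# List the members of an unmanaged instance group (group, zone, project).
list_group() {
  run gcloud compute instance-groups unmanaged list-instances "$1" \
    --zone="$2" --project="$3"
}

# Remove one instance from an unmanaged instance group (group, instance, zone, project).
remove_from_group() {
  run gcloud compute instance-groups unmanaged remove-instances "$1" \
    --instances="$2" --zone="$3" --project="$4"
}

# Hypothetical defaults; override through the environment.
GROUP="${GROUP:-example-master-group}"
BOOTSTRAP="${BOOTSTRAP:-example-bootstrap}"
ZONE="${ZONE:-us-central1-a}"
PROJECT="${PROJECT:-example-project}"

list_group "$GROUP" "$ZONE" "$PROJECT"                    # confirm bootstrap is a member
remove_from_group "$GROUP" "$BOOTSTRAP" "$ZONE" "$PROJECT"
list_group "$GROUP" "$ZONE" "$PROJECT"                    # confirm only masters remain
```

Running it without APPLY=1 shows the three commands for review. As the workaround states, the removal should happen only after the masters are ready, so that the in-cluster MCS is available to serve worker ignition requests.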
Suggested Fix
The installer's GCP CAPI code should either:
Environment
Impact
This affects all GCP CAPI installs in OCP 4.22, not just WIF/STS. CAPI became the default for GCP IPI in 4.22.
Note
Responses generated with Claude