feat(istio): expose istio_ingressgateway_replicas to guarantee HA for node drains #379
Open
fedemaleh wants to merge 2 commits into
davidf-null approved these changes on Jun 2, 2026
Summary
Mirrors the istiod fix from #292 onto istio-ingressgateway. Single-replica + chart-default PDB blocks every node rolling update with `PodEvictionFailure`.
Problem
The Istio gateway Helm chart ships with:
- `autoscaling.enabled: true`, `autoscaling.minReplicas: 1`
- a PDB with `minAvailable: 1`
With a single replica and `minAvailable: 1`, the PDB resolves to `ALLOWED DISRUPTIONS: 0`, so every drain attempt against the node hosting the gateway pod fails. This is the same class of bug as the single-replica istiod issue addressed in #292.
Change
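A minimal sketch of the shape of this change, for readers who have not opened the diff. Only `var.istio_ingressgateway_replicas`, `helm_release.istio_ingressgateway`, `replicaCount`, and `autoscaling.minReplicas` come from this PR. The variable description, the release name, the chart repository and chart name, and the `set` block syntax (Helm provider 2.x style) are assumptions and may not match the actual module.

```hcl
# Sketch only: attribute values marked "assumed" are not taken from the diff.
variable "istio_ingressgateway_replicas" {
  description = "Replica count and HPA floor for istio-ingressgateway (assumed wording)"
  type        = number
  default     = 2
}

resource "helm_release" "istio_ingressgateway" {
  name       = "istio-ingressgateway"                                # assumed
  repository = "https://istio-release.storage.googleapis.com/charts" # assumed
  chart      = "gateway"                                             # assumed
  namespace  = "istio-system"

  # Initial Deployment replica count.
  set {
    name  = "replicaCount"
    value = var.istio_ingressgateway_replicas
  }

  # HPA floor; without this the autoscaler can scale back down to 1.
  set {
    name  = "autoscaling.minReplicas"
    value = var.istio_ingressgateway_replicas
  }
}
```

Driving both values from one variable keeps the initial replica count and the HPA floor from drifting apart.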
Adds `var.istio_ingressgateway_replicas` (default `2`, matching the istiod precedent and HA-by-default posture) and wires it into both:
- `replicaCount`: the initial deployment replica count
- `autoscaling.minReplicas`: the HPA floor
Without overriding both, the HPA scales back to `1` shortly after install and the deadlock returns. Same root cause as #292.
Validation
- Verified with `tofu plan` against the existing galicia setup; it produces an in-place update on `helm_release.istio_ingressgateway` that adds the two `set` entries
- The plan does not destroy the existing gateway Deployment; Helm rolls the change via a standard upgrade
- The default of `2` is consistent with #292 (feat(istio): expose istiod_replicas to guarantee HA for node drains); consumers wanting the previous single-replica behavior can opt out via `istio_ingressgateway_replicas = 1`
Test plan
- Apply with `istio_ingressgateway_replicas = 2`
- `kubectl get deploy -n istio-system istio-ingressgateway` shows 2/2
- `kubectl get hpa -n istio-system istio-ingressgateway` shows `MINPODS: 2`
- `kubectl get pdb -n istio-system` shows `ALLOWED DISRUPTIONS >= 1` for any gateway PDB
- Node drain completes without `PodEvictionFailure`
Related