feat(istio): expose istio_ingressgateway_replicas to guarantee HA for node drains #379

Open
fedemaleh wants to merge 2 commits into main from feature/istio-ingressgateway-ha-replicas

@fedemaleh

Contributor

Summary

Mirrors the istiod fix from #292 onto istio-ingressgateway. With a single replica, the chart-default PDB blocks every node rolling update with PodEvictionFailure.

Problem

The Istio gateway Helm chart ships with:

  • autoscaling.enabled: true, autoscaling.minReplicas: 1
  • A default PodDisruptionBudget with minAvailable: 1

With a single replica and minAvailable: 1, the PDB resolves to ALLOWED DISRUPTIONS: 0 — every drain attempt against the node hosting the gateway pod fails. Identical class of bug as the single-replica istiod issue addressed in #292.
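The arithmetic behind ALLOWED DISRUPTIONS: 0 can be sketched as follows. This is a simplified model that assumes an integer minAvailable and counts only healthy pods; the function name allowed_disruptions is illustrative and is not part of any Kubernetes client.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Simplified PDB budget: evictions permitted while keeping min_available healthy pods."""
    return max(0, healthy_pods - min_available)


# Chart defaults: one replica with minAvailable: 1 leaves no room for an eviction.
single = allowed_disruptions(healthy_pods=1, min_available=1)
# With two healthy replicas, one pod can be evicted during a drain.
ha = allowed_disruptions(healthy_pods=2, min_available=1)
print(single, ha)
```

With one replica the budget is zero, so the eviction API rejects the drain; a second replica raises it to one.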

Change

Adds var.istio_ingressgateway_replicas (default 2, matching the istiod precedent and HA-by-default posture) and wires it into both:

  • replicaCount — the initial deployment replica count
  • autoscaling.minReplicas — the HPA floor

Without overriding both, the HPA scales back to 1 shortly after install and the deadlock returns. Same root cause as #292.
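Why both values matter can be illustrated with a rough model of the HPA scaling rule: scale the current replica count by the observed-to-target metric ratio, then clamp to the configured bounds. The sketch ignores the HPA's tolerance and stabilization windows; the idle ratio of 0.25 and the ceiling of 5 are assumed values for illustration, and hpa_desired_replicas is a hypothetical helper.

```python
import math


def hpa_desired_replicas(current: int, metric_ratio: float,
                         min_replicas: int, max_replicas: int) -> int:
    """Simplified HPA rule: scale by observed/target ratio, then clamp to [min, max]."""
    proposed = math.ceil(current * metric_ratio)
    return max(min_replicas, min(max_replicas, proposed))


# Assumed idle gateway: observed utilization is a quarter of the target.
IDLE_RATIO = 0.25

# Only replicaCount overridden: the HPA floor is still 1, so it scales back down.
only_replica_count = hpa_desired_replicas(2, IDLE_RATIO, min_replicas=1, max_replicas=5)
# Both overridden: the floor of 2 holds the deployment at two replicas.
both_overridden = hpa_desired_replicas(2, IDLE_RATIO, min_replicas=2, max_replicas=5)
print(only_replica_count, both_overridden)
```

Under these assumptions the first case settles at 1 replica and the second stays at 2, which is the behavior this change relies on.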

Validation

  • Verified with tofu plan against the existing galicia setup; the plan produces an in-place update to helm_release.istio_ingressgateway that adds the two set entries
  • Plan does not destroy the existing gateway Deployment; Helm rolls the change via standard upgrade
  • Default of 2 is consistent with #292 (feat(istio): expose istiod_replicas to guarantee HA for node drains); consumers wanting the previous single-replica behavior can opt out via istio_ingressgateway_replicas = 1

Test plan

  • Apply on a test cluster with default istio_ingressgateway_replicas = 2
  • Verify kubectl get deploy -n istio-system istio-ingressgateway shows 2/2
  • Verify kubectl get hpa -n istio-system istio-ingressgateway shows MINPODS: 2
  • Verify kubectl get pdb -n istio-system shows ALLOWED DISRUPTIONS >= 1 for any gateway PDB
  • Trigger a node group rolling update; confirm drain proceeds without PodEvictionFailure
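The three kubectl checks above could be automated against objects fetched with -o json. The sketch below is one possible shape, not part of this PR: failed_checks is a hypothetical helper, and the sample dictionaries are hand-written stand-ins rather than real cluster output. It reads status.readyReplicas on the Deployment, spec.minReplicas on the HPA, and status.disruptionsAllowed on each PDB.

```python
def failed_checks(deploy: dict, hpa: dict, pdbs: list, expected: int = 2) -> list:
    """Return a description of each test-plan check that does not hold."""
    failures = []
    ready = deploy.get("status", {}).get("readyReplicas", 0)
    if ready < expected:
        failures.append(f"deployment has {ready} ready replicas, expected {expected}")
    min_pods = hpa.get("spec", {}).get("minReplicas", 1)
    if min_pods != expected:
        failures.append(f"HPA minReplicas is {min_pods}, expected {expected}")
    for pdb in pdbs:
        name = pdb.get("metadata", {}).get("name", "<unnamed>")
        if pdb.get("status", {}).get("disruptionsAllowed", 0) < 1:
            failures.append(f"PDB {name} allows no disruptions")
    return failures


# Hand-written sample objects (illustrative only).
deploy = {"status": {"readyReplicas": 2}}
hpa = {"spec": {"minReplicas": 2}}
pdbs = [{"metadata": {"name": "istio-ingressgateway"}, "status": {"disruptionsAllowed": 1}}]
print(failed_checks(deploy, hpa, pdbs))
```

An empty list means all three conditions hold; the node group rolling update itself still has to be triggered and observed manually.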

Related

@fedemaleh marked this pull request as ready for review on June 2, 2026 14:26

@sebastiancorrea81 left a comment

Collaborator

lgtm

3 participants