
feat: opt-in ALB autocreation when pool capacity is exhausted #187

Merged: fedemaleh merged 15 commits into beta from feature/clien-807-autocreate-alb on Jun 16, 2026

@fedemaleh (Collaborator)

Summary

Adds opt-in ALB autocreation for k8s scopes (CLIEN-807). When every declared ALB is at or above ALB_MAX_CAPACITY, the platform provisions a new ALB via a dummy Ingress, tags it for future discovery, and uses it for the scope being created. Disabled by default.
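The trigger condition can be made concrete with a small bash sketch. It picks the least-loaded ALB and falls back to autocreation only when that ALB is at or above ALB_MAX_CAPACITY and ALB_AUTOCREATE_ENABLED=true.

This is not the PR's resolve_balancer. The following are all hypothetical:
- the input format, one `<name> <rule-count>` pair per line on stdin;
- the function name pick_balancer;
- the stub autocreate_alb_stub, which stands in for sourcing autocreate_alb and prints a placeholder name.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not the PR's resolve_balancer.
# Reads "<alb-name> <rule-count>" lines on stdin, picks the least-loaded
# ALB, and falls back to autocreation only when that ALB is full.

# Placeholder for "source autocreate_alb"; prints a made-up ALB name.
autocreate_alb_stub() {
  echo "nullplatform-auto-private-000000"
}

pick_balancer() {
  local max_capacity=${ALB_MAX_CAPACITY:-}
  local enabled=${ALB_AUTOCREATE_ENABLED:-false}
  if ! [[ $max_capacity =~ ^[1-9][0-9]*$ ]]; then
    echo "ALB_MAX_CAPACITY must be a positive integer" >&2
    return 1
  fi

  local name count best_name="" best_count=""
  # The "|| [ -n "$name" ]" guard keeps a final line without a trailing newline.
  while read -r name count || [ -n "$name" ]; do
    [ -z "$name" ] && continue
    if [ -z "$best_count" ] || [ "$count" -lt "$best_count" ]; then
      best_name=$name
      best_count=$count
    fi
  done

  if [ -z "$best_name" ]; then
    echo "no balancers declared" >&2
    return 1
  fi
  # Below capacity: reuse the least-loaded ALB.
  if [ "$best_count" -lt "$max_capacity" ]; then
    echo "$best_name"
    return 0
  fi
  # At or above capacity: autocreate only when the feature is enabled.
  if [ "$enabled" = "true" ]; then
    autocreate_alb_stub
    return
  fi
  echo "all balancers are at or above ALB_MAX_CAPACITY ($max_capacity)" >&2
  return 1
}
```

For example, `printf 'alb-a 40\nalb-b 12\n' | ALB_MAX_CAPACITY=50 pick_balancer` prints alb-b. The comparison is a strict `-lt`, so on a tie the ALB listed first wins.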

Client proposal driving this change: .claude/client-docs/autocreate-alb-spin.docx.

Design notes (worth a comment if you disagree)

  • Where it runs: inside resolve_balancer, after the existing least-loaded selection. When the picked ALB's rule count is >= ALB_MAX_CAPACITY and ALB_AUTOCREATE_ENABLED=true, the script sources autocreate_alb and uses the new ALB's name in place of the picked one. This keeps the selection logic in a single file.
  • Registration without the nullplatform API: the new ALB is registered via AWS tags (nullplatform:managed-by=autocreate + nullplatform:visibility=...) rather than by calling the nullplatform API to update the provider config. The repo has no existing pattern for calling the platform API from workflow scripts, and tag-based discovery keeps the change self-contained. Subsequent scope creations discover the ALB through the Resource Groups Tagging API (GetResources, IAM action tag:GetResources) and treat it as a normal candidate. Happy to switch to an API write if you prefer; this is flagged as an open question in the spec.
  • Naming: <prefix><public|private>-<6 hex chars>, with the prefix capped at 18 chars so the total stays within AWS's 32-char ALB name limit (18 + 7 for "private" + 1 + 6 = 32). The prefix is validated against ^[a-z0-9-]+$ to keep the rendered Ingress and tag values safe.
  • Wait loop: one describe-load-balancers --output json call per 10-second poll, with the ARN and state parsed by jq. The timeout defaults to 300s; 0 and negative values are rejected.
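The naming rule can be sketched in bash, using od(1) on /dev/urandom for the random suffix. generate_alb_name is a hypothetical helper, not the code in autocreate_alb. With the default 18-character prefix nullplatform-auto-, a private name lands exactly on the 32-character limit.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the naming rule:
#   <prefix><public|private>-<6 hex chars>, total length <= 32.
generate_alb_name() {
  local prefix=$1 visibility=$2
  # 18 + len("private") + 1 + 6 = 32, AWS's maximum ALB name length.
  if [ "${#prefix}" -gt 18 ]; then
    echo "prefix must be at most 18 characters" >&2
    return 1
  fi
  if ! [[ $prefix =~ ^[a-z0-9-]+$ ]]; then
    echo 'prefix must match ^[a-z0-9-]+$' >&2
    return 1
  fi
  case "$visibility" in
    public|private) ;;
    *) echo "visibility must be public or private" >&2; return 1 ;;
  esac
  local suffix
  # 3 random bytes rendered as 6 lowercase hex characters.
  suffix=$(od -An -N3 -tx1 /dev/urandom | tr -d ' \n')
  printf '%s%s-%s\n' "$prefix" "$visibility" "$suffix"
}
```

`generate_alb_name nullplatform-auto- private` prints a name such as nullplatform-auto-private-3f9a1c. The helper exits non-zero for:
- a prefix containing characters outside ^[a-z0-9-]+$;
- a prefix longer than 18 characters;
- a visibility other than public or private.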

Known limitations (deliberate, not in this PR)

  • Race: two concurrent scope creations that both observe a full pool will each autocreate an ALB. Both ALBs get tagged and become candidates afterwards, so there is no correctness issue, just one extra ALB. Solving this requires distributed coordination (a k8s Lease or DynamoDB), which is out of scope for the bash layer.
  • Retry idempotency: if build_context is retried after a transient failure, autocreate_alb re-runs with a new random name and provisions another ALB. The previous one stays tagged and reusable.
  • No automatic cleanup of unused autocreated ALBs.
  • No IaC reconciliation: the new ALB is not added to the customer's Terraform/Tofu state. This is documented explicitly in the proposal and in k8s/docs/autocreate-alb.md.

Files

  • New: k8s/scope/networking/autocreate_alb, k8s/scope/templates/ingress-dummy.yaml.tpl, k8s/scope/tests/networking/autocreate_alb.bats, k8s/docs/autocreate-alb.md.
  • Modified: k8s/scope/networking/resolve_balancer (discovery + autocreate fallback), k8s/scope/tests/networking/resolve_balancer.bats (5 new tests + a shared mock helper), k8s/values.yaml (3 new knobs), CHANGELOG.md.
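The description does not show ingress-dummy.yaml.tpl, so the following is only a hypothetical rendering of a dummy Ingress. It assumes the AWS Load Balancer Controller provisions the ALB from it.

The annotations alb.ingress.kubernetes.io/load-balancer-name and alb.ingress.kubernetes.io/scheme are real controller annotations. Everything else is an assumption:
- the function name render_dummy_ingress;
- the mapping of public to internet-facing and private to internal;
- the default namespace;
- the backend service dummy-backend.

```bash
#!/usr/bin/env bash
# Hypothetical rendering of a dummy Ingress; not the PR's template.
render_dummy_ingress() {
  local alb_name=$1 visibility=$2 namespace=${3:-default}
  local scheme
  # Assumed mapping from the name's visibility segment to the ALB scheme.
  case "$visibility" in
    public) scheme=internet-facing ;;
    private) scheme=internal ;;
    *) echo "visibility must be public or private" >&2; return 1 ;;
  esac
  # load-balancer-name makes the controller name the ALB explicitly.
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${alb_name}
  namespace: ${namespace}
  annotations:
    alb.ingress.kubernetes.io/load-balancer-name: ${alb_name}
    alb.ingress.kubernetes.io/scheme: ${scheme}
spec:
  ingressClassName: alb
  defaultBackend:
    service:
      name: dummy-backend
      port:
        number: 80
EOF
}
```

To apply it, pipe the output to `kubectl apply -f -`, for example `render_dummy_ingress "$name" private | kubectl apply -f -`.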

Configuration

| Key | Default | Description |
| --- | --- | --- |
| ALB_AUTOCREATE_ENABLED | false | Master switch |
| ALB_AUTOCREATE_NAME_PREFIX | nullplatform-auto- | Prefix for the new ALB's name |
| ALB_AUTOCREATE_TIMEOUT_SECONDS | 300 | Poll deadline, in seconds, for state=active |
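The timeout knob and the 10-second poll could fit together as in this bash sketch. wait_for_alb_active and alb_state are hypothetical names. ALB_POLL_INTERVAL_SECONDS is an invented knob, added only so the loop can be tested quickly.

The PR parses describe-load-balancers --output json with jq. This sketch instead uses the CLI's JMESPath --query option to read State.Code directly.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the wait loop; names are illustrative only.

# Prints the ALB's State.Code (e.g. provisioning, active, failed).
# Assumption: uses --query instead of the PR's --output json + jq.
alb_state() {
  aws elbv2 describe-load-balancers --names "$1" \
    --query 'LoadBalancers[0].State.Code' --output text 2>/dev/null
}

wait_for_alb_active() {
  local name=$1
  local timeout=${ALB_AUTOCREATE_TIMEOUT_SECONDS:-300}
  # ALB_POLL_INTERVAL_SECONDS is an invented knob; the PR polls every 10s.
  local interval=${ALB_POLL_INTERVAL_SECONDS:-10}
  # Reject 0, negative, and non-numeric timeouts.
  if ! [[ $timeout =~ ^[1-9][0-9]*$ ]]; then
    echo "ALB_AUTOCREATE_TIMEOUT_SECONDS must be a positive integer" >&2
    return 1
  fi

  # SECONDS is bash's built-in elapsed-time counter.
  local deadline=$((SECONDS + timeout)) state
  while :; do
    state=$(alb_state "$name") || state=unknown
    case "$state" in
      active) return 0 ;;
      failed) echo "ALB $name failed to provision" >&2; return 1 ;;
    esac
    if [ "$SECONDS" -ge "$deadline" ]; then
      echo "timed out after ${timeout}s waiting for $name (last state: $state)" >&2
      return 1
    fi
    sleep "$interval"
  done
}
```

A test can exercise the loop without calling AWS by redefining alb_state, for example `alb_state() { echo active; }`. The deadline is based on SECONDS, so the timeout is measured in wall-clock time rather than in poll counts.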

New IAM permissions

  • elasticloadbalancing:AddTags
  • elasticloadbalancing:DescribeTags
  • tag:GetResources
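The two tag-related calls could look roughly like this with the AWS CLI. elasticloadbalancing:AddTags covers the first call and tag:GetResources the second.

Assumptions in this sketch:
- tag_autocreated_alb and discover_autocreated_albs are hypothetical names;
- the AWS variable is an invented hook that lets a test substitute echo for aws;
- the description elides the nullplatform:visibility value, so it is taken as an argument, and passing public or private is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; AWS can be overridden (e.g. AWS=echo) for dry runs.
AWS=${AWS:-aws}

# Marks a freshly created ALB so later scope creations can find it.
tag_autocreated_alb() {
  local arn=$1 visibility=$2
  "$AWS" elbv2 add-tags --resource-arns "$arn" \
    --tags Key=nullplatform:managed-by,Value=autocreate \
           "Key=nullplatform:visibility,Value=$visibility"
}

# Lists ARNs of load balancers carrying both tags (filters are ANDed).
discover_autocreated_albs() {
  local visibility=$1
  "$AWS" resourcegroupstaggingapi get-resources \
    --resource-type-filters elasticloadbalancing:loadbalancer \
    --tag-filters Key=nullplatform:managed-by,Values=autocreate \
                  "Key=nullplatform:visibility,Values=$visibility" \
    --query 'ResourceTagMappingList[].ResourceARN' --output text
}
```

Both functions print whatever the CLI prints. discover_autocreated_albs returns the matching ARNs as whitespace-separated text, which a caller would still need to resolve to names and rule counts.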

Test plan

Automated (already passing locally):

  • bats k8s/scope/tests/networking/autocreate_alb.bats — 15 tests (name generation, ingress apply, polling, tagging, prefix + timeout validation)
  • bats k8s/scope/tests/networking/resolve_balancer.bats — 35 tests (existing 27 plus 8 new ones covering tag discovery, autocreate trigger paths, ALB_MAX_CAPACITY validation)
  • Full k8s/scope/tests regression run: 226 ok, plus 1 failure in a pre-existing, unrelated flaky test (wait_on_balancer: external_dns success after retries), which already fails on beta

Manual (your environment):

  • Enable ALB_AUTOCREATE_ENABLED=true and create scopes with a healthy pool → should use an existing ALB, not autocreate
  • Push every declared ALB over the threshold and create a scope → should autocreate, tag, and use the new ALB
  • Create another scope afterwards → should discover the autocreated ALB and use it (without autocreating another)
  • Verify that with ALB_AUTOCREATE_ENABLED=false, creation fails with the existing error message when the pool is full

Refs

sebasnallar previously approved these changes Jun 16, 2026
ignacioboud previously approved these changes Jun 16, 2026
fedemaleh dismissed stale reviews from ignacioboud and sebasnallar via 47c98c2 Jun 16, 2026 15:42
fedemaleh merged commit 056f3f2 into beta Jun 16, 2026
3 checks passed
fedemaleh deleted the feature/clien-807-autocreate-alb branch Jun 16, 2026 16:02