CoreDNS - spread replicas across nodes to survive control plane reboots #39

@letan-assistant

Description

Both CoreDNS replicas currently land on the control plane node. When it reboots, cluster DNS goes down entirely, causing cascading failures (e.g., miniflux unable to reach postgres via service name).

The cluster-proportional-autoscaler (added in c2e6318) ensures 2 replicas exist, but doesn't influence scheduling. A podAntiAffinity rule or a topologySpreadConstraints entry is needed to guarantee that the replicas land on different nodes.
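
As a rough illustration of the constraint described above, the sketch below builds a strategic-merge patch for the Deployment's pod template. The function name `build_spread_patch` is hypothetical, and the `k8s-app: kube-dns` pod label is an assumption that should be checked against the running CoreDNS pods before use.

```python
import json


def build_spread_patch(label_key="k8s-app", label_value="kube-dns"):
    """Return a patch dict asking the scheduler to spread matching pods by node.

    The default label is an assumption about how the CoreDNS pods are labelled.
    """
    constraint = {
        # Allow at most a difference of 1 pod between any two nodes.
        "maxSkew": 1,
        # Each distinct node hostname is its own topology domain.
        "topologyKey": "kubernetes.io/hostname",
        # Hard requirement: leave a pod Pending rather than co-locate it.
        "whenUnsatisfiable": "DoNotSchedule",
        "labelSelector": {"matchLabels": {label_key: label_value}},
    }
    return {
        "spec": {
            "template": {
                "spec": {"topologySpreadConstraints": [constraint]}
            }
        }
    }


if __name__ == "__main__":
    # Print the patch as JSON so it can be saved to a file and applied.
    print(json.dumps(build_spread_patch(), indent=2))
```

The printed JSON could be saved (for example as `spread-patch.json`, a hypothetical filename) and applied with `kubectl -n kube-system patch deployment coredns --type strategic --patch-file spread-patch.json`, assuming the Deployment is named `coredns`. With `DoNotSchedule`, a second replica stays Pending if only one node is schedulable; `ScheduleAnyway` would turn the constraint into a preference instead.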

Complication

k3s manages CoreDNS as a built-in AddOn (not a HelmChart), so it re-applies its own coredns.yaml on every boot, overwriting any manual patches to the Deployment. Options:

  1. Post-boot systemd unit that patches the CoreDNS Deployment with a topologySpreadConstraints entry after k3s starts
  2. Disable built-in CoreDNS (--disable coredns) and manage it as a custom HelmChart with full control over scheduling
  3. Kustomize-style overlay if k3s supports it (needs research)

Option 2 is the cleanest long-term solution but takes more work. Option 1 is a quick fix.
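
For Option 1, the post-boot step would ideally be idempotent, patching only when k3s has reset the Deployment. A minimal sketch of that check follows, assuming the unit feeds it the output of `kubectl -n kube-system get deployment coredns -o json`; the function name `needs_patch` is hypothetical, not part of any existing tooling.

```python
import json

HOSTNAME_KEY = "kubernetes.io/hostname"


def needs_patch(deployment_json: str) -> bool:
    """Return True if the Deployment lacks a per-node spread constraint.

    `deployment_json` is the Deployment object serialized as JSON text.
    """
    deployment = json.loads(deployment_json)
    pod_spec = deployment.get("spec", {}).get("template", {}).get("spec", {})
    # The field may be absent or null after k3s re-applies its manifest.
    constraints = pod_spec.get("topologySpreadConstraints") or []
    # Any constraint keyed on the node hostname counts as already patched.
    return not any(c.get("topologyKey") == HOSTNAME_KEY for c in constraints)
```

A wrapper script run by the systemd unit could call `needs_patch` once the API server is reachable and skip the `kubectl patch` call when it returns False, so an unchanged Deployment is left alone.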
