Both CoreDNS replicas currently land on the control plane node. When it reboots, cluster DNS goes down entirely, causing cascading failures (e.g., miniflux unable to reach postgres via service name).
The cluster-proportional-autoscaler (added in c2e6318) ensures 2 replicas exist but doesn't influence where they are scheduled. A podAntiAffinity rule or a topologySpreadConstraints entry on the CoreDNS Deployment is needed to guarantee the replicas land on different nodes.
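To make the scheduling requirement concrete, here is a minimal sketch of a patch body that adds a hard spread constraint across nodes. It assumes the CoreDNS pods carry the label `k8s-app: kube-dns` (worth confirming with `kubectl get pods -n kube-system --show-labels`); the helper name `build_spread_patch` is hypothetical and used only for illustration.

```python
import json


def build_spread_patch(label_key="k8s-app", label_value="kube-dns",
                       topology_key="kubernetes.io/hostname"):
    """Return a strategic-merge patch body that spreads matching pods across nodes.

    The default label is an assumption about how the CoreDNS pods are labelled.
    """
    constraint = {
        # At most one pod of difference between any two eligible nodes.
        "maxSkew": 1,
        # Treat each node (by hostname label) as its own topology domain.
        "topologyKey": topology_key,
        # DoNotSchedule makes this a hard rule; ScheduleAnyway is only a preference.
        "whenUnsatisfiable": "DoNotSchedule",
        "labelSelector": {"matchLabels": {label_key: label_value}},
    }
    return {"spec": {"template": {"spec": {"topologySpreadConstraints": [constraint]}}}}


patch = build_spread_patch()
print(json.dumps(patch, indent=2))
```

With 2 replicas and at least two schedulable nodes, this should force one replica per node. The trade-off of `DoNotSchedule` is that if only one node is eligible, the second replica stays Pending rather than doubling up.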
Complication
k3s manages CoreDNS as a built-in AddOn (not a HelmChart), so it re-applies its own coredns.yaml on every boot — overwriting any manual Deployment patches. Options:
- Post-boot systemd unit that patches the CoreDNS Deployment with topologySpreadConstraints after k3s starts
- Disable built-in CoreDNS (--disable coredns) and manage it as a custom HelmChart with full control over scheduling
- Kustomize-style overlay if k3s supports it (needs research)
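Option 2 (disabling the built-in CoreDNS and managing it as a HelmChart) is the cleanest long-term fix but takes more work. Option 1 (the post-boot systemd unit) is a quick fix.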
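As a rough sketch of what Option 1's patch step might look like, the script below retries `kubectl patch` until the API server accepts it, since the Deployment may not exist yet right after boot. It assumes the Deployment is named `coredns` in `kube-system` and that `kubectl` is on PATH; the function names, retry counts, and delays are hypothetical.

```python
import json
import subprocess
import time

NAMESPACE = "kube-system"  # assumed namespace of the k3s CoreDNS Deployment
DEPLOYMENT = "coredns"     # assumed Deployment name


def patch_argv(patch):
    """Build the kubectl command that applies a strategic-merge patch."""
    return ["kubectl", "patch", "deployment", DEPLOYMENT, "-n", NAMESPACE,
            "--type", "strategic", "--patch", json.dumps(patch)]


def apply_with_retry(patch, attempts=30, delay=5.0,
                     run=subprocess.run, sleep=time.sleep):
    """Retry the patch until kubectl exits 0; return the attempt that succeeded.

    `run` and `sleep` are injectable so the loop can be exercised without a cluster.
    """
    argv = patch_argv(patch)
    for attempt in range(1, attempts + 1):
        result = run(argv, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        sleep(delay)
    raise RuntimeError(f"kubectl patch still failing after {attempts} attempts")
```

A oneshot systemd unit ordered after the k3s service could call `apply_with_retry(...)` with a patch body that sets `spec.template.spec.topologySpreadConstraints`. This only papers over the problem: whenever k3s re-applies its own coredns.yaml the patch is lost until the unit runs again, which is why it is the quick fix rather than the long-term one.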