Problem
Traefik's ACME DNS-01 challenge consistently fails because lego's propagation check gets NXDOMAIN when querying Vultr's authoritative nameservers (ns1.vultr.com / ns2.vultr.com) from inside the cluster.
The _acme-challenge TXT record is created correctly via the Vultr API and resolves fine from outside the cluster, but returns NXDOMAIN when queried from pods.
Evidence
From outside the cluster:
$ dig @ns1.vultr.com _acme-challenge.elsa.soap.coffee TXT +short
"eo1WK15fbuTGbq_PWwhPkNzpsfXmtbrTQYigpPVcVBI"
From inside a pod (busybox):
$ nslookup -type=TXT _acme-challenge.elsa.soap.coffee ns1.vultr.com
Server: ns1.vultr.com
Address: 173.199.96.96:53
** server can't find _acme-challenge.elsa.soap.coffee: NXDOMAIN
Lego reports:
propagation: time limit exceeded: last error: authoritative nameservers:
NS ns2.vultr.com.:53 returned NXDOMAIN for _acme-challenge.elsa.soap.coffee.
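To make the inside/outside comparison repeatable, the check can be scripted. This is a minimal sketch, assuming dig is available wherever it runs; the helper names dig_status and check_authoritative are hypothetical, and the loop only roughly mirrors what lego's propagation check appears to do according to the log above (query each authoritative nameserver directly for the TXT record).

```shell
#!/bin/sh
# Print the RCODE (NOERROR, NXDOMAIN, SERVFAIL, ...) found in full dig output on stdin.
# Prints nothing if no header line is present (for example, on a timeout).
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# Query each given nameserver for the TXT record without recursion
# and print "<nameserver> <rcode>" per server.
# check_authoritative is a hypothetical helper, not part of lego or Traefik.
check_authoritative() {
  name="$1"
  shift
  for ns in "$@"; do
    status=$(dig "@$ns" "$name" TXT +norecurse +time=3 +tries=1 2>/dev/null | dig_status)
    printf '%s %s\n' "$ns" "${status:-NO_RESPONSE}"
  done
}
```

Running check_authoritative _acme-challenge.elsa.soap.coffee ns1.vultr.com ns2.vultr.com from both vantage points should show whether the two nameservers disagree with each other or only with the in-cluster view.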
Current workaround
Authoritative NS checks are disabled in Traefik's ACME config, and external resolvers are used instead:
additionalArguments:
- "--certificatesresolvers.letsencrypt.acme.dnsChallenge.resolvers=1.1.1.1:53,8.8.8.8:53"
- "--certificatesresolvers.letsencrypt.acme.dnsChallenge.propagation.disableANSChecks=true"
TODO
- Retest with dig from inside the cluster (busybox nslookup is unreliable; debug pods with dig kept timing out during initial investigation)
- Determine if disableANSChecks is acceptable long-term or if a better fix exists
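One way to run the in-cluster dig test is from a throwaway pod. This is a sketch under assumptions: in_cluster_dig is a hypothetical helper, the default image nicolaka/netshoot is only one choice that ships dig, and the timeout values are guesses aimed at the timeouts seen earlier.

```shell
#!/bin/sh
# Run dig in a throwaway pod against a single nameserver.
# in_cluster_dig is a hypothetical helper; DEBUG_IMAGE defaults to
# nicolaka/netshoot (an assumption, any image that includes dig should work).
# Arguments after the record name are passed through to dig.
in_cluster_dig() {
  ns="$1"
  name="$2"
  shift 2
  kubectl run dns-debug --rm -i --restart=Never \
    --image="${DEBUG_IMAGE:-nicolaka/netshoot}" -- \
    dig "@$ns" "$name" TXT +norecurse +time=5 +tries=2 "$@"
}
```

For example, in_cluster_dig ns1.vultr.com _acme-challenge.elsa.soap.coffee queries the first nameserver. Extra dig options such as +tcp can be appended to see whether the result changes with the transport.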