fix(traefik): finite readTimeout on entrypoints to stop fd-leak unreachability #108
Merged
Traefik's static config set no connection timeouts, inheriting v2.x
defaults of readTimeout=0 / writeTimeout=0 ("no timeout"). On a shard's
public IP, internet scanners constantly open connections that never
complete a request (silent connects, abandoned TLS handshakes,
slowloris). With readTimeout=0 these are held open forever, each
consuming a file descriptor, until traefik hits its open-file ceiling
and accept() fails with EMFILE -- at which point the shard is unreachable
and the error is logged in a hot loop (the 38 GB log that filled root in
the tdyz60/#306 incident).
Set readTimeout=300s on the http and https entrypoints so abandoned
connections are reaped. 300s stays generous for slow/large uploads;
writeTimeout is intentionally left at default 0 so large downloads, SSE,
and long-poll responses are not cut off. The mqtt (8883) entrypoint is
TCP, where respondingTimeouts do not apply -- that vector is covered by a
nofile ulimit on the traefik service (separate controller PR).
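
The nofile ulimit itself lives in the separate controller PR and is not shown here. Purely as an illustration, assuming the traefik service were defined in a Docker Compose file (this PR does not confirm that), raising the limit could look like the sketch below. The service name and both numbers are hypothetical.

```yaml
# Hypothetical sketch: not taken from the controller PR.
services:
  traefik:
    ulimits:
      nofile:
        soft: 65536   # illustrative value only
        hard: 65536   # illustrative value only
```

A higher ceiling does not stop a leak; it only buys headroom for connections that respondingTimeouts cannot reap, such as those on the TCP entrypoint.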
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
Follow-up to the #306 / shard tdyz60 incident. This fixes the root cause of the unreachability, which #309's log rotation does not address.
Traefik's static config (data/traefik.yml, data/traefik_no_ssl.yml) sets no connection timeouts, so it inherits traefik's defaults: readTimeout=0, writeTimeout=0, both meaning "no timeout".

On a shard's public IP with 80/443/8883 open to the internet, background scanners (Shodan/Censys/masscan/exploit bots) continuously open TCP connections that never complete a request: silent connects, abandoned TLS handshakes, slowloris. With readTimeout=0 each is held open forever, consuming one file descriptor. They accumulate over days until traefik hits its open-file ceiling and accept() returns EMFILE (too many open files):

restart: always never fires);

Change
Set readTimeout: "300s" via transport.respondingTimeouts on the http and https entrypoints in both static configs. Abandoned connections are now reaped instead of leaking fds.

300s stays generous for slow/large uploads (personal-cloud file sync). writeTimeout is intentionally left at default 0 so large downloads, SSE, and long-poll responses are not cut off. idleTimeout is unchanged (default 180s).

Scope / what this does NOT cover
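
To make the boundary concrete, here is a minimal sketch of how the entryPoints section might look after this change. The addresses are assumptions inferred from the ports named in the summary (80/443/8883), and any other options the real files carry are omitted. Only the transport.respondingTimeouts.readTimeout keys come from this PR.

```yaml
# Sketch only: addresses are assumed, other entrypoint options omitted.
entryPoints:
  http:
    address: ":80"            # assumed
    transport:
      respondingTimeouts:
        readTimeout: "300s"   # reap connections that never finish sending a request
        # writeTimeout is deliberately not set, so it stays at its default of 0
  https:
    address: ":443"           # assumed
    transport:
      respondingTimeouts:
        readTimeout: "300s"
  mqtt:
    address: ":8883"          # assumed; TCP entrypoint, left untouched
```
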
mqtt (:8883) is a TCP entrypoint; respondingTimeouts are HTTP-only and do not apply. That fd vector is handled by a nofile ulimit on the traefik service in the companion controller PR (defense-in-depth; it raises the ceiling so any residual leak degrades gracefully instead of bricking).

Test plan
- data/traefik.yml and data/traefik_no_ssl.yml parse as valid YAML; entryPoints.{http,https}.transport.respondingTimeouts.readTimeout present, mqtt untouched.
- _copy_traefik_static_config() renders the file with Jinja2 (only {{ acme_email }}); added keys contain no template syntax, so rendering is unaffected.

Recommended reading order

1. data/traefik.yml
2. data/traefik_no_ssl.yml

🤖 Generated with Claude Code