Summary
A node deployed via the Docker Compose quickstart accumulates zombie (defunct) child processes over time until the container hits its PID/thread limit. Once that happens, every new git subprocess fails to fork and repo sync is permanently broken until the container is restarted.
Environment
- Image built from this repo's Docker Compose quickstart (docker compose up -d: node + postgres:16-alpine), built from main around late May 2026
- Host: 2 GB RAM Debian/Ubuntu VPS
- Uptime when observed: ~3 weeks
Symptoms
After ~3 weeks of uptime, docker stats showed the node container at ~2338 PIDs, and host top reported 2334 zombie processes, all parented by the gitlawb-node process.
Repo sync was failing continuously:
WARN gitlawb_node::sync: repo sync failed repo=z6Mk.../solady-n23 origin=https://node.gitlawb.com
err=git clone --mirror failed: Cloning into bare repository '/data/repos/.../solady-n23.git'...
error: cannot fork() for remote-https: Resource temporarily unavailable
The "cannot fork() ... Resource temporarily unavailable" error (EAGAIN) is a downstream effect of the process table being exhausted by the accumulated zombies.
Likely root cause
The node spawns git child processes (git clone --mirror for sync, presumably also git-upload-pack / git-receive-pack for smart-HTTP) but never wait()s / reaps them. Their exit statuses are never collected, so they linger as zombies. Over time this fills the PID/thread table, after which any fork()/clone() returns EAGAIN and all git-spawning operations (sync, clone, push) fail.
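The mechanism can be reproduced in isolation. The following is a minimal Linux-only sketch, not code from the node: it spawns a short-lived child (`true` stands in for a git subprocess), drops the handle without waiting, and then checks the state letter in /proc/<pid>/stat. The helper names `proc_state`, `spawn_and_forget`, and `becomes_zombie` are hypothetical.

```rust
use std::fs;
use std::process::{Command, Stdio};
use std::thread::sleep;
use std::time::Duration;

/// Read the single-letter process state from /proc/<pid>/stat (Linux only).
/// The state field is the first field after the closing ')' of the command name.
fn proc_state(pid: u32) -> Option<char> {
    let stat = fs::read_to_string(format!("/proc/{pid}/stat")).ok()?;
    let after_comm = &stat[stat.rfind(')')? + 1..];
    after_comm.trim_start().chars().next()
}

/// Spawn a short-lived child and drop its handle without waiting.
/// `true` is a stand-in for a git subprocess (assumption for illustration).
fn spawn_and_forget() -> std::io::Result<u32> {
    let child = Command::new("true").stdout(Stdio::null()).spawn()?;
    // `child` is dropped when this function returns; std::process::Child
    // has no Drop impl that waits, so the exit status is never collected.
    Ok(child.id())
}

/// Poll until the process shows up as a zombie ('Z') or the attempts run out.
fn becomes_zombie(pid: u32) -> bool {
    for _ in 0..50 {
        if proc_state(pid) == Some('Z') {
            return true;
        }
        sleep(Duration::from_millis(20));
    }
    false
}
```

Calling `spawn_and_forget()` and then `becomes_zombie(pid)` from a long-lived parent should report the child as a zombie for as long as the parent keeps running, which matches the pattern seen in top.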
Impact
- Not self-healing: once the limit is hit, the node can no longer fork git, so sync stays broken until a manual restart.
- A long-running node silently degrades and stops replicating, with no obvious signal other than the repeating fork errors.
Suggested fix
- Always await/reap every spawned child (e.g. .wait() the std::process::Child, or tokio::process::Command with an awaited Child; don't drop Child handles without waiting).
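- As a container-level mitigation, run with an init that reaps orphans (docker run --init / tini, or init: true in compose). This only covers processes re-parented to PID 1; direct children that the node never waits on stay zombies while the node is alive, so the real fix is reaping in-process.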
- Optionally cap sync concurrency and surface a clear error/metric when a spawn fails with EAGAIN.
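A minimal sketch of the first and last points, using only the standard library. It is not the node's actual code: `run_and_reap`, `classify`, and `RunError` are hypothetical names, and the sketch assumes a blocking call is acceptable at the call site (an async codebase would await the tokio equivalent instead). It waits on every child it spawns and separates an EAGAIN spawn failure from other errors so it can be logged or counted distinctly.

```rust
use std::io;
use std::process::{Command, ExitStatus, Stdio};

/// Outcome of trying to run a subprocess to completion (hypothetical type).
#[derive(Debug)]
enum RunError {
    /// Spawn failed with EAGAIN: the PID/thread limit is likely exhausted.
    ProcessLimit(io::Error),
    /// Any other spawn or wait failure.
    Other(io::Error),
}

/// Spawn `program` with `args`, then block until it exits so the kernel
/// can release its process-table entry.
fn run_and_reap(program: &str, args: &[&str]) -> Result<ExitStatus, RunError> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::null())
        .spawn()
        .map_err(classify)?;
    // Collecting the exit status here is what prevents a zombie.
    child.wait().map_err(RunError::Other)
}

/// Separate "process table full" from other I/O errors.
fn classify(err: io::Error) -> RunError {
    // On Unix, std reports EAGAIN as ErrorKind::WouldBlock.
    if err.kind() == io::ErrorKind::WouldBlock {
        RunError::ProcessLimit(err)
    } else {
        RunError::Other(err)
    }
}
```

A sync task could call something like `run_and_reap("git", &["clone", "--mirror", url, dest])` (with `url` and `dest` supplied by the caller) and bump a dedicated metric on `RunError::ProcessLimit`, so the failure mode is visible before sync stops entirely.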
Workaround
Restart the container (zombies are reaped on restart).