
Zombie process leak in repo sync — git children not reaped, eventually exhausts fork() (EAGAIN "Resource temporarily unavailable") #53

@FrozenRaspberry


Summary

A node deployed via the Docker Compose quickstart accumulates zombie (defunct) child processes over time until the container hits its PID/thread limit. Once that happens, every new git subprocess fails to fork and repo sync is permanently broken until the container is restarted.

Environment

  • Image built from main around late May 2026 using this repo's Docker Compose quickstart (docker compose up -d: node + postgres:16-alpine)
  • Host: 2 GB RAM Debian/Ubuntu VPS
  • Uptime when observed: ~3 weeks

Symptoms

After ~3 weeks of uptime, docker stats showed the node container at ~2338 PIDs, and host top reported 2334 zombie processes, all parented by the gitlawb-node process.

Repo sync was failing continuously:

WARN gitlawb_node::sync: repo sync failed repo=z6Mk.../solady-n23 origin=https://node.gitlawb.com
  err=git clone --mirror failed: Cloning into bare repository '/data/repos/.../solady-n23.git'...
  error: cannot fork() for remote-https: Resource temporarily unavailable

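The "cannot fork() ... Resource temporarily unavailable" error (EAGAIN) is a downstream effect of the container's PID limit being exhausted: each zombie keeps its PID allocated until its parent reaps it, so the accumulated zombies leave no room for new processes.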

Likely root cause

The node spawns git child processes (git clone --mirror for sync, presumably also git-upload-pack / git-receive-pack for smart-HTTP) but never wait()s / reaps them. Their exit statuses are never collected, so they linger as zombies. Over time this fills the PID/thread table, after which any fork()/clone() returns EAGAIN and all git-spawning operations (sync, clone, push) fail.
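The mechanism can be reproduced outside the node with a small, Linux-only Rust sketch. It is illustrative only and assumes nothing about the gitlawb code: it spawns a short-lived child (`true`) with std::process, leaves it un-waited, and reads its state from /proc/<pid>/stat. The helper names `proc_state` and `zombie_then_reaped` are hypothetical.

```rust
use std::fs;
use std::process::Command;
use std::thread;
use std::time::Duration;

/// Reads the one-letter state ('R', 'S', 'Z', ...) of a process from
/// /proc/<pid>/stat. Returns None once the PID no longer exists. Linux-only.
fn proc_state(pid: u32) -> Option<char> {
    let stat = fs::read_to_string(format!("/proc/{pid}/stat")).ok()?;
    // The comm field is parenthesised and may contain spaces or ')',
    // so the state is the first field after the *last* ')'.
    let rest = &stat[stat.rfind(')')? + 1..];
    rest.trim_start().chars().next()
}

/// Spawns a short-lived child and observes its state before and after reaping.
fn zombie_then_reaped() -> (Option<char>, Option<char>) {
    let mut child = Command::new("true").spawn().expect("failed to spawn `true`");
    let pid = child.id();

    // Poll (up to ~2 s) until the child has exited. Because nobody has
    // called wait() yet, the kernel keeps its PID entry in state 'Z'.
    let mut before = proc_state(pid);
    for _ in 0..100 {
        if before == Some('Z') {
            break;
        }
        thread::sleep(Duration::from_millis(20));
        before = proc_state(pid);
    }

    // wait() collects the exit status; only now is the PID released.
    child.wait().expect("wait() failed");
    (before, proc_state(pid))
}

fn main() {
    let (before, after) = zombie_then_reaped();
    println!("before wait: {before:?}, after wait: {after:?}");
}
```

Dropping `child` instead of calling `wait()` leaves the 'Z' entry behind for as long as the parent lives, which matches what the node container shows after weeks of sync runs.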

Impact

  • Not self-healing: once the limit is hit, the node can no longer fork git, so sync stays broken until a manual restart.
  • A long-running node silently degrades and stops replicating, with no obvious signal other than the repeating fork errors.

Suggested fix

  • Always await/reap every spawned child (e.g. call .wait() on each std::process::Child, or use tokio::process::Command and await the Child's wait(); don't drop Child handles without waiting).
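  • As a container-level mitigation, run with an init that reaps orphans (docker run --init / tini, or init: true in compose). This only helps with processes reparented to PID 1: without an init, gitlawb-node itself is likely PID 1 in the container, so grandchildren orphaned when their git parent exits are reparented to it and never reaped. Zombies that are direct, un-waited children of the still-running node are not touched by an init, so the real fix is reaping in-process.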
  • Optionally cap sync concurrency and surface a clear error/metric when a spawn fails with EAGAIN.
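A minimal, std-only Rust sketch of the first and third suggestions, assuming the sync code spawns git through std::process. `run_reaped` and its timeout parameter are hypothetical names, not part of the gitlawb codebase. The key points are that every exit path, including a timeout, ends in `try_wait()` or `wait()`, and that a spawn failing with EAGAIN is reported distinctly (on Unix, std maps EAGAIN to `io::ErrorKind::WouldBlock`).

```rust
use std::io;
use std::process::{Command, ExitStatus, Stdio};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: spawn `cmd` and guarantee its exit status is
/// collected (so no zombie is left behind), even if it exceeds `timeout`.
fn run_reaped(mut cmd: Command, timeout: Duration) -> io::Result<ExitStatus> {
    let mut child = match cmd.stdin(Stdio::null()).spawn() {
        Ok(c) => c,
        Err(e) if e.kind() == io::ErrorKind::WouldBlock => {
            // EAGAIN from fork/clone: the PID limit is probably exhausted.
            // A real node would bump a metric here instead of just logging.
            eprintln!("spawn failed with EAGAIN (process table exhausted?): {e}");
            return Err(e);
        }
        Err(e) => return Err(e),
    };

    let deadline = Instant::now() + timeout;
    loop {
        // try_wait() reaps the child if it has exited, without blocking.
        if let Some(status) = child.try_wait()? {
            return Ok(status);
        }
        if Instant::now() >= deadline {
            // kill() alone would still leave a zombie; wait() reaps it.
            let _ = child.kill();
            child.wait()?;
            return Err(io::Error::new(
                io::ErrorKind::TimedOut,
                "child exceeded its timeout and was killed and reaped",
            ));
        }
        thread::sleep(Duration::from_millis(50));
    }
}

fn main() -> io::Result<()> {
    // A child that exits normally is reaped by try_wait().
    let status = run_reaped(Command::new("true"), Duration::from_secs(5))?;
    println!("true exited with {status}");

    // A child that overruns its budget is killed *and* reaped.
    let mut slow = Command::new("sleep");
    slow.arg("5");
    match run_reaped(slow, Duration::from_millis(100)) {
        Err(e) => println!("sleep: {e}"),
        Ok(s) => println!("sleep unexpectedly finished: {s}"),
    }
    Ok(())
}
```

In the node this would wrap something like Command::new("git") with the clone --mirror arguments. Command::status() and Command::output() already wait for the child; the leak only appears when spawn() is used and the Child is dropped (or killed) without a later wait(). In async code the analogous pattern is awaiting tokio::process::Child::wait() rather than polling with thread::sleep, which would block an executor thread.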

Workaround

Restart the container (zombies are reaped on restart).
