fix(plugin): detach daemon from job control + add --max-time to hook curls #457

Open
ANGELES00004 wants to merge 1 commit into Gentleman-Programming:main from ANGELES00004:fix/daemon-job-control-and-hook-curl-timeouts

@ANGELES00004

Problem

On Linux/WSL2 the engram daemon can end up suspended (State: T (stopped)), keeping port 7437 bound but never answering. Every Claude Code session close then hangs and leaks a process, and these accumulate over time.

In my environment I found dozens of stuck processes — one bash + one curl per session close, oldest ~1h36m:

$ ps -o pid,etime,stat,cmd ...
  PID   ELAPSED STAT CMD
10949  01:45:52 Tl   engram serve                                              # daemon SUSPENDED
35119  01:36:15 S    curl -sf http://127.0.0.1:7437/sessions/<id>/end -X POST  # hung forever
 ...
$ ps -o stat,wchan -p 10949
STAT  WCHAN
Tl    do_signal_stop          # stopped by a job-control signal

$ ss -ltnp | grep 7437
LISTEN 41 4096 127.0.0.1:7437 users:(("engram",pid=10949,fd=9))   # 41 conns queued, never accepted

$ curl -m4 http://127.0.0.1:7437/health
000   (exit 28 — timeout; daemon owns the port but never responds)

gentle-ai doctor flagged engram:reachable as unhealthy because of this.
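
A quick way to confirm this condition from a script is to check the process state letter that ps reports. The helper below is a minimal sketch: `is_stopped` is a hypothetical name, not part of engram or gentle-ai, and it assumes a ps that supports `-o stat=` (procps on Linux does).

```shell
# Hypothetical helper (not part of engram): succeed if the given PID is in
# the stopped state, i.e. its ps STAT field starts with "T".
is_stopped() {
  # "stat=" suppresses the header line; tr strips padding around the value.
  _state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d '[:space:]')
  case "$_state" in
    T*) return 0 ;;   # stopped by a job-control signal
    *)  return 1 ;;   # any other state, or no such process
  esac
}
```

For example, `is_stopped 10949 && echo "daemon is suspended"`. A stopped process can be resumed with `kill -CONT <pid>`, although that does not address why it was stopped in the first place.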

Root cause

Two compounding issues in the Claude Code plugin hooks:

  1. session-start.sh starts the daemon with a bare &:

    engram serve &>/dev/null &

    This leaves it in the parent shell's process group, attached to the controlling terminal (TT: pts/N). A Ctrl-Z (SIGTSTP), or the terminal being suspended/closed, then delivers a job-control stop signal to that process group, and the daemon freezes in state T while still owning the port.

  2. session-stop.sh's curl has no --max-time:

    curl -sf "${ENGRAM_URL}/sessions/${SESSION_ID}/end" ...

    Against a frozen daemon the TCP handshake completes (kernel backlog) but no HTTP response ever arrives, so the curl hangs forever. The hook's "timeout": 5 + "async": true kills the bash wrapper, but the curl child is reparented to init and survives → one leaked process per session close.
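
The difference between a bounded and an unbounded request can be sketched as follows. This is an illustration, not the plugin's actual hook code: `end_session` is a hypothetical wrapper, the default URL is assumed from the port shown above, and any further arguments the real hook passes to curl are omitted. `--max-time` caps the whole transfer, which matters here because `--connect-timeout` alone would not help once the kernel has already completed the handshake.

```shell
# Assumed default for illustration; the real hooks define ENGRAM_URL themselves.
ENGRAM_URL="${ENGRAM_URL:-http://127.0.0.1:7437}"

# Hypothetical wrapper around the session-end request.
# --max-time 3 bounds the entire operation (connect + request + response),
# so a daemon that accepts the connection but never answers makes curl
# exit with code 28 after about 3 seconds instead of waiting indefinitely.
end_session() {
  curl -sf --max-time 3 -X POST "${ENGRAM_URL}/sessions/$1/end" >/dev/null
}
```

A caller can treat any non-zero exit status from `end_session` as "daemon unavailable" and move on, so no curl process outlives the hook by more than a few seconds.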

Fix

  • session-start.sh: launch the daemon with setsid (falling back to nohup) so that it runs in its own session with no controlling terminal, where SIGTSTP/SIGHUP from a terminal can no longer suspend or kill it. This is the portable equivalent of running it as a service.
  • session-stop.sh, session-start.sh, post-compaction.sh: add --max-time 3 to the remaining hook curls so an unreachable/unresponsive daemon fails fast instead of hanging and leaking processes. The other curls in these scripts already used --max-time; this just makes the rest consistent.
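
The launch change in the first bullet might look roughly like the sketch below. `start_detached` is a hypothetical helper name and this is not the PR's actual diff. Note that the two branches are not equivalent: setsid puts the process in a new session without a controlling terminal, while nohup only makes it ignore SIGHUP and leaves it in the caller's session.

```shell
# Hypothetical helper: run a command in the background, detached from the
# caller's terminal as far as the available tools allow.
start_detached() {
  if command -v setsid >/dev/null 2>&1; then
    # New session, no controlling terminal: terminal-generated signals
    # such as SIGTSTP and SIGHUP are not delivered to it.
    setsid "$@" </dev/null >/dev/null 2>&1 &
  else
    # Fallback: nohup ignores SIGHUP only; the process stays in the
    # caller's session.
    nohup "$@" </dev/null >/dev/null 2>&1 &
  fi
}

# Usage, mirroring the command from the original hook:
# start_detached engram serve
```

Redirecting stdin from /dev/null as well as stdout/stderr keeps the daemon from holding any file descriptor that still points at the terminal.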

bash -n passes on all three scripts.

Honest disclaimer

I'm not 100% sure this is the ideal fix — it's how the issue got diagnosed and resolved with the help of Claude Code in a real WSL2 environment. In my own setup I additionally moved the daemon to a systemd --user service to recover immediately, but that isn't portable so it's not proposed here. Feedback very welcome, and happy to adjust the approach — for example, daemonizing inside the engram serve binary itself (double-fork + setsid) might be a more robust long-term fix than patching the shell hooks.

…-max-time

The SessionStart hook starts `engram serve` with a bare `&`, leaving the
daemon in the parent shell's process group and attached to its controlling
terminal. On Linux/WSL2 a Ctrl-Z (SIGTSTP) or terminal close can suspend the
daemon into state T: it keeps port 7437 bound but stops answering. Since the
Stop hook's curl has no --max-time, every session close then hangs forever and
leaks a bash+curl process, accumulating one per stop.

- session-start.sh: launch the daemon via setsid (fallback nohup) in its own
  session with no controlling terminal, immune to SIGTSTP/SIGHUP.
- session-stop.sh, session-start.sh, post-compaction.sh: add --max-time 3 to
  the remaining hook curls so an unresponsive daemon fails fast instead of
  hanging and leaking processes.