fix(daemon): prevent self-PID detection causing immediate exit #1467

Open
AleksandarStaikov wants to merge 1 commit into ruvnet:main from AleksandarStaikov:fix/daemon-self-pid-detection

AleksandarStaikov commented Mar 27, 2026

Problem

daemon start reports success but daemon status immediately shows STOPPED.
The daemon process exits silently with code 0. Affects Linux, macOS, and Windows.
Reported in #1039, symptoms also visible in #984.

Root Cause

The foreground command writes daemon.pid with the current process PID before
calling startDaemon(). When WorkerDaemon.start() runs checkExistingDaemon(),
it reads that file, finds the current process's own PID, calls process.kill(pid, 0)
on itself (which always succeeds), and concludes a daemon is already running,
so it returns early without scheduling any workers.

With no timers scheduled, the Node.js event loop has nothing to do and the process
exits with code 0. The daemon never ran.
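
The failure sequence above can be modelled in a few lines. This is a minimal sketch, not the project's code: `checkExistingDaemonBuggy`, `startBuggy`, the `LivenessProbe` type, and the worker count are hypothetical stand-ins, and the probe is injected so the example runs without touching a real PID file. In the real daemon the probe corresponds to `process.kill(pid, 0)`, which reports a process as alive when it is probing itself.

```typescript
// Hypothetical model of the pre-fix flow; names follow the description, not the real source.
type LivenessProbe = (pid: number) => boolean;

interface StartResult {
  scheduledWorkers: number;
  reason: string;
}

// Pre-fix check: any live PID found in daemon.pid counts as "another daemon".
function checkExistingDaemonBuggy(
  storedPid: number | null,
  isAlive: LivenessProbe,
): number | null {
  if (storedPid === null) return null;
  return isAlive(storedPid) ? storedPid : null;
}

function startBuggy(selfPid: number, isAlive: LivenessProbe): StartResult {
  // The foreground command already wrote its own PID before start() ran.
  const pidFileContents: number | null = selfPid;

  const existing = checkExistingDaemonBuggy(pidFileContents, isAlive);
  if (existing !== null) {
    // Early return: no timers are created, so the event loop has nothing to wait on.
    return { scheduledWorkers: 0, reason: `daemon already running (pid ${existing})` };
  }
  // Arbitrary illustrative worker count.
  return { scheduledWorkers: 3, reason: "started" };
}

// A process probing itself is always reported alive.
const ownPid = 4242;
const selfProbe: LivenessProbe = (pid) => pid === ownPid;

const buggyResult = startBuggy(ownPid, selfProbe);
console.log(buggyResult);
```

In this model the start path always takes the early return, which matches the observed behaviour: nothing is scheduled and the process has no reason to stay alive.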

Fix (belt and suspenders)

  1. Root cause (daemon.ts): Remove the premature daemon.pid write.
    WorkerDaemon.writePidFile() inside start() is the correct and only writer.
    The process.on('exit', cleanup) handler remains as a crash safety net.

  2. Defensive guard (worker-daemon.ts): In checkExistingDaemon(), skip
    the conflict check when the stored PID equals process.pid. A process cannot
    conflict with itself.
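
The defensive guard in part 2 might look roughly like the sketch below. It is an illustration under assumptions rather than the actual diff: the function signature, the `PidProbe` type, and the injected `isAlive` callback are hypothetical, chosen so the logic can be exercised without a real PID file. In the daemon itself, `selfPid` would be `process.pid` and the probe would wrap `process.kill(pid, 0)`.

```typescript
// Hypothetical signature; the real checkExistingDaemon() reads daemon.pid itself.
type PidProbe = (pid: number) => boolean;

function checkExistingDaemon(
  storedPid: number | null,
  selfPid: number,
  isAlive: PidProbe,
): number | null {
  // No PID file (or unreadable contents): nothing to conflict with.
  if (storedPid === null) return null;

  // Self-PID guard: a process cannot conflict with itself.
  if (storedPid === selfPid) return null;

  // Another PID is recorded: it is a conflict only if that process is still alive.
  return isAlive(storedPid) ? storedPid : null;
}

// Example: this process is 100, another live daemon is 200, and 300 is a stale entry.
const livePids = new Set<number>([100, 200]);
const isAlive: PidProbe = (pid) => livePids.has(pid);

console.log(checkExistingDaemon(100, 100, isAlive)); // own PID: no conflict
console.log(checkExistingDaemon(200, 100, isAlive)); // live foreign PID: conflict
console.log(checkExistingDaemon(300, 100, isAlive)); // stale PID: no conflict
```

Placing the self-PID comparison before the liveness probe means the guard holds even if some other code path writes the file early again.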

Testing

Verified on Ubuntu 24.04 via SSH. After fix:

  • daemon start → process stays alive (confirmed via ps aux)
  • daemon status → shows ● RUNNING (background)
  • Workers begin executing (map worker ran within 10s)

The foreground command wrote daemon.pid with its own PID before calling
startDaemon(). WorkerDaemon.start() then called checkExistingDaemon(),
which read that file, found the current process PID, called
process.kill(pid, 0) on itself (always succeeds), and concluded a daemon
was already running - skipping all worker scheduling entirely.

With no timers scheduled, the Node.js event loop drained and the process
exited silently with code 0. This caused daemon status to always show
STOPPED immediately after start, on Linux, macOS, and Windows alike.

Fix (two-part, belt and suspenders):

1. Remove the premature daemon.pid write from the foreground command.
   WorkerDaemon.start() via writePidFile() is the sole owner of this
   file. The cleanup handler still works as a safety net for crashes.

2. Add a self-PID guard in checkExistingDaemon(): if the stored PID
   equals process.pid, return null - we cannot conflict with ourselves.
   This defends against any future race where the file is written early.

Fixes symptoms reported in ruvnet#1039. Related to ruvnet#984.