Daemon-spawned right-pane shell leaks NODE_CHANNEL_FD into droid CLI, breaks session creation #1163

@hdot123

Summary

When droid is launched in the Factory desktop app's embedded right-pane terminal, it consistently fails at session creation with:

[useDaemonAgent] Failed to create session
ConnectionClosedError: Connection closed while request pending (code=1000, reason=IPC disconnected)

User-visible message in the TUI: 创建会话失败。请查看日志了解详情 ("Failed to create session, see logs for details").

The same droid binary (v0.135.0, ~/.local/bin/droid and Factory.app/Contents/Resources/bin/droid are identical) works perfectly when launched from any standalone terminal (Terminal.app, iTerm, Warp, Zed, etc.).

Root cause

The Factory desktop daemon (droid daemon --listen ipc) is spawned by the Electron main process with Node's stdio: 'ipc' channel, which gives the daemon process two inherited env vars:

NODE_CHANNEL_FD=3
NODE_CHANNEL_SERIALIZATION_MODE=json

(fd 3 is the Node IPC channel between Electron main and the daemon.)

The daemon then directly fork-execs /bin/zsh to host the embedded right-pane terminal (verified via ps — the pane shell's PPID is the daemon, not the Factory main process). Because the daemon does not strip these two env vars before spawning the shell, they leak into the shell, and then into anything launched from it.

When the user types droid in that pane:

  1. The Node-based droid CLI starts and inherits NODE_CHANNEL_FD=3
  2. Node's startup detects the env var and automatically opens fd 3 as an IPC channel to a "parent" Node process
  3. But fd 3 in the droid process is either closed or points to Electron's IPC channel, which is not accessible from this descendant
  4. Droid's IpcDaemonClientTransport.sendWhenAvailable (src/transports/IpcDaemonClientTransport.ts:224) hits an immediate MetaError: IPC transport is not connected
  5. This bubbles up as ConnectionClosedError code=1000 reason=IPC disconnected → "Failed to create session"

The failure is deterministic and reproducible on every launch. In the log excerpt below, the IPC error fires about 180 ms after the "Connected and authenticated" line and about 50 ms after the SessionStart hook is matched, well before that hook (which is async) finishes.

Reproduction

  1. macOS, Factory.app desktop v0.92.0, droid CLI v0.135.0
  2. Open the embedded right-pane terminal in Factory.app
  3. In that pane, run droid (or cd <any project> && droid)
  4. Observe immediate "创建会话失败 / Failed to create session" error in the TUI

Compare with: open Terminal.app / iTerm / Warp, run the same droid from the same ~/.local/bin/droid binary — works fine.

Diagnostic comparison

Failed in-app droid process env (relevant subset):

NODE_CHANNEL_FD=3
NODE_CHANNEL_SERIALIZATION_MODE=json
FACTORY_DESKTOP_CDP_PORT=<port>
FACTORY_DESKTOP_CDP_TOKEN=<token>
FACTORY_UPSTREAM_CLIENT_TYPE=web-desktop

Working standalone droid process env (relevant subset):

(NODE_CHANNEL_FD not set)
(NODE_CHANNEL_SERIALIZATION_MODE not set)
(no FACTORY_DESKTOP_* vars)
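
The comparison above can be automated with a small check. This is a minimal sketch, not part of droid or Factory: `findLeakedIpcVars` is a hypothetical helper, and the two sample environments are modeled on the subsets listed above. It is meant to be fed an environment captured from the shell in question (for example, parsed `env` output).

```javascript
// Names of the Node IPC variables reported as leaking in this issue.
const NODE_IPC_VARS = ['NODE_CHANNEL_FD', 'NODE_CHANNEL_SERIALIZATION_MODE'];

// Return the IPC variables that are set (non-empty) in the given environment.
// Hypothetical helper for illustration only.
function findLeakedIpcVars(env) {
  return NODE_IPC_VARS.filter(
    (name) => typeof env[name] === 'string' && env[name] !== ''
  );
}

// Sample environments modeled on the two subsets listed above.
const inAppEnv = {
  NODE_CHANNEL_FD: '3',
  NODE_CHANNEL_SERIALIZATION_MODE: 'json',
  FACTORY_UPSTREAM_CLIENT_TYPE: 'web-desktop',
};
const standaloneEnv = { SHELL: '/bin/zsh' };

console.log('in-app leaked vars:', findLeakedIpcVars(inAppEnv));
console.log('standalone leaked vars:', findLeakedIpcVars(standaloneEnv));
```

A non-empty result for the pane shell and an empty one for a standalone terminal would match the diagnosis in this report.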

Process tree (failed):

Factory.app (main)
└─ droid daemon --listen ipc          ← inherits NODE_CHANNEL_FD=3 from Electron
   └─ /bin/zsh (embedded right pane)  ← inherits NODE_CHANNEL_FD=3 from daemon
      └─ droid                        ← inherits NODE_CHANNEL_FD=3 → crash

Process tree (working):

Terminal.app
└─ -zsh
   └─ droid                            ← clean env, works

Relevant log excerpt

[2026-05-28T04:50:50.618Z] INFO: [TuiDaemonAdapter] Opening in-process connection
[2026-05-28T04:50:50.685Z] INFO: [DaemonSessionController] Initial connection established
[2026-05-28T04:50:50.686Z] INFO: Trusted daemon IPC connection using inherited auth
[2026-05-28T04:50:50.686Z] INFO: [TuiDaemonAdapter] Connected and authenticated
[2026-05-28T04:50:50.816Z] INFO: [Session] Saving session settings
[2026-05-28T04:50:50.818Z] INFO: [Hooks] Matched commands SessionStart
[2026-05-28T04:50:50.833Z] INFO: [FileIndexer] Crawl completed (24ms)
[2026-05-28T04:50:50.867Z] WARN: [DaemonClientIPC] IPC transport emitted error
  MetaError: IPC transport is not connected
  at sendWhenAvailable (../../packages/daemon-client/src/transports/IpcDaemonClientTransport.ts:224:19)
[2026-05-28T04:50:50.868Z] WARN: [DaemonSessionController] Session initialization failed
  ConnectionClosedError: Connection closed while request pending (code=1000, reason=IPC disconnected)
[2026-05-28T04:50:50.869Z] ERROR: [useDaemonAgent] Failed to create session
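
For reference, the gaps between the key lines in this excerpt can be computed directly from the timestamps. The sketch below only does the arithmetic; the `events` labels are my own shorthand for the log lines noted in the comments.

```javascript
// Timestamps copied from the log excerpt above.
const events = {
  connected: '2026-05-28T04:50:50.686Z',   // [TuiDaemonAdapter] Connected and authenticated
  hookMatched: '2026-05-28T04:50:50.818Z', // [Hooks] Matched commands SessionStart
  ipcError: '2026-05-28T04:50:50.867Z',    // [DaemonClientIPC] IPC transport emitted error
};

// Milliseconds elapsed between two ISO-8601 timestamps.
function elapsedMs(from, to) {
  return Date.parse(to) - Date.parse(from);
}

const sinceConnected = elapsedMs(events.connected, events.ipcError);
const sinceHookMatched = elapsedMs(events.hookMatched, events.ipcError);

console.log(`connected -> IPC error: ${sinceConnected} ms`);
console.log(`hook matched -> IPC error: ${sinceHookMatched} ms`);
```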

Workaround (user side)

Add to ~/.zshrc (or your shell's startup file):

if [ -n "${FACTORY_DESKTOP_CDP_PORT:-}" ] || [ -n "${FACTORY_UPSTREAM_CLIENT_TYPE:-}" ]; then
  unset NODE_CHANNEL_FD NODE_CHANNEL_SERIALIZATION_MODE
fi

Verified: after this, droid in the right-pane terminal starts normally.

For a one-shot test without modifying shell files:

unset NODE_CHANNEL_FD NODE_CHANNEL_SERIALIZATION_MODE && droid

Suggested fix (Factory side)

When the desktop daemon spawns the embedded terminal subshell, scrub Node IPC env vars from the child's environment before exec, e.g.:

spawn(shell, [...], {
  env: {
    ...sanitize(process.env, { remove: ['NODE_CHANNEL_FD', 'NODE_CHANNEL_SERIALIZATION_MODE'] }),
    // ... other env injections
  },
  // ...
});
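
The `sanitize` call above is a placeholder. One way to fill it in is sketched below, assuming the daemon spawns the shell through Node's `child_process.spawn`; `scrubNodeIpcEnv` and `spawnPaneShell` are hypothetical names, and a real terminal pane would more likely go through a pty library than plain `stdio: 'inherit'`.

```javascript
const { spawn } = require('node:child_process');

// Variables Node uses to hand an IPC channel to a child process.
const NODE_IPC_ENV_VARS = ['NODE_CHANNEL_FD', 'NODE_CHANNEL_SERIALIZATION_MODE'];

// Return a copy of `env` without the Node IPC variables.
// The input object is left untouched.
function scrubNodeIpcEnv(env) {
  const clean = { ...env };
  for (const name of NODE_IPC_ENV_VARS) {
    delete clean[name];
  }
  return clean;
}

// Hypothetical wrapper: spawn the pane shell with a scrubbed environment,
// then layer any extra variables the desktop app wants to inject on top.
function spawnPaneShell(shell, args = [], extraEnv = {}) {
  return spawn(shell, args, {
    env: { ...scrubNodeIpcEnv(process.env), ...extraEnv },
    stdio: 'inherit',
  });
}

// Example: the scrubbed copy keeps unrelated variables.
console.log(scrubNodeIpcEnv({ PATH: '/usr/bin', NODE_CHANNEL_FD: '3' }));
```

Scrubbing a copy rather than mutating `process.env` keeps the daemon's own environment unchanged, in case the daemon still relies on those values.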

This is a common mitigation for the Node IPC fd-leak problem when an IPC-spawned Node process needs to spawn unrelated child processes. Any Electron app that embeds a terminal can run into the same leak.

A grandchild-defensive approach inside the droid CLI is also worth considering: at startup, if NODE_CHANNEL_FD is set but the process was NOT actually forked by Node (e.g., parent is /bin/zsh), unset it before process.send gets wired up.
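
A rough sketch of such a guard follows. It is an assumption-heavy illustration, not droid's code: `dropUnexpectedIpc` and the `expectsIpcParent` flag are hypothetical, and how the CLI would decide that flag (for example, by inspecting its parent process) is left open. Whether the variables are still visible to CLI code at startup depends on the runtime, so the sketch both clears the variables and disconnects a channel that is already wired up, using the standard `process.connected` and `process.disconnect()` members.

```javascript
// Hypothetical startup guard for a CLI that does not expect an IPC parent.
// `proc` is a process-like object: { env, connected, disconnect }.
// Returns true if anything was cleaned up.
function dropUnexpectedIpc(proc, expectsIpcParent) {
  if (expectsIpcParent) {
    return false; // Legitimately launched over IPC: leave everything alone.
  }
  let changed = false;
  // Clear the variables so they do not propagate to further children.
  for (const name of ['NODE_CHANNEL_FD', 'NODE_CHANNEL_SERIALIZATION_MODE']) {
    if (proc.env[name] !== undefined) {
      delete proc.env[name];
      changed = true;
    }
  }
  // If the runtime already wired up a channel, close it.
  if (proc.connected && typeof proc.disconnect === 'function') {
    proc.disconnect();
    changed = true;
  }
  return changed;
}

// Demonstration with a stand-in object rather than the real `process`.
const fakeProc = {
  env: { NODE_CHANNEL_FD: '3', NODE_CHANNEL_SERIALIZATION_MODE: 'json', HOME: '/tmp' },
  connected: true,
  disconnect() { this.connected = false; },
};
console.log(dropUnexpectedIpc(fakeProc, false), fakeProc.env, fakeProc.connected);
```

In a real entry point this would be called as early as possible with the actual `process` object, before any transport code runs.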

Environment

  • macOS, Darwin kernel 25.4.0 (darwin arm64)
  • Factory.app v0.92.0
  • droid CLI v0.135.0
  • Shell: zsh
  • BYOK custom models in use (so cloud routing is not on the critical path) — failure is independent of model selection
  • Factory user id: user_01KP2JH0AG6ECDYZ95805FGDGM
  • droidInstallationId: b01134b5-7c98-4a97-8850-a409c5b3f11c
  • factoryTier: team_annual

Notes

  • 100% reproducible, no race condition
  • Persists across desktop daemon restarts and full Factory.app restarts (because the bug is in how the daemon spawns the shell, not in daemon state)
  • Independent of session model, autonomy mode, custom hooks, or project cwd
  • Independent of SessionStart hook output — the IPC disconnect fires before the hook completes

Metadata

  • Labels: bug