Why this is the critical path
Klonode's whole point is to be a workstation where Claude can improve the workstation itself. That only works if Claude can edit Klonode's own source files without killing the conversation doing the editing. Today it doesn't:
- Claude edits packages/ui/src/lib/components/ChatPanel/ChatPanel.svelte → Vite HMR re-mounts the component → the in-flight SSE reader in the browser is torn down → the stream dies → the conversation is lost and has to restart from scratch.
- Claude edits packages/ui/src/routes/api/chat/stream/+server.ts → Vite restarts the dev server → the spawned Claude CLI subprocess is killed mid-turn → the SSE stream errors out → Klonode renders "Feil: network error".
- Claude edits a store under packages/ui/src/lib/stores/ → store re-exports invalidate all importers → full page reload → location.reload() severs the fetch.
Every path by which Klonode could improve its own UI dies at the first Edit tool call.
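Workarounds we've tried and why they don't solve it:
- Persisting chatStore.messages and cliSessionIds to localStorage (PR #66, "Self-hosting survival: persist chat + CLI session IDs across reloads"): survives the reload but loses the in-flight tokens and interrupts whatever file operation was mid-flight.
- Dropping --resume and always re-spawning (PR #70, "Fix: raise chat max-turns to 500 and drop broken --resume"): solves a different bug but makes every edit cost a cold restart of the conversation.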
The only real fix is: the Claude CLI process must outlive Vite. That means running it as a detached worker the dev server can't kill.
Architecture
Core idea
Move the spawn('claude.exe', [...]) out of the request-handler lifecycle and into a long-lived worker managed by Klonode's backend. The worker:
- Lives in its own Node process (or a pm2 / forever / Windows Service handle)
- Has its own stdout/stderr captured to an append-only log file under .klonode/workers/<worker-id>.log
- Exposes an IPC channel (named pipe on Windows, unix socket on Mac/Linux, or a localhost-only HTTP endpoint with a random token)
- Persists its state (active Claude session, last message, streaming position) to .klonode/workers/<worker-id>.state.json
The Klonode Workstation UI connects to the worker over the IPC channel instead of spawning Claude directly. When Vite HMRs the browser tab, the new browser reconnects to the same worker by ID and gets a delta of events emitted since the last byte offset the old connection acknowledged.
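As a rough sketch of the IPC channel choice, the endpoint could be derived from the worker ID and the platform. The helper name ipcEndpoint, the .sock file location, and the ID validation rule are assumptions for illustration; only the pipe name pattern appears in this issue.

```typescript
// Hypothetical helper: choose an IPC endpoint for a worker.
// Windows gets a named pipe; other platforms get a unix socket path
// next to the worker's log and state files (that location is an assumption).
type Platform = "win32" | "darwin" | "linux";

function ipcEndpoint(
  workerId: string,
  platform: Platform,
  baseDir: string = ".klonode/workers",
): string {
  // Reject IDs that could escape the workers directory or break the pipe name.
  if (!/^[A-Za-z0-9_-]+$/.test(workerId)) {
    throw new Error(`invalid worker id: ${workerId}`);
  }
  if (platform === "win32") {
    // The actual pipe path is \\.\pipe\<name>; backslashes are escaped here.
    return `\\\\.\\pipe\\klonode-worker-${workerId}`;
  }
  return `${baseDir}/${workerId}.sock`;
}
```

Passing the platform in explicitly (rather than reading it from the runtime) keeps the function easy to test on any machine.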
Minimum viable worker protocol
POST /worker/spawn { repoPath, cwd, systemPrompt, prompt } → { workerId }
GET /worker/:id/stream?since=<byte-offset> → SSE of events from offset
POST /worker/:id/stop → aborts (SIGTERM the child)
GET /worker/:id/status → { alive, lastActivity, tokensConsumed }
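The request and response bodies above might be typed in worker-protocol.ts roughly as follows. The field names come from the endpoint list; the field types, the unit of lastActivity, and the isSpawnRequest guard are assumptions.

```typescript
// Sketch of shared types for the protocol above.
// Field names come from the endpoint list; their types are assumptions.
interface SpawnRequest {
  repoPath: string;
  cwd: string;
  systemPrompt: string;
  prompt: string;
}

interface SpawnResponse {
  workerId: string;
}

interface WorkerStatus {
  alive: boolean;
  lastActivity: number; // assumed: epoch milliseconds
  tokensConsumed: number;
}

// Runtime check for an untrusted POST /worker/spawn body.
function isSpawnRequest(body: unknown): body is SpawnRequest {
  if (typeof body !== "object" || body === null) return false;
  const fields = body as Record<string, unknown>;
  return ["repoPath", "cwd", "systemPrompt", "prompt"].every(
    (key) => typeof fields[key] === "string",
  );
}
```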
The worker process keeps a ring buffer of recent events in memory AND appends every event to the disk log. On reconnect the Workstation tails the log from the last offset the browser confirmed.
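A minimal in-memory model of that log, assuming one JSON line per event and offsets measured in bytes of the concatenated log. The class name EventLog and the null return used to signal a fallback to the disk log are illustrative choices, not part of the proposal.

```typescript
// In-memory model of the append-only log: each event is one JSON line,
// and offsets are byte positions in the concatenated log.
// A real worker would also append each line to the .log file on disk.
interface LoggedEvent {
  offset: number;     // byte position where this line starts
  nextOffset: number; // byte position just past this line
  line: string;
}

class EventLog {
  private events: LoggedEvent[] = [];
  private size = 0; // total bytes appended so far
  private encoder = new TextEncoder();
  private capacity: number;

  constructor(capacity: number = 1000) {
    this.capacity = capacity;
  }

  append(payload: unknown): LoggedEvent {
    const line = JSON.stringify(payload) + "\n";
    const bytes = this.encoder.encode(line).length;
    const event = { offset: this.size, nextOffset: this.size + bytes, line };
    this.size += bytes;
    this.events.push(event);
    // Ring-buffer behaviour: drop the oldest entry beyond capacity.
    if (this.events.length > this.capacity) this.events.shift();
    return event;
  }

  // Events starting at or after `offset`. Returns null when that offset has
  // already been evicted from memory, so the caller can read the disk log.
  since(offset: number): LoggedEvent[] | null {
    const oldest = this.events[0]?.offset ?? this.size;
    if (offset < oldest) return null;
    return this.events.filter((e) => e.offset >= offset);
  }
}
```

Counting bytes rather than characters keeps the in-memory offsets aligned with positions in the file on disk when events contain non-ASCII text.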
Surviving HMR
When Vite HMRs ChatPanel.svelte:
- Svelte remounts the component
- onMount runs, checks sessionsStore for an active workerId
- If present, opens an EventSource to GET /worker/:id/stream?since=<offset>
- Events resume streaming from where they left off: no tokens lost, no tool calls retriggered
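The resume steps above could be backed by a small cursor that the component keeps across remounts. This is a sketch under assumptions: the names ResumeCursor, streamUrl, and applyEvent are hypothetical, each event is assumed to carry its own byte offsets, and the /api prefix is inferred from the SvelteKit route paths listed under "Files to touch".

```typescript
// Hypothetical client-side cursor: remembers how far the tab has read so a
// remounted component can ask the worker for everything after that point.
interface StreamEvent {
  offset: number;
  nextOffset: number;
  data: string;
}

interface ResumeCursor {
  workerId: string;
  offset: number;
}

// URL for the resume request (the /api prefix is an assumption).
function streamUrl(cursor: ResumeCursor): string {
  const id = encodeURIComponent(cursor.workerId);
  return `/api/worker/${id}/stream?since=${cursor.offset}`;
}

// Advance the cursor only for events not yet seen, so events replayed
// after a reconnect are ignored instead of being rendered twice.
function applyEvent(
  cursor: ResumeCursor,
  event: StreamEvent,
  render: (data: string) => void,
): ResumeCursor {
  if (event.offset < cursor.offset) return cursor; // duplicate from replay
  render(event.data);
  return { ...cursor, offset: event.nextOffset };
}
```

In the component, the URL from streamUrl would be handed to an EventSource in onMount, with each incoming message passed through applyEvent and the resulting cursor written back to the store.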
When Vite restarts the whole dev server (because a store or API route changed):
- Server process exits, but the detached worker is NOT a child of the Vite process, so it survives
- Browser reloads, reconnects by workerId
- Same resume path
When the detached worker itself crashes (Claude CLI bug, OOM, etc):
- Worker manager detects SIGCHLD / exit
- State file is marked crashed: true with last-known-good byte offset
- Browser shows a recoverable error banner with a "resume from last checkpoint" button
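The crash transition could look roughly like this. Only crashed and the byte offset are named in the text; the remaining WorkerState fields and the canResume rule are assumptions for illustration.

```typescript
// Sketch of the persisted state file and the crash transition.
// Only `crashed` and the byte offset are named in the text; the other
// fields are assumptions.
interface WorkerState {
  workerId: string;
  crashed: boolean;
  lastGoodOffset: number; // last byte offset fully written to the log
  exitCode: number | null;
}

function markCrashed(
  state: WorkerState,
  exitCode: number | null,
  flushedOffset: number,
): WorkerState {
  return {
    ...state,
    crashed: true,
    exitCode,
    // Never move the checkpoint backwards if a stale offset is reported.
    lastGoodOffset: Math.max(state.lastGoodOffset, flushedOffset),
  };
}

// The "resume from last checkpoint" button only makes sense for a crashed
// worker that has a checkpoint to resume from.
function canResume(state: WorkerState): boolean {
  return state.crashed && state.lastGoodOffset > 0;
}
```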
Why a separate Node process (not a worker_thread)
Worker threads die with the Vite parent. Child processes spawned by the Vite handler die with the Vite parent (which is why today's code has this problem). The only thing that survives is a process spawned with detached: true, with stdio that is not connected to the parent ('ignore' or a file descriptor, never an inherited pipe), unref()'d so the parent does not wait on it, and with its PID recorded somewhere durable. That has to be a full standalone Node process with its own module graph, not a thread sharing the SvelteKit runtime.
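A minimal sketch of such a spawn using Node's child_process module. The function name spawnDetached is hypothetical, command and args stand in for the klonode-worker binary, and the directory layout follows the paths mentioned in this issue.

```typescript
import { spawn } from "node:child_process";
import { closeSync, mkdirSync, openSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Spawn a worker that outlives this process. `command` and `args` are
// placeholders for the klonode-worker binary and its arguments.
function spawnDetached(
  workerId: string,
  command: string,
  args: string[],
  dir: string = ".klonode/workers",
): number {
  mkdirSync(dir, { recursive: true });
  // stdout/stderr go to the append-only log, never to the parent's pipes.
  const logFd = openSync(join(dir, `${workerId}.log`), "a");
  const child = spawn(command, args, {
    detached: true,
    windowsHide: true,
    stdio: ["ignore", logFd, logFd],
  });
  // Spawn failures are reported asynchronously; without a listener they
  // would surface as an uncaught exception.
  child.on("error", (err) => console.error(`worker ${workerId} failed:`, err));
  closeSync(logFd); // the child holds its own copy of the descriptor
  child.unref();    // do not keep the parent's event loop alive for the child
  if (child.pid === undefined) throw new Error(`failed to spawn ${command}`);
  // Record the PID durably so a later boot can find or clean up the worker.
  writeFileSync(join(dir, `${workerId}.pid`), String(child.pid));
  return child.pid;
}
```

For example, spawnDetached("w1", process.execPath, ["worker.js"]) would start a second Node process whose output lands in .klonode/workers/w1.log (worker.js is a placeholder script name).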
Windows specifics
- Use detached: true, windowsHide: true, stdio: ['ignore', outputFileFd, outputFileFd]
- Open a named pipe with net.createServer on \\\\.\\pipe\\klonode-worker-<id>
- Register the PID in .klonode/workers/<id>.pid for cleanup on next boot
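The cleanup on next boot could be sketched as a sweep over the .pid files. The function names isAlive and sweepPidFiles are hypothetical. The liveness check relies on signal 0, which tests for a process without delivering a signal; it cannot tell whether the PID has since been reused by an unrelated process, so a real implementation would want a second check (for example, probing the worker's IPC endpoint).

```typescript
import { readdirSync, readFileSync, unlinkSync } from "node:fs";
import { join } from "node:path";

// True if a process with this PID currently exists. Signal 0 performs the
// existence and permission check without delivering a signal.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM means the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Remove .pid files whose process is gone; return the IDs of live workers.
function sweepPidFiles(dir: string): string[] {
  const live: string[] = [];
  for (const name of readdirSync(dir)) {
    if (!name.endsWith(".pid")) continue;
    const file = join(dir, name);
    const pid = Number.parseInt(readFileSync(file, "utf8").trim(), 10);
    if (Number.isInteger(pid) && pid > 0 && isAlive(pid)) {
      live.push(name.slice(0, -".pid".length));
    } else {
      unlinkSync(file); // stale or corrupt entry
    }
  }
  return live;
}
```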
Files to touch
New:
packages/ui/src/lib/workers/worker-client.ts — browser-side client that spawns, streams, reconnects
packages/ui/src/lib/workers/worker-manager.ts — server-side worker registry / spawner
packages/ui/src/lib/workers/worker-protocol.ts — shared types for the IPC events
packages/ui/src/routes/api/worker/spawn/+server.ts — POST endpoint
packages/ui/src/routes/api/worker/[id]/stream/+server.ts — GET SSE with ?since= offset
packages/ui/src/routes/api/worker/[id]/stop/+server.ts — POST abort
packages/worker/ — NEW package. A standalone Node binary klonode-worker that wraps the Claude CLI spawn, writes the log, handles the IPC, and persists state. Built with tsup like the existing CLI package.
docs/self-hosting.md — update to describe the detached-worker flow
Modified:
packages/ui/src/routes/api/chat/stream/+server.ts — either delegate to the worker manager or keep as a legacy path for non-self-hosting setups with a warning
packages/ui/src/lib/components/ChatPanel/ChatPanel.svelte — replace direct fetch to /api/chat/stream with workerClient.spawn(...) + reconnect logic in onMount
packages/ui/src/lib/stores/agents.ts — cliSessionIds becomes workerIds: Record<tabId, workerId>; persisted across reloads as already done for the old session IDs
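The persisted workerIds map might be loaded and saved along these lines. This is a sketch, not the existing store code: the storage key, the KeyValueStore interface (a getItem/setItem subset of localStorage, so the sketch also runs outside a browser), and both function names are assumptions.

```typescript
// Hypothetical persistence helper for the tab -> worker mapping.
type WorkerIds = Record<string, string>; // tabId -> workerId

interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const STORAGE_KEY = "klonode.workerIds"; // assumed key name

function loadWorkerIds(store: KeyValueStore): WorkerIds {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) return {};
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return {};
    }
    // Keep only string values; anything else is treated as corrupt.
    const ids: WorkerIds = {};
    for (const [tabId, workerId] of Object.entries(parsed as Record<string, unknown>)) {
      if (typeof workerId === "string") ids[tabId] = workerId;
    }
    return ids;
  } catch {
    return {}; // unreadable JSON: start with no active workers
  }
}

function saveWorkerId(store: KeyValueStore, tabId: string, workerId: string): WorkerIds {
  const next = { ...loadWorkerIds(store), [tabId]: workerId };
  store.setItem(STORAGE_KEY, JSON.stringify(next));
  return next;
}
```

In the browser, window.localStorage satisfies KeyValueStore directly, so it can be passed in as the store.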
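Acceptance criteria
- Edit packages/ui/src/lib/components/ChatPanel/ChatPanel.svelte manually in your editor while the stream is live
- Edit packages/ui/src/routes/api/chat/stream/+server.ts manually while the stream is live
- ps / tasklist shows a node klonode-worker process even when Vite is down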
Out of scope for the first PR
- Multi-worker orchestration beyond one worker per tab
- Worker sharing across windows / users
- Authentication beyond "localhost-only + random token"
- GUI for inspecting worker state
Why this is help-wanted
The IPC transport, log-tailing, and reconnect protocol are well-understood problems with good reference implementations (pm2, forever, tmux, GotTY). The Klonode-specific part is narrow: wrap the existing Claude CLI spawn in a persistent wrapper and make the ChatPanel reconnect by worker ID instead of spawning fresh. A contributor who has shipped a Node process supervisor before could land the minimum viable version in a weekend.