MCP server dies on heavy `ctx_execute` workloads #244

## Summary

The context-mode MCP server (v1.0.75) crashes with `MCP error -32000: Connection closed` on heavy `ctx_execute` / `ctx_batch_execute` workloads. The crash cascades: a second call after the first failure also dies immediately, wiping tool registration from Claude Code until the user restarts Claude Code.

Possibly related to #242 and #236.

## Environment

- context-mode **1.0.75**
- Claude Code on Linux (Arch, kernel 6.19)
- Node **v25.2.1**, better-sqlite3 **12.8.0** (abi141 prebuilt present)
- Workload: processing ~173 JSONL files (~46 MB total) from `~/.claude/projects/`

## Reproduction

This happened twice in the same session while scanning conversation logs:

1. Call 1: `ctx_execute(language: "javascript", code: <~60 LOC JS that reads 173 files, regex-scans ~8 MB of text, pushes substrings into an array, console.logs a small summary>, intent: "...")`. The child process finished, but the parent MCP server died mid-response. A subsequent `ctx_stats` call returned `"session (< 1 min)"`, meaning the server had silently restarted.

2. Call 2: immediately afterwards, a trivial `ctx_batch_execute` with six short shell commands (`ls`, `find -name ... | head`, `cat package.json`). The server died again, and this time Claude Code **stopped re-registering the tools entirely**: all `mcp__plugin_context-mode_*` tools disappeared for the rest of the session.

This has happened on my machine before:

- `~/.claude/context-mode/sessions/` held **7 `.db` files but only 6 `.cleanup` markers**, i.e. orphaned session state from past crashes.
- `~/.claude/context-mode/content/<session>.db` was **4 KB (effectively empty)**: the persistent index never populated despite many hours of use across sessions.
- Historical `sessionstart-debug.log`: `Cannot find module 'better-sqlite3'` thrown from `plugins/marketplaces/context-mode/hooks/session-db.bundle.mjs`. The module is now present at that path, so this specific historical error is resolved, but the empty index DB suggests the write path still fails silently somewhere.
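The orphan count above can be re-checked with a short script. This is a sketch under assumptions: `scanSessionDir` is a hypothetical helper, and it treats a session `<id>.db` as cleaned up when a sibling `<id>.cleanup` or `<id>.db.cleanup` marker exists (context-mode's actual marker naming is not confirmed here). It also lists non-empty `*.db-wal` files, which only indicate an unclean shutdown when no context-mode process is running.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Result of scanning a sessions directory.
export interface SessionDirReport {
  orphanedDbs: string[]; // .db files with no matching cleanup marker
  leftoverWals: string[]; // non-empty *.db-wal files left on disk
}

// ASSUMPTION: a cleanly shut down session "<id>.db" has a sibling marker
// named "<id>.cleanup" or "<id>.db.cleanup". Adjust if the real naming differs.
export function scanSessionDir(dir: string): SessionDirReport {
  const entries = new Set(fs.readdirSync(dir));
  const orphanedDbs: string[] = [];
  const leftoverWals: string[] = [];

  for (const name of [...entries].sort()) {
    if (name.endsWith(".db")) {
      const base = name.slice(0, -".db".length);
      const hasMarker = entries.has(`${base}.cleanup`) || entries.has(`${name}.cleanup`);
      if (!hasMarker) orphanedDbs.push(name);
    } else if (name.endsWith(".db-wal")) {
      // An empty WAL is harmless; a non-empty one holds uncheckpointed pages.
      if (fs.statSync(path.join(dir, name)).size > 0) leftoverWals.push(name);
    }
  }
  return { orphanedDbs, leftoverWals };
}
```

Run it with Claude Code closed, e.g. `scanSessionDir(path.join(os.homedir(), ".claude/context-mode/sessions"))`, since an active session legitimately keeps a non-empty WAL.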

## Suspected root cause

`src/server.ts:45-50` installs `uncaughtException` / `unhandledRejection` handlers that only call `process.stderr.write(...)`. They neither re-throw nor call `process.exit`, so the intent is "log and continue". However, several other things are happening:

- `indexStdout` / `intentSearch` auto-fire on any stdout > 5 KB (`INTENT_SEARCH_THRESHOLD`) or > 100 KB (`LARGE_OUTPUT_THRESHOLD`) — see `src/server.ts:739, 749, 769, 779, 810, 839`.
- Both call into `store.index(...)` / `store.indexPlainText(...)`, which write to SQLite via better-sqlite3. better-sqlite3 throws synchronous exceptions from native C++ (SQLITE_BUSY, SQLITE_MISUSE, FTS5 tokenizer errors on unusual UTF-8, etc.).
- Synchronous throws from inside async tool handlers are caught by the outer `try { ... } catch` at `server.ts:788`, which returns `{ isError: true }`. **But** if the throw happens during the **response-building** step (e.g. inside `intentSearch`'s call chain), it propagates *past* the handler's catch and lands on the unhandled-exception path — which only logs.
- In that logged-but-not-recovered state, the stdio transport has already half-written a response. The MCP JSON-RPC client sees framing corruption and closes the stdin pipe. The lifecycle guard in `src/lifecycle.ts:59-63` treats stdin close as "parent gone" and calls `onShutdown()`. The server exits.
- On next launch, the orphaned WAL file (`*.db-wal`) on the session DB causes the new process to die during DB open → cascade failure. This matches the 1→2 crash pattern I observed exactly.

## Fixes?

1. Wrap SQLite writes in `try/catch` at `server.ts:810` and `server.ts:839` (and at every other `store.index*` call). On failure, log to stderr and return a plain-text fallback; do NOT let the error escape into the handler's outer catch.
2. Close WAL on shutdown: in `lifecycle.ts` `onShutdown`, run `db.pragma('wal_checkpoint(TRUNCATE)')` and `db.close()` before exit. This prevents orphaned WALs from carrying over to the next launch.
3. Validate DB on open: on startup, if opening a session DB throws, rename it to `.corrupt-<timestamp>` and create a fresh one.
4. Stronger `uncaughtException` handling: if an exception originates from better-sqlite3 (check `err.code?.startsWith('SQLITE_')`), attempt a WAL checkpoint and DB close (in that order) before letting the process continue or exit cleanly.
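A minimal sketch of the four fixes, written against small structural interfaces instead of the real better-sqlite3 and context-mode types so it runs standalone. `IndexStore`, `safeIndex`, `closeDb`, `openOrQuarantine` and `installCrashHandler` are hypothetical names, not context-mode's API; only `pragma()` / `close()` and the `SQLITE_*` `code` on thrown errors come from better-sqlite3. Two design choices are my own assumptions: the `-wal` / `-shm` sidecars are quarantined together with the DB, and the process exits after a SQLite-originated crash rather than continuing.

```typescript
import * as fs from "node:fs";

// Subset of better-sqlite3's Database used below (both methods exist on it).
interface Db {
  pragma(source: string): unknown;
  close(): void;
}

// HYPOTHETICAL stand-in for the store behind store.index*().
interface IndexStore {
  index(label: string, content: string): void;
}

const log = (msg: string): void => {
  process.stderr.write(`[context-mode] ${msg}\n`);
};

// Fix 4 helper: better-sqlite3 errors carry a SQLITE_* string in `code`.
export function isSqliteError(err: unknown): boolean {
  const code = (err as { code?: unknown } | null)?.code;
  return typeof code === "string" && code.startsWith("SQLITE_");
}

// Fix 1: an indexing failure is logged and the raw text is returned,
// so the error never reaches the handler's outer catch.
export function safeIndex(
  store: IndexStore,
  label: string,
  stdout: string,
): { indexed: boolean; text: string } {
  try {
    store.index(label, stdout);
    return { indexed: true, text: stdout };
  } catch (err) {
    log(`indexing "${label}" failed, returning plain text: ${String(err)}`);
    return { indexed: false, text: stdout };
  }
}

// Fix 2: fold the WAL back into the main file, then close, even if the
// checkpoint itself fails.
export function closeDb(db: Db): void {
  try {
    db.pragma("wal_checkpoint(TRUNCATE)");
  } catch (err) {
    log(`WAL checkpoint failed: ${String(err)}`);
  }
  try {
    db.close();
  } catch (err) {
    log(`close failed: ${String(err)}`);
  }
}

// Fix 3: if opening throws, move the DB and its -wal/-shm sidecars to
// <path>.corrupt-<timestamp>* and retry once on a fresh file.
export function openOrQuarantine<T>(
  dbPath: string,
  open: (p: string) => T,
  now: () => number = Date.now,
): T {
  try {
    return open(dbPath);
  } catch (err) {
    const target = `${dbPath}.corrupt-${now()}`;
    log(`opening ${dbPath} failed (${String(err)}); moving it to ${target}`);
    for (const suffix of ["", "-wal", "-shm"]) {
      if (fs.existsSync(dbPath + suffix)) fs.renameSync(dbPath + suffix, target + suffix);
    }
    return open(dbPath);
  }
}

// Fix 4: on a SQLite-originated crash, checkpoint and close, then exit so
// the next launch starts from a consistent file.
export function installCrashHandler(getDb: () => Db | undefined): void {
  process.on("uncaughtException", (err) => {
    log(`uncaught exception: ${err.stack ?? String(err)}`);
    if (!isSqliteError(err)) return; // keep the current log-and-continue behavior
    const db = getDb();
    if (db) closeDb(db);
    process.exit(1);
  });
}
```

In `server.ts`, each `store.index*` call would go through something like `safeIndex`, and `onShutdown` in `lifecycle.ts` would call `closeDb` on every open handle before exiting.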

## Impact

Anyone using context-mode to process large workloads will hit this eventually. Because the crash wipes tool registration without a clean error in the UI, users conclude that "context-mode is flaky" and fall back to raw Bash, which largely defeats the plugin's purpose. Sometimes Claude itself falls back and refuses to use the MCP server again.

Happy to run further diagnostics.
