
MCP server dies on heavy ctx_execute workloads #244

@khaosdoctor

Description


Summary

The context-mode MCP server (v1.0.75) crashes with "MCP error -32000: Connection closed" under heavy ctx_execute / ctx_batch_execute workloads. The crash cascades: a second call made after the first failure also dies immediately, and tool registration is wiped from Claude Code until the user restarts it.

Could be related to #242 and #236

Environment

  • context-mode 1.0.75
  • Claude Code on Linux (Arch, kernel 6.19)
  • Node v25.2.1, better-sqlite3 12.8.0 (abi141 prebuilt present)
  • Workload: processing ~173 JSONL files (~46 MB total) from ~/.claude/projects/

Reproduction

Twice in the same session, while scanning conversation logs:

  1. Call 1: ctx_execute(language: "javascript", code: <~60 LOC JS that reads 173 files, regex-scans ~8 MB of text, pushes substrings into an array, console.logs a small summary>, intent: "..."). The child process finished, but the parent MCP server died mid-response. ctx_stats afterwards reported "session (< 1 min)", i.e. the server had silently restarted.

  2. Call 2: immediately afterwards, a trivial ctx_batch_execute with six short shell commands (ls, find -name ... | head, cat package.json). The server died again, and this time Claude Code stopped re-registering the tools entirely: all mcp__plugin_context-mode_* tools disappeared for the rest of the session.

This has happened on my machine before:

  • ~/.claude/context-mode/sessions/ held 7 .db files but only 6 .cleanup markers, i.e. orphaned session state from past crashes.
  • ~/.claude/context-mode/content/<session>.db was 4 KB, effectively empty: the persistent index was never populated despite many hours of use across sessions.
  • Historical sessionstart-debug.log: Cannot find module 'better-sqlite3' thrown from plugins/marketplaces/context-mode/hooks/session-db.bundle.mjs. The module is now present in that path, so this specific historical error is resolved, but the empty index DB suggests the write path still fails silently somewhere.
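A quick way to check for the orphaned state described above is to compare session DB files against cleanup markers and look for leftover SQLite WAL files. The TypeScript sketch below is a hypothetical diagnostic, not part of context-mode; it assumes a marker is named <id>.cleanup or <id>.db.cleanup, which may not match the plugin's actual naming.

```typescript
import { existsSync, readdirSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical diagnostic (not a context-mode API).
// ASSUMPTION: session "<id>.db" counts as cleaned up when "<id>.cleanup"
// or "<id>.db.cleanup" exists in the same directory.
function findOrphans(dir: string): { orphanedDbs: string[]; walFiles: string[] } {
  const names = readdirSync(dir);

  // Session ids that have a cleanup marker.
  const cleaned = new Set(
    names
      .filter((n) => n.endsWith(".cleanup"))
      .map((n) => n.slice(0, -".cleanup".length).replace(/\.db$/, "")),
  );

  // Session DBs with no marker: likely left behind by a crash.
  const orphanedDbs = names
    .filter((n) => n.endsWith(".db") && !cleaned.has(n.slice(0, -".db".length)))
    .sort();

  // SQLite keeps not-yet-checkpointed writes in "<database>-wal" next to the DB.
  const walFiles = names.filter((n) => n.endsWith(".db-wal")).sort();

  return { orphanedDbs, walFiles };
}

// Only inspect the real directory if it exists on this machine.
const sessionsDir = join(homedir(), ".claude", "context-mode", "sessions");
if (existsSync(sessionsDir)) {
  console.log(findOrphans(sessionsDir));
}
```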

Suspected root cause

src/server.ts:45-50 installs uncaughtException / unhandledRejection handlers that only call process.stderr.write(...); they neither re-throw nor call process.exit, so the intent is "log and continue". However, several other things are happening:

  • indexStdout / intentSearch fire automatically on any stdout larger than 5 KB (INTENT_SEARCH_THRESHOLD) or 100 KB (LARGE_OUTPUT_THRESHOLD); see src/server.ts:739, 749, 769, 779, 810, 839.
  • Both call into store.index(...) / store.indexPlainText(...), which write to SQLite via better-sqlite3. better-sqlite3 throws synchronous exceptions from native C++ (SQLITE_BUSY, SQLITE_MISUSE, FTS5 tokenizer errors on unusual UTF-8, etc.).
  • Synchronous throws inside async tool handlers are caught by the outer try { ... } catch at server.ts:788, which returns { isError: true }. But if the throw happens during the response-building step (e.g. inside intentSearch's call chain), it propagates past the handler's catch and lands on the unhandled-exception path, which only logs it.
  • In that logged-but-not-recovered state, the stdio transport has already half-written a response. The MCP JSON-RPC client sees framing corruption and closes the stdin pipe. The lifecycle guard in src/lifecycle.ts:59-63 treats stdin close as "parent gone" and calls onShutdown(). Server exits.
  • On next launch, the orphaned WAL file (*.db-wal) on the session DB causes the new process to die during DB open → cascade failure. This matches the 1→2 crash pattern I observed exactly.
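To illustrate the catch-bypass step: an async handler's try/catch only sees errors raised inside the call chain it awaits. Work that is scheduled but not awaited (here a setImmediate callback) throws after the handler has already returned, so the error goes to the process-level handler instead. The sketch simulates both cases with a fake SQLITE_BUSY error and touches no real SQLite; whether intentSearch actually defers work this way is an assumption, not something confirmed in the source.

```typescript
// Sketch: which throws an async tool handler's try/catch can and cannot see.
// The "SQLITE_BUSY" errors are simulated; nothing here touches SQLite.

type ToolResult = { isError: boolean; text: string };

const escaped: string[] = [];

// Log-only listener, similar in spirit to the handlers at src/server.ts:45-50:
// it records the error and lets the process keep running.
process.on("uncaughtException", (err: Error) => {
  escaped.push(err.message);
  process.stderr.write(`uncaughtException: ${err.message}\n`);
});

async function toolHandler(detached: boolean): Promise<ToolResult> {
  try {
    if (detached) {
      // Scheduled but not awaited: this runs after the try/catch has exited,
      // so its exception reaches the process-level listener instead.
      setImmediate(() => {
        throw new Error("SQLITE_BUSY (simulated, detached)");
      });
    } else {
      // Thrown inside the awaited call chain: caught below.
      throw new Error("SQLITE_BUSY (simulated, inline)");
    }
    return { isError: false, text: "ok" };
  } catch (err) {
    return { isError: true, text: (err as Error).message };
  }
}

async function demo(): Promise<{ inline: ToolResult; detached: ToolResult; escaped: string[] }> {
  const inline = await toolHandler(false);
  const detached = await toolHandler(true);
  // Give the detached callback time to run and throw.
  await new Promise((resolve) => setTimeout(resolve, 20));
  return { inline, detached, escaped: [...escaped] };
}

demo().then((r) => console.log(r));
```

In the detached case the handler reports success while the error is merely logged, which is the "logged but not recovered" state the issue describes.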

Possible fixes

  1. Wrap SQLite writes in try/catch at server.ts:810 and server.ts:839 (every store.index* call). On failure, log to stderr and return a plain-text fallback; do NOT let the error escape into the handler's outer catch.
  2. Close WAL on shutdown: in lifecycle.ts onShutdown, run db.pragma('wal_checkpoint(TRUNCATE)') and db.close() before exit. This prevents orphaned WALs from carrying over to the next launch.
  3. Validate DB on open: on startup, if opening a session DB throws, rename it to .corrupt-<timestamp> and create a fresh one.
  4. Stronger uncaughtException handling: if an exception originates from better-sqlite3 (check err.code?.startsWith('SQLITE_')), attempt a DB close + WAL checkpoint before allowing the process to continue or exit cleanly.
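The four fixes could look roughly like the TypeScript below. It is a sketch under stated assumptions: SqliteLike is a hypothetical minimal interface that better-sqlite3's Database satisfies (pragma and close), and withIndexFallback, closeCleanly, openOrQuarantine and installSqliteAwareHandler are illustrative names, not existing context-mode functions. Only the wal_checkpoint(TRUNCATE) pragma and the SQLITE_* error codes come from SQLite and better-sqlite3 themselves.

```typescript
import { existsSync, renameSync } from "node:fs";

// Hypothetical minimal shape; better-sqlite3's Database has both methods.
interface SqliteLike {
  pragma(source: string): unknown;
  close(): unknown;
}

// Fix 4 helper: better-sqlite3 throws SqliteError objects whose `code`
// is an SQLite result-code name such as "SQLITE_BUSY".
function isSqliteError(err: unknown): boolean {
  const code = (err as { code?: unknown } | null | undefined)?.code;
  return typeof code === "string" && code.startsWith("SQLITE_");
}

// Fix 1: try the indexed path (e.g. a store.index*(...) call plus search);
// on any failure, log and build the plain-text fallback instead of throwing.
function withIndexFallback<T>(indexed: () => T, fallback: () => T): T {
  try {
    return indexed();
  } catch (err) {
    process.stderr.write(`index step failed, using fallback: ${String(err)}\n`);
    return fallback();
  }
}

// Fix 2: checkpoint the WAL into the main DB file, truncate the -wal file,
// then close. The close still runs if the checkpoint fails.
function closeCleanly(db: SqliteLike): void {
  try {
    db.pragma("wal_checkpoint(TRUNCATE)");
  } catch (err) {
    process.stderr.write(`wal checkpoint failed: ${String(err)}\n`);
  } finally {
    db.close();
  }
}

// Fix 3: if opening fails, move the DB (and its -wal/-shm sidecars) aside
// as "<path>.corrupt-<timestamp>" and open a fresh DB at the same path.
function openOrQuarantine<D>(path: string, open: (p: string) => D, now: () => number = Date.now): D {
  try {
    return open(path);
  } catch (err) {
    if (!existsSync(path)) throw err; // nothing to quarantine
    const quarantined = `${path}.corrupt-${now()}`;
    process.stderr.write(`cannot open ${path} (${String(err)}); moved to ${quarantined}\n`);
    renameSync(path, quarantined);
    for (const suffix of ["-wal", "-shm"]) {
      if (existsSync(path + suffix)) renameSync(path + suffix, quarantined + suffix);
    }
    return open(path);
  }
}

// Fix 4: on an SQLite-originated uncaught exception, checkpoint and close
// the DB, then exit; other errors keep the existing log-only behaviour.
function installSqliteAwareHandler(getDb: () => SqliteLike | undefined): void {
  process.on("uncaughtException", (err) => {
    process.stderr.write(`uncaughtException: ${String(err)}\n`);
    if (!isSqliteError(err)) return;
    try {
      const db = getDb();
      if (db) closeCleanly(db);
    } finally {
      process.exit(1);
    }
  });
}
```

In this sketch closeCleanly would be called from the onShutdown path in lifecycle.ts, and openOrQuarantine would wrap the session-DB open (e.g. `(p) => new Database(p)` with better-sqlite3); both call sites are assumptions about where the code would live. Exiting in the SQLite branch rather than continuing is a design choice: once a response may be half-written to stdout, a clean exit with a checkpointed DB is arguably easier for the client to recover from than a process in an unknown state.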

Impact

Anyone using context-mode to process large workloads will hit this eventually. Because the crash wipes tool registration without a clear error in the UI, users conclude that "context-mode is flaky" and fall back to raw Bash, which largely defeats the plugin's purpose. Sometimes Claude itself falls back and refuses to use the MCP server again.

Happy to run further diagnostics.

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
