Async session startup and local capture queue #47
Move table creation, placeholder insertion, and version check into a separate session-start-setup.ts that runs as an async hook. The fast path (session-start.ts) now only reads local credentials and injects the system prompt context, keeping session startup <1s.
Same split as claude-code: move table creation, placeholder, and version check into codex/session-start-setup.ts. The fast path only reads credentials and injects context.
Add session-start-setup.js as a second SessionStart hook with async: true and 120s timeout. This runs the slow setup (table creation, version check) in parallel with the session.
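Taken together, the commits above describe a two-entry hook registration. A sketch of what that could look like in hooks.json follows; the surrounding structure, the `SessionStart` key, and the `$CLAUDE_PLUGIN_ROOT` variable are assumptions for illustration, not the exact file:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "type": "command",
        "command": "node \"$CLAUDE_PLUGIN_ROOT/bundle/session-start.js\"",
        "timeout": 10
      },
      {
        "type": "command",
        "command": "node \"$CLAUDE_PLUGIN_ROOT/bundle/session-start-setup.js\"",
        "timeout": 120,
        "async": true
      }
    ]
  }
}
```

The first entry is the fast synchronous path; the second runs the slow setup in the background so the session is never blocked on it.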
New capture-queue.ts utility that appends session events to a local JSONL file (~/.deeplake/capture/<sessionId>.jsonl) instead of making direct API calls. Events are flushed to cloud at session end.
Replace direct API calls with appendEvent() from capture-queue. No network calls during capture — events buffered locally.
Same local queue refactor for codex hooks. capture.ts and stop.ts now use appendEvent() instead of direct API calls.
Wiki worker now reads session events from the local capture queue files instead of querying the API. Session-end triggers the flush.
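The queue mechanics described above can be sketched roughly as follows. This is a hedged sketch, not the real `src/utils/capture-queue.ts`: the `baseDir` parameter is added here purely so the sketch can be exercised outside `~/.deeplake`, and some imports exist only to support that.

```typescript
import { appendFileSync, mkdirSync, mkdtempSync, readFileSync } from "fs";
import { dirname, join } from "path";
import { homedir, tmpdir } from "os";

// Sketch of the capture-queue append path: one JSON object per line (JSONL),
// append-only, no network call on the hot path. Real implementation may differ.
export function queuePath(
  sessionId: string,
  baseDir: string = join(homedir(), ".deeplake", "capture"),
): string {
  return join(baseDir, `${sessionId}.jsonl`);
}

export function appendEvent(
  sessionId: string,
  event: Record<string, unknown>,
  baseDir?: string,
): void {
  const path = queuePath(sessionId, baseDir);
  // Create the capture directory lazily on first write.
  mkdirSync(dirname(path), { recursive: true });
  appendFileSync(path, JSON.stringify(event) + "\n", "utf-8");
}
```

JSONL keeps the write path a single O_APPEND syscall per event, which is why the flush-at-session-end design could defer all network work to the wiki worker.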
Claude finished @efenocchi's task in 3m 45s. 3 issues found:
src/utils/capture-queue.ts
```ts
if (!existsSync(path)) return [];
const content = readFileSync(path, "utf-8").trim();
if (!content) return [];
return content.split("\n").map(line => JSON.parse(line));
```
JSON.parse(line) will throw if any line is truncated (e.g. process killed mid-write), causing the entire readEvents() call — and therefore flushQueue() — to throw before deleteQueue() runs, permanently stranding the queue file.
Filter out malformed lines instead of failing hard:
```diff
- return content.split("\n").map(line => JSON.parse(line));
+ return content.split("\n").filter(Boolean).flatMap(line => {
+   try { return [JSON.parse(line) as Record<string, unknown>]; } catch { return []; }
+ });
```
```diff
  "command": "node \"$CODEX_PLUGIN_ROOT/bundle/session-start.js\"",
- "timeout": 120
+ "timeout": 10
  }
```
session-start-setup.js is never registered here even though codex/bundle/session-start-setup.js is added by this PR and the description says "Same split for codex." Without this entry, Codex users still run table creation/placeholder/version-check synchronously in session-start.js and get no startup performance improvement.
Add the async setup hook after the sync entry:
```diff
  "timeout": 10
- }
+ },
+ {
+   "type": "command",
+   "command": "node \"$CODEX_PLUGIN_ROOT/bundle/session-start-setup.js\"",
+   "timeout": 120,
+   "async": true
+ }
```
src/hooks/wiki-worker.ts
```ts
const line = JSON.stringify(event);
// For JSONB: only escape single quotes for the SQL literal
const jsonForSql = line.replace(/'/g, "''");
const ts = (event.timestamp as string) ?? new Date().toISOString();
```
jsonForSql only escapes single-quotes, but user-controlled content (e.g. prompts with backslashes) passes through unmodified — unlike every other field which goes through esc() that also strips control chars and escapes backslashes. If the database has standard_conforming_strings = off, a backslash in a user prompt followed by a quote can break out of the SQL string literal.
Use the same esc() helper used for all other string columns:
```diff
- const jsonForSql = line.replace(/'/g, "''");
+ const jsonForSql = esc(line);
```
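For reference, an escaper with the properties the review attributes to `esc()` (backslash escaping and control-character stripping on top of quote doubling) could look like the sketch below. This is a hypothetical illustration, not the project's actual `esc()` helper:

```typescript
// Hypothetical sketch of an esc()-style SQL string-literal escaper with the
// properties described in the review. The project's real esc() may differ.
export function esc(value: string): string {
  return value
    .replace(/[\u0000-\u001f\u007f]/g, "") // strip control characters
    .replace(/\\/g, "\\\\") // escape backslashes (safe even when standard_conforming_strings = off)
    .replace(/'/g, "''");   // double single quotes for the SQL literal
}
```

Ordering matters: backslashes must be escaped before quote doubling, or the doubled quotes would themselves get backslash-escaped.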
The local JSONL queue deferred server writes to session end, meaning data only reached the server after the session finished. This defeated the purpose of real-time capture — if a session crashed, events were lost. Restore the original direct INSERT approach for all capture hooks. With async: true in CC hooks.json, the INSERT still happens in the background without blocking the user. Codex hooks remain sync but the ~200ms INSERT is acceptable. The session-start split (sync context + async setup) is kept.
/claude-review |
Claude finished @efenocchi's task in 2m 40s. 2 issues found:
```diff
  "type": "command",
  "command": "node \"$CODEX_PLUGIN_ROOT/bundle/session-start.js\"",
- "timeout": 120
+ "timeout": 10
```
session-start-setup.js was bundled by this PR (added to codex/bundle/) and the source was refactored so all slow operations (table creation, placeholder, version check) moved there — but it is never registered here. Without this entry, Codex users run a fast session-start.js that skips all setup work, meaning no placeholder row is ever created and no version check runs.
Add the async setup hook right after this entry:
```diff
  "timeout": 10
- }
+ },
+ {
+   "type": "command",
+   "command": "node \"$CODEX_PLUGIN_ROOT/bundle/session-start-setup.js\"",
+   "timeout": 120,
+   "async": true
+ }
```
```ts
`elif [ -d ~/.codex/hivemind ]; then INSTALL_DIR=~/.codex/hivemind; fi; ` +
`if [ -n "$INSTALL_DIR" ]; then ` +
`TMPDIR=$(mktemp -d); ` +
`git clone --depth 1 --branch ${tag} -q https://github.com/activeloopai/hivemind.git "$TMPDIR/hivemind" 2>/dev/null && ` +
```
${tag} (derived from the external getLatestVersion() response) is interpolated directly into the shell command string passed to execSync. A tampered API response with shell metacharacters in the version field (e.g. 1.2.3; malicious_cmd) would execute arbitrary commands.
Validate the tag is a safe semver string before using it in the shell:
```ts
const tagPattern = /^v\d+\.\d+\.\d+$/;
if (!tagPattern.test(tag)) {
  throw new Error(`unsafe version tag: ${tag}`);
}
const findCmd = `INSTALL_DIR=""; ` +
  `CACHE_DIR=$(find ~/.codex/plugins/cache -maxdepth 3 -name "hivemind" -type d 2>/dev/null | head -1); ` +
  `if [ -n "$CACHE_DIR" ]; then INSTALL_DIR=$(ls -1d "$CACHE_DIR"/*/ 2>/dev/null | tail -1); ` +
  `elif [ -d ~/.codex/hivemind ]; then INSTALL_DIR=~/.codex/hivemind; fi; ` +
  `if [ -n "$INSTALL_DIR" ]; then ` +
  `TMPDIR=$(mktemp -d); ` +
  `git clone --depth 1 --branch ${tag} -q https://github.com/activeloopai/hivemind.git "$TMPDIR/hivemind" 2>/dev/null && ` +
  `cp -r "$TMPDIR/hivemind/codex/"* "$INSTALL_DIR/" 2>/dev/null; ` +
  `rm -rf "$TMPDIR"; fi`;
```
The generateVirtualIndex() regex only matched hivemind-style session paths (/sessions/<user>/<user>_<org>_<ws>_<id>.jsonl). Generic paths like /sessions/conv_0_session_1.json were invisible in the index. Now supports both formats and strips _summary suffix when matching summaries to their source session files.
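The two path shapes can be illustrated with a pair of patterns like the following. The exact regex in `generateVirtualIndex()` is not shown in this thread, so these patterns and helper names are assumptions that merely demonstrate the two formats and the `_summary` stripping:

```typescript
// Illustrative only: hivemind-style paths repeat the user segment in the
// filename, generic paths are a single file directly under /sessions/.
const hivemindStyle = /^\/sessions\/([^/]+)\/\1_[^/]+\.jsonl$/; // /sessions/<user>/<user>_<org>_<ws>_<id>.jsonl
const genericStyle = /^\/sessions\/[^/]+\.jsonl?$/;             // e.g. /sessions/conv_0_session_1.json

export function isSessionPath(path: string): boolean {
  return hivemindStyle.test(path) || genericStyle.test(path);
}

// Strip a trailing _summary so a summary maps back to its source session file.
export function sourceSessionKey(name: string): string {
  return name.replace(/_summary$/, "");
}
```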
session-end and codex/stop now skip wiki worker when capture is disabled. session-start-setup always runs table sync (ensureTable/ensureSessionsTable) but skips placeholder creation when DEEPLAKE_CAPTURE=false. Previously the entire setup block was skipped, which broke table sync and caused stale query results in the fast path.
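The gating described above reduces to something like this sketch; `ensureTables` and `insertPlaceholder` are injected stand-ins for the real `ensureTable`/`ensureSessionsTable` and placeholder helpers, not the project's actual API:

```typescript
// Sketch of session-start-setup gating: table sync always runs, placeholder
// creation is skipped when DEEPLAKE_CAPTURE=false. Helper names are stand-ins.
export async function runSetup(
  env: Record<string, string | undefined>,
  ensureTables: () => Promise<void>,
  insertPlaceholder: () => Promise<void>,
): Promise<void> {
  // Always sync tables so the fast path never queries a stale schema.
  await ensureTables();
  // Placeholder row only matters when capture is enabled.
  if (env.DEEPLAKE_CAPTURE !== "false") {
    await insertPlaceholder();
  }
}
```

Injecting the helpers keeps the gating logic trivially testable without a database.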
Grep fast path queries memory (summaries) and sessions (raw dialogue) in parallel. Read fast path falls back to the sessions table for /sessions/* paths. Previously only the memory table was searched, so exact dates and quotes only present in raw session data were invisible to the fast path.
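The parallel lookup described above amounts to a `Promise.all` over both tables. In this sketch, `queryMemory` and `querySessions` stand in for the real table queries, which are not shown in the thread:

```typescript
// Sketch of the grep fast path: search summaries (memory) and raw dialogue
// (sessions) concurrently instead of memory-only. Query fns are stand-ins.
type Hit = { source: "memory" | "sessions"; text: string };

export async function grepFastPath(
  pattern: string,
  queryMemory: (p: string) => Promise<Hit[]>,
  querySessions: (p: string) => Promise<Hit[]>,
): Promise<Hit[]> {
  const [memoryHits, sessionHits] = await Promise.all([
    queryMemory(pattern),
    querySessions(pattern),
  ]);
  // Summaries first, then raw-dialogue hits that only exist in session data.
  return [...memoryHits, ...sessionHits];
}
```

Running both queries concurrently keeps the fast path's latency at max(memory, sessions) rather than their sum.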
Summary
- Split session-start into fast sync + async setup: the SessionStart hook now only reads local credentials and injects the system prompt (~10ms). Table creation, placeholder insertion, and version checking move to a new `session-start-setup` hook that runs asynchronously with a 120s timeout. This eliminates the 2-5s startup delay that blocked every new session.
- Local JSONL capture queue: capture hooks (UserPromptSubmit, PostToolUse, Stop, SubagentStop) no longer make direct API calls. Instead, events are appended to a local JSONL file at `~/.deeplake/capture/<sessionId>.jsonl`. The wiki worker reads from these files at session end and flushes everything to the cloud in one batch. This removes all network latency from the hot path during a session.
- Wiki worker reads from local queue: the wiki worker now processes local JSONL queue files instead of querying the sessions table. Session-end triggers the flush and summary generation.
Changes
Session startup split
- `src/hooks/session-start.ts`: fast path only, read credentials and inject context
- `src/hooks/session-start-setup.ts`: async, ensureTable, placeholder row, version check (same split in `src/hooks/codex/`)
- `hooks.json`: updated to register setup as an async hook with a 120s timeout

Capture queue
- `src/utils/capture-queue.ts`: new utility, `appendEvent()` writes to local JSONL
- `src/hooks/capture.ts`: uses appendEvent instead of direct API calls
- `src/hooks/codex/capture.ts`, `src/hooks/codex/stop.ts`: same refactor

Wiki worker
- `src/hooks/wiki-worker.ts`: reads from local JSONL queue files
- `src/hooks/session-end.ts`: triggers flush at session end

Test plan
- `claude -p` session starts in <1s (no table creation delay)
- events appear in `~/.deeplake/capture/` during the session
- `npm test` passes