
Async session startup and local capture queue #47

Open
efenocchi wants to merge 14 commits into main from feat/async-hooks

@efenocchi
Collaborator

Summary

  • Split session-start into fast sync + async setup: SessionStart hook now only reads local credentials and injects the system prompt (~10ms). Table creation, placeholder insertion, and version checking move to a new session-start-setup hook that runs asynchronously with a 120s timeout. This eliminates the 2-5s startup delay that blocked every new session.

  • Local JSONL capture queue: Capture hooks (UserPromptSubmit, PostToolUse, Stop, SubagentStop) no longer make direct API calls. Instead, events are appended to a local JSONL file at ~/.deeplake/capture/<sessionId>.jsonl. The wiki worker reads from these files at session end and flushes everything to the cloud in one batch. This removes all network latency from the hot path during a session.

  • Wiki worker reads from local queue: The wiki worker now processes local JSONL queue files instead of querying the sessions table. Session-end triggers the flush and summary generation.

Changes

Session startup split

  • src/hooks/session-start.ts — fast path only: read credentials, inject context
  • src/hooks/session-start-setup.ts — async: ensureTable, placeholder row, version check
  • Same split for codex (src/hooks/codex/)
  • hooks.json updated to register setup as async hook with 120s timeout

Capture queue

  • src/utils/capture-queue.ts — new utility: appendEvent() writes to local JSONL
  • src/hooks/capture.ts — uses appendEvent instead of direct API calls
  • src/hooks/codex/capture.ts, src/hooks/codex/stop.ts — same refactor
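
As a rough illustration of the capture-queue utility listed above, here is a minimal sketch of an appendEvent()-style helper. The signature, the `queueDir` parameter, and the event shape are assumptions made for illustration; the actual src/utils/capture-queue.ts may look different.

```typescript
import { appendFileSync, mkdirSync, mkdtempSync, readFileSync } from "node:fs";
import { homedir, tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical event shape; the real hooks may record different fields.
type CaptureEvent = Record<string, unknown>;

// Default location taken from the PR description.
const DEFAULT_QUEUE_DIR = join(homedir(), ".deeplake", "capture");

// Append one event as a single JSON line. No network I/O happens here.
// `queueDir` is an assumed parameter, added so the sketch can run in a temp dir.
function appendEvent(
  sessionId: string,
  event: CaptureEvent,
  queueDir: string = DEFAULT_QUEUE_DIR,
): string {
  mkdirSync(queueDir, { recursive: true });
  const path = join(queueDir, `${sessionId}.jsonl`);
  appendFileSync(path, JSON.stringify(event) + "\n", "utf-8");
  return path;
}

// Usage: write two events into a throwaway directory and read them back.
const demoDir = mkdtempSync(join(tmpdir(), "capture-demo-"));
const demoPath = appendEvent("session-123", { type: "UserPromptSubmit", prompt: "hi" }, demoDir);
appendEvent("session-123", { type: "Stop" }, demoDir);
const demoLines = readFileSync(demoPath, "utf-8").trim().split("\n");
console.log(demoLines.length);
```

Writing one JSON object per line keeps each append independent, so a reader can process the file line by line without loading a single large JSON document.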

Wiki worker

  • src/hooks/wiki-worker.ts — reads from local JSONL queue files
  • src/hooks/session-end.ts — triggers flush at session end

Test plan

  • Fresh claude -p session starts in <1s (no table creation delay)
  • Session events are captured to ~/.deeplake/capture/ during the session
  • Wiki summary is generated at session end from the local queue
  • npm test passes
  • Codex hooks work with the same flow

Move table creation, placeholder insertion, and version check into
a separate session-start-setup.ts that runs as an async hook.
The fast path (session-start.ts) now only reads local credentials
and injects the system prompt context, keeping session startup <1s.

Same split as claude-code: move table creation, placeholder, and
version check into codex/session-start-setup.ts. The fast path
only reads credentials and injects context.

Add session-start-setup.js as a second SessionStart hook with
async: true and 120s timeout. This runs the slow setup (table
creation, version check) in parallel with the session.

New capture-queue.ts utility that appends session events to a local
JSONL file (~/.deeplake/capture/<sessionId>.jsonl) instead of making
direct API calls. Events are flushed to cloud at session end.

Replace direct API calls with appendEvent() from capture-queue.
No network calls during capture: events are buffered locally.

Same local queue refactor for codex hooks. capture.ts and stop.ts
now use appendEvent() instead of direct API calls.

Wiki worker now reads session events from the local capture queue
files instead of querying the API. Session-end triggers the flush.

@claude

claude bot commented Apr 14, 2026

Claude finished @efenocchi's task in 3m 45s


3 issues found:

  • Data loss risk (src/utils/capture-queue.ts:38): readEvents() throws on any malformed JSONL line (e.g. truncated by crash), aborting the entire flush and leaving the queue file permanently stranded.
  • Codex startup not actually async (codex/hooks/hooks.json:11): session-start-setup.js is bundled but never registered as an async hook for Codex — the table-creation delay is still on the hot path for Codex users.
  • Inconsistent SQL escaping (src/hooks/wiki-worker.ts:116): jsonForSql only escapes ' but all other string columns use esc() which also strips control chars and escapes backslashes; user-controlled prompt content can contain backslashes.

if (!existsSync(path)) return [];
const content = readFileSync(path, "utf-8").trim();
if (!content) return [];
return content.split("\n").map(line => JSON.parse(line));

JSON.parse(line) will throw if any line is truncated (e.g. process killed mid-write), causing the entire readEvents() call — and therefore flushQueue() — to throw before deleteQueue() runs, permanently stranding the queue file.

Filter out malformed lines instead of failing hard:

Suggested change
- return content.split("\n").map(line => JSON.parse(line));
+ return content.split("\n").filter(Boolean).flatMap(line => {
+   try { return [JSON.parse(line) as Record<string, unknown>]; } catch { return []; }
+ });
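
To make the effect of the suggestion concrete, here is a small self-contained sketch of tolerant JSONL parsing. The function name `parseJsonlTolerant` and the `skipped` counter are assumptions added for illustration and are not part of the PR.

```typescript
type JsonlResult = { events: Record<string, unknown>[]; skipped: number };

// Parse JSONL content, dropping blank or malformed lines instead of throwing.
// The skipped count lets a caller log how many lines were unreadable.
function parseJsonlTolerant(content: string): JsonlResult {
  const events: Record<string, unknown>[] = [];
  let skipped = 0;
  for (const line of content.split("\n")) {
    if (line.trim() === "") continue; // ignore empty lines
    try {
      events.push(JSON.parse(line) as Record<string, unknown>);
    } catch {
      skipped += 1; // truncated or corrupt line: keep going
    }
  }
  return { events, skipped };
}

// The last line is cut off mid-write, as it might be after a crash.
const sample = '{"type":"a"}\n{"type":"b"}\n{"type":"c","prom';
const parsed = parseJsonlTolerant(sample);
console.log(parsed.events.length, parsed.skipped); // prints: 2 1
```

With this shape, a truncated final line costs one event rather than blocking the whole flush.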

"command": "node \"$CODEX_PLUGIN_ROOT/bundle/session-start.js\"",
- "timeout": 120
+ "timeout": 10
}

session-start-setup.js is never registered here even though codex/bundle/session-start-setup.js is added by this PR and the description says "Same split for codex." Without this entry, Codex users still run table creation/placeholder/version-check synchronously in session-start.js and get no startup performance improvement.

Add the async setup hook after the sync entry:

Suggested change
- }
+ "timeout": 10
+ },
+ {
+ "type": "command",
+ "command": "node \"$CODEX_PLUGIN_ROOT/bundle/session-start-setup.js\"",
+ "timeout": 120,
+ "async": true

const line = JSON.stringify(event);
// For JSONB: only escape single quotes for the SQL literal
const jsonForSql = line.replace(/'/g, "''");
const ts = (event.timestamp as string) ?? new Date().toISOString();

jsonForSql only escapes single-quotes, but user-controlled content (e.g. prompts with backslashes) passes through unmodified — unlike every other field which goes through esc() that also strips control chars and escapes backslashes. If the database has standard_conforming_strings = off, a backslash in a user prompt followed by a quote can break out of the SQL string literal.

Use the same esc() helper used for all other string columns:

Suggested change
- const ts = (event.timestamp as string) ?? new Date().toISOString();
+ const jsonForSql = esc(line);
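
For readers without the repository at hand, the following is a hypothetical stand-in for an esc()-style helper, written only from the behavior described in the comment above (strip control characters, escape backslashes, double single quotes). The name `escSketch` and the exact set of stripped characters are assumptions; the real helper may differ.

```typescript
// Hypothetical sketch, NOT the repo's esc(): behavior inferred from the review text.
function escSketch(value: string): string {
  return value
    .replace(/[\u0000-\u001f\u007f]/g, "") // strip ASCII control characters (assumed set)
    .replace(/\\/g, "\\\\")                // escape each backslash
    .replace(/'/g, "''");                  // double single quotes for SQL literals
}

// Example: a serialized event whose prompt contains a quote and a backslash.
const eventJson = JSON.stringify({ prompt: "it's C:\\tmp" });
const literal = `'${escSketch(eventJson)}'`;
console.log(literal);
```

One design point to verify before adopting this for JSON columns: if the server treats backslashes in string literals as ordinary characters, doubling them changes the stored JSON text, so the right escaping likely depends on the database's string-literal settings.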

The local JSONL queue deferred server writes to session end, meaning
data only reached the server after the session finished. This defeated
the purpose of real-time capture — if a session crashed, events were
lost.

Restore the original direct INSERT approach for all capture hooks.
With async: true in the Claude Code hooks.json, the INSERT still happens in the
background without blocking the user. Codex hooks remain sync but
the ~200ms INSERT is acceptable.

The session-start split (sync context + async setup) is kept.
@efenocchi
Collaborator Author

/claude-review

@claude

claude bot commented Apr 14, 2026

Claude finished @efenocchi's task in 2m 40s


Review

  • Read changed source files
  • Analyze diff for bugs, security issues, correctness
  • Post inline review comments

2 issues found:

  • Codex setup hook never runs (codex/hooks/hooks.json:10): session-start-setup.js was bundled and the source was split, but the async hook registration was never added to codex/hooks/hooks.json. No placeholder row is created and no version check runs for any Codex session.
  • Shell injection in Codex auto-update (src/hooks/codex/session-start-setup.ts:170): ${tag} from the external GitHub version API is interpolated directly into a shell command. A tampered response could execute arbitrary commands.

"type": "command",
"command": "node \"$CODEX_PLUGIN_ROOT/bundle/session-start.js\"",
- "timeout": 120
+ "timeout": 10

session-start-setup.js was bundled by this PR (added to codex/bundle/) and the source was refactored so all slow operations (table creation, placeholder, version check) moved there — but it is never registered here. Without this entry, Codex users run a fast session-start.js that skips all setup work, meaning no placeholder row is ever created and no version check runs.

Add the async setup hook right after this entry:

Suggested change
- "timeout": 10
+ "timeout": 10
+ },
+ {
+ "type": "command",
+ "command": "node \"$CODEX_PLUGIN_ROOT/bundle/session-start-setup.js\"",
+ "timeout": 120,
+ "async": true

`elif [ -d ~/.codex/hivemind ]; then INSTALL_DIR=~/.codex/hivemind; fi; ` +
`if [ -n "$INSTALL_DIR" ]; then ` +
`TMPDIR=$(mktemp -d); ` +
`git clone --depth 1 --branch ${tag} -q https://github.com/activeloopai/hivemind.git "$TMPDIR/hivemind" 2>/dev/null && ` +

${tag} (derived from the external getLatestVersion() response) is interpolated directly into the shell command string passed to execSync. A tampered API response with shell metacharacters in the version field (e.g. 1.2.3; malicious_cmd) would execute arbitrary commands.

Validate the tag is a safe semver string before using it in the shell:

Suggested change
- `git clone --depth 1 --branch ${tag} -q https://github.com/activeloopai/hivemind.git "$TMPDIR/hivemind" 2>/dev/null && ` +
+ const tagPattern = /^v\d+\.\d+\.\d+$/;
+ if (!tagPattern.test(tag)) {
+   throw new Error(`unsafe version tag: ${tag}`);
+ }
+ const findCmd = `INSTALL_DIR=""; ` +
+ `CACHE_DIR=$(find ~/.codex/plugins/cache -maxdepth 3 -name "hivemind" -type d 2>/dev/null | head -1); ` +
+ `if [ -n "$CACHE_DIR" ]; then INSTALL_DIR=$(ls -1d "$CACHE_DIR"/*/ 2>/dev/null | tail -1); ` +
+ `elif [ -d ~/.codex/hivemind ]; then INSTALL_DIR=~/.codex/hivemind; fi; ` +
+ `if [ -n "$INSTALL_DIR" ]; then ` +
+ `TMPDIR=$(mktemp -d); ` +
+ `git clone --depth 1 --branch ${tag} -q https://github.com/activeloopai/hivemind.git "$TMPDIR/hivemind" 2>/dev/null && ` +
+ `cp -r "$TMPDIR/hivemind/codex/"* "$INSTALL_DIR/" 2>/dev/null; ` +
+ `rm -rf "$TMPDIR"; fi`;
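
The validation step can be exercised on its own. The sketch below pairs the same semver check with execFileSync, which passes arguments as an array rather than through a shell; that is a different technique from the string-built command in the PR, shown here only as one possible hardening. The helper names `isSafeTag` and `cloneTag` are hypothetical, and the pattern assumes tags carry a leading "v", as in the suggestion.

```typescript
import { execFileSync } from "node:child_process";

// Accept only tags of the form v<major>.<minor>.<patch>, as in the suggestion.
const TAG_PATTERN = /^v\d+\.\d+\.\d+$/;

function isSafeTag(tag: string): boolean {
  return TAG_PATTERN.test(tag);
}

// Hypothetical helper: clone a validated tag without invoking a shell.
// Because arguments are passed as an array, metacharacters in `tag`
// are never interpreted as shell syntax.
function cloneTag(tag: string, repoUrl: string, destDir: string): void {
  if (!isSafeTag(tag)) {
    throw new Error(`unsafe version tag: ${tag}`);
  }
  execFileSync(
    "git",
    ["clone", "--depth", "1", "--branch", tag, "-q", repoUrl, destDir],
    { stdio: "ignore" },
  );
}

console.log(isSafeTag("v1.2.3"));               // prints: true
console.log(isSafeTag("1.2.3; malicious_cmd")); // prints: false
```

Validating first and avoiding the shell second gives two independent layers: the regex rejects unexpected input, and the argument array limits the damage if the pattern is ever loosened.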

The generateVirtualIndex() regex only matched hivemind-style session paths
(/sessions/<user>/<user>_<org>_<ws>_<id>.jsonl). Generic paths like
/sessions/conv_0_session_1.json were invisible in the index.

Now supports both formats and strips _summary suffix when matching
summaries to their source session files.

session-end and codex/stop now skip wiki worker when capture is disabled.
session-start-setup always runs table sync (ensureTable/ensureSessionsTable)
but skips placeholder creation when DEEPLAKE_CAPTURE=false. Previously the
entire setup block was skipped, which broke table sync and caused stale
query results in the fast path.

Grep fast path queries memory (summaries) and sessions (raw dialogue) in
parallel. Read fast path falls back to the sessions table for /sessions/*
paths. Previously only the memory table was searched, so exact dates and
quotes only present in raw session data were invisible to the fast path.
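
The parallel lookup in the grep fast path can be pictured roughly as follows. This is a minimal sketch assuming two independent async search functions; the `Hit` type, the `Search` signature, and the stub data are all hypothetical and stand in for the real table queries.

```typescript
// Hypothetical result and query types; the real tables and client API may differ.
type Hit = { source: "memory" | "sessions"; path: string; text: string };
type Search = (pattern: string) => Promise<Hit[]>;

// Start both lookups at once and merge the results, summaries first.
async function grepFastPath(
  pattern: string,
  searchMemory: Search,
  searchSessions: Search,
): Promise<Hit[]> {
  const [memoryHits, sessionHits] = await Promise.all([
    searchMemory(pattern),
    searchSessions(pattern),
  ]);
  return [...memoryHits, ...sessionHits];
}

// In-memory stubs standing in for the two tables (invented sample data).
const fakeMemory: Search = async p => [
  { source: "memory", path: "/summaries/example.md", text: `summary mentioning ${p}` },
];
const fakeSessions: Search = async p => [
  { source: "sessions", path: "/sessions/conv_0_session_1.json", text: `raw line with ${p}` },
];

const demoHits = grepFastPath("2024-05-01", fakeMemory, fakeSessions);
demoHits.then(hits => console.log(hits.map(h => h.source)));
```

Because the two queries do not depend on each other, running them with Promise.all keeps the added sessions lookup from doubling the fast-path latency.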