diff --git a/docs/plans/2026-05-05-persistent-store-restart-safe-gates.md b/docs/plans/2026-05-05-persistent-store-restart-safe-gates.md new file mode 100644 index 00000000..7f6587d5 --- /dev/null +++ b/docs/plans/2026-05-05-persistent-store-restart-safe-gates.md @@ -0,0 +1,2709 @@ +# Persistent SessionStore + Restart-Safe Decision Gates Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Replace `@valet/engine`'s in-memory-only suspension model with a persisted store and re-entrant decision gates so a session can survive a process restart, and prove it with a full restart-cycle integration test. + +**Architecture:** Add a SQLite-backed `SessionStore` (Drizzle schema, dialect-portable to D1 and Postgres). Make decision-gate IDs deterministic so a tool replay produces the same gate ID; on replay, `ctx.requestDecision(...)` short-circuits and returns the stored resolution instead of opening a new gate. `Engine.restoreSession({ sessionId, options })` rehydrates session/threads/entries/queue (preserving assistant tool-call blocks via `MessageEntry.parts`), and for any thread blocked on a gate, replays the suspended tool call (with `ctx.suspendedDecision` populated) once the gate resolves. + +**Tech Stack:** TypeScript, Drizzle ORM (`drizzle-orm/sqlite-core`), `better-sqlite3` (in-process SQLite for tests + dev), `drizzle-kit` (migrations), vitest. Postgres dialect mirror is deferred — the plan calls out where it slots in but doesn't ship it. + +--- + +## Background: What needs to change + +Today (`packages/engine/src/thread.ts:requestDecision`): + +```ts +const resolution = await this.gates.register(gate, onExpire); +``` + +`GateManager.register` returns a Promise that resolves only when `Session.resolveDecision` is called in this process. 
A restart kills that Promise, so the suspension is lost.

The spec (line 722) requires:

> The engine does not rely on preserving an in-memory JavaScript continuation across restarts. Tools that call `requestDecision(...)` must therefore be re-entrant up to their decision points. On first execution, `requestDecision(...)` persists the gate and suspends the turn. On resumed execution, the engine re-runs the tool from the start with `suspendedDecision` populated for the matching gate ID, and the same `requestDecision(...)` call returns the stored resolution instead of creating a new gate.

So the change:

1. Gate IDs are deterministic. Same `(sessionId, threadId, queueItemId, resumeKey)` → same gate ID. Tools must supply `resumeKey`.
2. `requestDecision`: if `ctx.suspendedDecision` is set with a matching `gateId`, return the stored resolution synchronously. Otherwise, open or look up the gate, persist `SuspendedTurnState`, suspend.
3. The store actually persists everything (session, threads, entries, queue, gates, refs, suspended turns).
4. `Engine.restoreSession({ sessionId, options })` reads back state, re-builds the agent transcript, and for each blocked thread either (a) waits for the still-pending gate to resolve, or (b) replays the suspended tool with the resolved gate's resolution.
5. After replay, the agent continues normally.
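The derivation in point 1 can be sketched as a pure function of the four persisted fields. This is a minimal illustration: the SHA-256 hash and `gate_` prefix are assumptions, only the input tuple comes from the plan.

```typescript
import { createHash } from "node:crypto";

// Derive a stable gate ID from the suspension coordinates. Replaying the same
// tool call for the same queue item yields the same ID, which is what lets a
// replayed requestDecision(...) find the stored resolution instead of opening
// a new gate. A NUL separator keeps ("a","bc") and ("ab","c") from colliding.
export function deriveGateId(
  sessionId: string,
  threadId: string,
  queueItemId: string,
  resumeKey: string,
): string {
  const digest = createHash("sha256")
    .update([sessionId, threadId, queueItemId, resumeKey].join("\u0000"))
    .digest("hex");
  return `gate_${digest.slice(0, 24)}`;
}
```

Because the ID is a pure function of fields already persisted in `SuspendedTurnState`, restoration can recompute it without storing any extra mapping.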
+ +## File Structure + +| File | Status | Purpose | +| --- | --- | --- | +| `packages/engine/package.json` | modify | Add `drizzle-orm`, `drizzle-kit`, `better-sqlite3`, `@types/better-sqlite3` | +| `packages/engine/drizzle.config.ts` | create | drizzle-kit config pointing at the schema | +| `packages/engine/src/schema/sqlite.ts` | create | SQLite Drizzle schema for engine tables | +| `packages/engine/src/schema/index.ts` | create | Re-exports | +| `packages/engine/migrations/sqlite/0001_initial.sql` | create | Generated initial migration | +| `packages/engine/src/providers/sqlite-store.ts` | create | `SqliteSessionStore` | +| `packages/engine/src/providers/sqlite-store-helpers.ts` | create | Row encoding/decoding helpers | +| `packages/engine/src/decision-gate.ts` | modify | Stable gate ID derivation; `GateManager` accepts pre-registered resolutions | +| `packages/engine/src/thread.ts` | modify | `requestDecision` short-circuits on `ctx.suspendedDecision`; persist toolCallId/toolArgs | +| `packages/engine/src/session.ts` | modify | Tool replay path on restoration | +| `packages/engine/src/engine.ts` | modify | Real `restoreSession()` | +| `packages/engine/src/index.ts` | modify | Re-export `SqliteSessionStore`, `createSqliteStore` factory | +| `packages/engine/test/store-contract.ts` | create | Shared `SessionStore` contract test suite | +| `packages/engine/test/in-memory-store.test.ts` | create | Run contract suite against `InMemorySessionStore` | +| `packages/engine/test/sqlite-store.test.ts` | create | Run contract suite against `SqliteSessionStore` | +| `packages/engine/test/restart-safe-gates.test.ts` | create | End-to-end restart cycle | + +--- + +## Phase 1: Schema and migrations + +### Task 1: Add drizzle and better-sqlite3 deps + +**Files:** +- Modify: `packages/engine/package.json` + +- [ ] **Step 1: Update package.json** + +```json +{ + "name": "@valet/engine", + "version": "0.0.1", + "private": true, + "type": "module", + "main": "./dist/index.js", + 
"types": "./dist/index.d.ts", + "exports": { + ".": { + "types": "./dist/index.d.ts", + "import": "./dist/index.js" + } + }, + "scripts": { + "build": "tsc", + "typecheck": "tsc --noEmit", + "test": "vitest run", + "db:generate": "drizzle-kit generate" + }, + "dependencies": { + "@mariozechner/pi-agent-core": "0.73.0", + "@mariozechner/pi-ai": "0.73.0", + "drizzle-orm": "^0.45.1", + "typebox": "^1.1.24" + }, + "devDependencies": { + "@types/better-sqlite3": "^7.6.8", + "better-sqlite3": "^11.0.0", + "drizzle-kit": "^0.31.9", + "typescript": "^5.3.3", + "vitest": "^4.0.18" + } +} +``` + +- [ ] **Step 2: Install** + +```bash +cd /Users/conner/code/valet/.worktrees/portable-runtime-v1-spec && pnpm install +``` + +Expected: `Done in ` with no resolution errors. + +- [ ] **Step 3: Commit** + +```bash +git add packages/engine/package.json pnpm-lock.yaml +git commit -m "chore(engine): add drizzle-orm and better-sqlite3 deps" +``` + +--- + +### Task 2: Define SQLite schema for engine tables + +**Files:** +- Create: `packages/engine/src/schema/sqlite.ts` +- Create: `packages/engine/src/schema/index.ts` + +The schema mirrors the table list in the spec ("Required Tables" section, lines 1338-1349). Use SQLite types: `text` (default for everything textual or JSON-serialized), `integer` (for booleans-as-0/1 and unix-ms timestamps), and JSON-encoded `text` for nested objects. 
+ +- [ ] **Step 1: Write the schema file** + +```ts +// packages/engine/src/schema/sqlite.ts +import { sql } from "drizzle-orm"; +import { sqliteTable, text, integer, index, primaryKey } from "drizzle-orm/sqlite-core"; + +export const engineSessions = sqliteTable("engine_sessions", { + id: text("id").primaryKey(), + userId: text("user_id").notNull(), + orgId: text("org_id").notNull(), + workspace: text("workspace").notNull(), + purpose: text("purpose").notNull(), + status: text("status").notNull(), + sandboxId: text("sandbox_id"), + snapshotId: text("snapshot_id"), + parentSessionId: text("parent_session_id"), + metadata: text("metadata"), // JSON + createdAt: integer("created_at").notNull(), + updatedAt: integer("updated_at").notNull(), +}, (t) => [ + index("engine_sessions_user").on(t.userId), + index("engine_sessions_status").on(t.status), +]); + +export const engineThreads = sqliteTable("engine_threads", { + id: text("id").primaryKey(), + sessionId: text("session_id").notNull(), + key: text("key").notNull(), + status: text("status").notNull(), + activeLeafEntryId: text("active_leaf_entry_id"), + queueMode: text("queue_mode").notNull(), + model: text("model"), + summary: text("summary"), + metadata: text("metadata"), // JSON + createdAt: integer("created_at").notNull(), + updatedAt: integer("updated_at").notNull(), +}, (t) => [ + index("engine_threads_session").on(t.sessionId), + index("engine_threads_session_key").on(t.sessionId, t.key), +]); + +export const engineEntries = sqliteTable("engine_entries", { + id: text("id").primaryKey(), + sessionId: text("session_id").notNull(), + threadId: text("thread_id").notNull(), + parentId: text("parent_id"), + entryType: text("entry_type").notNull(), // 'message' | 'compaction' | 'branch_summary' | 'decision_gate' + // for message entries + role: text("role"), + content: text("content"), + parts: text("parts"), // JSON + author: text("author"), // JSON + channel: text("channel"), // JSON + model: text("model"), + // for 
compaction entries + summary: text("summary"), + coveredEntryIds: text("covered_entry_ids"), // JSON array + tokenCountBefore: integer("token_count_before"), + tokenCountAfter: integer("token_count_after"), + fileContext: text("file_context"), // JSON + // for branch_summary entries + branchRootId: text("branch_root_id"), + branchLeafId: text("branch_leaf_id"), + // for decision_gate entries + gateId: text("gate_id"), + resolvedAt: text("resolved_at"), + resolution: text("resolution"), // JSON + withdrawnReason: text("withdrawn_reason"), + // common + metadata: text("metadata"), // JSON + createdAt: integer("created_at").notNull(), +}, (t) => [ + index("engine_entries_thread").on(t.sessionId, t.threadId, t.createdAt), + index("engine_entries_gate").on(t.gateId), +]); + +export const engineQueueItems = sqliteTable("engine_queue_items", { + id: text("id").primaryKey(), + sessionId: text("session_id").notNull(), + threadId: text("thread_id").notNull(), + status: text("status").notNull(), // 'queued' | 'running' | 'blocked_on_decision_gate' | 'paused' | 'idle' + mode: text("mode").notNull(), // queue mode at submission time + content: text("content").notNull(), // JSON PromptContent + author: text("author"), // JSON + channel: text("channel"), // JSON + replyTarget: text("reply_target"), // JSON + model: text("model"), + metadata: text("metadata"), // JSON + createdAt: integer("created_at").notNull(), + updatedAt: integer("updated_at").notNull(), +}, (t) => [ + index("engine_queue_items_thread").on(t.sessionId, t.threadId, t.status), +]); + +export const engineQueueState = sqliteTable("engine_queue_state", { + threadId: text("thread_id").notNull(), + sessionId: text("session_id").notNull(), + mode: text("mode").notNull(), + status: text("status").notNull(), + activeItemId: text("active_item_id"), + pending: text("pending").notNull(), // JSON QueueItem[] + collectBuffer: text("collect_buffer"), // JSON QueueItem[] | null + blockedGateId: text("blocked_gate_id"), + 
updatedAt: integer("updated_at").notNull(), +}, (t) => [ + primaryKey({ columns: [t.sessionId, t.threadId] }), +]); + +export const engineDecisionGates = sqliteTable("engine_decision_gates", { + id: text("id").primaryKey(), + sessionId: text("session_id").notNull(), + threadId: text("thread_id").notNull(), + type: text("type").notNull(), + status: text("status").notNull(), + title: text("title").notNull(), + body: text("body"), + actions: text("actions").notNull(), // JSON + origin: text("origin"), // JSON + context: text("context"), // JSON + resolution: text("resolution"), // JSON + expiresAt: integer("expires_at"), + createdAt: integer("created_at").notNull(), + updatedAt: integer("updated_at").notNull(), +}, (t) => [ + index("engine_decision_gates_thread").on(t.sessionId, t.threadId, t.status), +]); + +export const engineDecisionGateRefs = sqliteTable("engine_decision_gate_refs", { + id: text("id").primaryKey(), + gateId: text("gate_id").notNull(), + channelType: text("channel_type").notNull(), + ref: text("ref").notNull(), // JSON + createdAt: integer("created_at").notNull(), + updatedAt: integer("updated_at").notNull(), +}, (t) => [ + index("engine_decision_gate_refs_gate").on(t.gateId), +]); + +export const engineSuspendedTurns = sqliteTable("engine_suspended_turns", { + sessionId: text("session_id").notNull(), + threadId: text("thread_id").notNull(), + queueItemId: text("queue_item_id").notNull(), + gateId: text("gate_id").notNull(), + model: text("model").notNull(), + leafEntryId: text("leaf_entry_id"), + toolCallId: text("tool_call_id").notNull(), + toolName: text("tool_name").notNull(), + toolArgs: text("tool_args").notNull(), // JSON + resumeKey: text("resume_key").notNull(), + attempt: integer("attempt").notNull(), + createdAt: integer("created_at").notNull(), +}, (t) => [ + primaryKey({ columns: [t.sessionId, t.threadId] }), + index("engine_suspended_turns_gate").on(t.gateId), +]); +``` + +- [ ] **Step 2: Write the index file** + +```ts +// 
packages/engine/src/schema/index.ts +export * from "./sqlite.js"; +``` + +- [ ] **Step 3: Typecheck** + +```bash +cd packages/engine && pnpm typecheck +``` + +Expected: clean (no output). + +- [ ] **Step 4: Commit** + +```bash +git add packages/engine/src/schema +git commit -m "feat(engine): add Drizzle SQLite schema for engine tables" +``` + +--- + +### Task 3: Generate the initial migration + +**Files:** +- Create: `packages/engine/drizzle.config.ts` +- Create: `packages/engine/migrations/sqlite/0001_initial.sql` (via drizzle-kit) + +- [ ] **Step 1: Write drizzle.config.ts** + +```ts +// packages/engine/drizzle.config.ts +import { defineConfig } from "drizzle-kit"; + +export default defineConfig({ + dialect: "sqlite", + schema: "./src/schema/sqlite.ts", + out: "./migrations/sqlite", +}); +``` + +- [ ] **Step 2: Generate migration** + +```bash +cd packages/engine && pnpm db:generate +``` + +Expected output: `1 file generated` and a new `migrations/sqlite/0001_*.sql` file. + +- [ ] **Step 3: Verify the migration looks right** + +```bash +ls packages/engine/migrations/sqlite/ +head -40 packages/engine/migrations/sqlite/0001_*.sql +``` + +Expected: contains `CREATE TABLE engine_sessions`, `engine_threads`, `engine_entries`, `engine_queue_items`, `engine_queue_state`, `engine_decision_gates`, `engine_decision_gate_refs`, `engine_suspended_turns`. + +- [ ] **Step 4: Commit** + +```bash +git add packages/engine/drizzle.config.ts packages/engine/migrations/sqlite +git commit -m "feat(engine): generate initial sqlite migration" +``` + +--- + +## Phase 2: SqliteSessionStore + contract tests + +### Task 4: Extract `SessionStore` contract test suite + +The same tests should run against any `SessionStore` implementation. Extract them into a function that takes a store factory. 
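In miniature, the factory pattern looks like this (sketched against a toy `KvStore` interface so the example stays self-contained; the real suite targets `SessionStore` and uses vitest's `describe`/`it` as shown in Step 1):

```typescript
// A contract suite is a function that takes a factory; each backend's test
// file invokes it with its own constructor, so every implementation is held
// to identical behavior. Plain assertions stand in for vitest assertions.
interface KvStore {
  set(key: string, value: string): Promise<void>;
  get(key: string): Promise<string | null>;
}

export async function runKvContract(name: string, factory: () => KvStore): Promise<void> {
  const store = factory();
  await store.set("a", "1");
  if ((await store.get("a")) !== "1") throw new Error(`${name}: round-trip failed`);
  if ((await store.get("missing")) !== null) throw new Error(`${name}: missing key should be null`);
}

// One conforming backend; a SQLite-backed one would be exercised identically.
export class MemoryKv implements KvStore {
  private data = new Map<string, string>();
  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }
}
```

The payoff is that Tasks 5 and 6 below each reduce to a one-line test file that hands its factory to the shared suite.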
+ +**Files:** +- Create: `packages/engine/test/store-contract.ts` + +- [ ] **Step 1: Write the contract suite** + +```ts +// packages/engine/test/store-contract.ts +import { describe, it, expect, beforeEach } from "vitest"; +import type { + DecisionGate, + MessageEntry, + QueueState, + SessionData, + SessionEntry, + SessionStore, + SuspendedTurnState, + ThreadData, +} from "../src/index.js"; + +export interface StoreContractContext { + factory: () => SessionStore | Promise; + /** Optional async teardown; called after each test. */ + teardown?: (store: SessionStore) => void | Promise; +} + +export function runSessionStoreContract(name: string, ctx: StoreContractContext) { + describe(`SessionStore contract: ${name}`, () => { + let store: SessionStore; + + beforeEach(async () => { + store = await ctx.factory(); + }); + + function newSession(overrides: Partial = {}): SessionData { + return { + id: "sess-1", + userId: "u1", + orgId: "o1", + workspace: "/", + purpose: "interactive", + status: "running", + createdAt: 1, + updatedAt: 1, + ...overrides, + }; + } + + function newThread(sessionId: string, key = "web:default"): ThreadData { + return { + id: "th-1", + sessionId, + key, + status: "active", + queueMode: "followup", + createdAt: 1, + updatedAt: 1, + }; + } + + it("saveSession + getSession round-trips", async () => { + const s = newSession(); + await store.saveSession(s); + const loaded = await store.getSession(s.id); + expect(loaded).toMatchObject({ id: "sess-1", userId: "u1", status: "running" }); + }); + + it("listSessions filters by userId", async () => { + await store.saveSession(newSession({ id: "a", userId: "u1" })); + await store.saveSession(newSession({ id: "b", userId: "u2" })); + const list = await store.listSessions("u1"); + expect(list.map((s) => s.id)).toEqual(["a"]); + }); + + it("saveThread + listThreads round-trips", async () => { + await store.saveSession(newSession()); + await store.saveThread("sess-1", newThread("sess-1", "task:A")); + await 
store.saveThread("sess-1", newThread("sess-1", "task:B")); + // Use a unique id for each + await store.saveThread("sess-1", { ...newThread("sess-1", "task:B"), id: "th-2" }); + const threads = await store.listThreads("sess-1"); + expect(threads.length).toBeGreaterThanOrEqual(2); + }); + + it("appendEntries + getEntries returns entries in insertion order", async () => { + await store.saveSession(newSession()); + await store.saveThread("sess-1", newThread("sess-1")); + const entries: SessionEntry[] = [ + msg("e-1", "user", "hi", 10), + msg("e-2", "assistant", "hello", 20), + ]; + await store.appendEntries("sess-1", "th-1", entries); + const loaded = await store.getEntries("sess-1", "th-1"); + expect(loaded).toHaveLength(2); + expect(loaded[0]).toMatchObject({ id: "e-1", type: "message", role: "user", content: "hi" }); + expect(loaded[1]).toMatchObject({ id: "e-2", type: "message", role: "assistant" }); + }); + + it("appendEntries persists decision_gate entries", async () => { + await store.saveSession(newSession()); + await store.saveThread("sess-1", newThread("sess-1")); + const gate: DecisionGate = { + id: "g-1", + sessionId: "sess-1", + threadId: "th-1", + type: "approval", + status: "pending", + title: "ok?", + actions: [{ id: "approve", label: "Approve" }], + createdAt: 100, + updatedAt: 100, + }; + await store.saveDecisionGate("sess-1", "th-1", gate); + await store.appendEntries("sess-1", "th-1", [ + { + id: "e-g", + sessionId: "sess-1", + threadId: "th-1", + parentId: null, + type: "decision_gate", + gate, + createdAt: 100, + }, + ]); + const loaded = await store.getEntries("sess-1", "th-1"); + const gateEntry = loaded.find((e) => e.type === "decision_gate"); + expect(gateEntry).toBeDefined(); + expect(gateEntry && gateEntry.type === "decision_gate" && gateEntry.gate.id).toBe("g-1"); + }); + + it("saveQueueState + getQueueState round-trips", async () => { + await store.saveSession(newSession()); + await store.saveThread("sess-1", newThread("sess-1")); + const 
qs: QueueState = { + threadId: "th-1", + mode: "followup", + status: "running", + activeItemId: "q-1", + pending: [], + }; + await store.saveQueueState("sess-1", "th-1", qs); + const loaded = await store.getQueueState("sess-1", "th-1"); + expect(loaded).toMatchObject({ threadId: "th-1", status: "running", activeItemId: "q-1" }); + }); + + it("saveDecisionGate + listDecisionGates + getDecisionGate", async () => { + await store.saveSession(newSession()); + await store.saveThread("sess-1", newThread("sess-1")); + const gate: DecisionGate = { + id: "g-1", + sessionId: "sess-1", + threadId: "th-1", + type: "approval", + status: "pending", + title: "x", + actions: [], + createdAt: 1, + updatedAt: 1, + }; + await store.saveDecisionGate("sess-1", "th-1", gate); + const list = await store.listDecisionGates("sess-1"); + expect(list).toHaveLength(1); + const single = await store.getDecisionGate("sess-1", "g-1"); + expect(single?.title).toBe("x"); + }); + + it("saveSuspendedTurn + getSuspendedTurn + clearSuspendedTurn", async () => { + await store.saveSession(newSession()); + await store.saveThread("sess-1", newThread("sess-1")); + const sus: SuspendedTurnState = { + sessionId: "sess-1", + threadId: "th-1", + queueItemId: "q-1", + gateId: "g-1", + model: "faux/faux-1", + toolCallId: "tc-1", + toolName: "do_thing", + toolArgs: { arg: "x" }, + resumeKey: "do_thing:x", + attempt: 1, + createdAt: 1, + }; + await store.saveSuspendedTurn("sess-1", "th-1", sus); + expect(await store.getSuspendedTurn("sess-1", "th-1")).toMatchObject({ toolName: "do_thing" }); + await store.clearSuspendedTurn("sess-1", "th-1"); + expect(await store.getSuspendedTurn("sess-1", "th-1")).toBeNull(); + }); + + it("updateDecisionGateEntry patches the matching entry", async () => { + await store.saveSession(newSession()); + await store.saveThread("sess-1", newThread("sess-1")); + const gate: DecisionGate = { + id: "g-1", + sessionId: "sess-1", + threadId: "th-1", + type: "approval", + status: "pending", + 
title: "x", + actions: [], + createdAt: 1, + updatedAt: 1, + }; + await store.saveDecisionGate("sess-1", "th-1", gate); + await store.appendEntries("sess-1", "th-1", [ + { + id: "e-g", + sessionId: "sess-1", + threadId: "th-1", + parentId: null, + type: "decision_gate", + gate, + createdAt: 1, + }, + ]); + await store.updateDecisionGateEntry("sess-1", "th-1", "g-1", { + gate: { ...gate, status: "resolved" }, + resolution: { actionId: "approve", resolvedBy: "u1", resolvedAt: 5 }, + }); + const entries = await store.getEntries("sess-1", "th-1"); + const e = entries.find((x) => x.type === "decision_gate"); + expect(e && e.type === "decision_gate" && e.gate.status).toBe("resolved"); + expect(e && e.type === "decision_gate" && e.resolution?.actionId).toBe("approve"); + }); + + it("deleteSession removes the session", async () => { + await store.saveSession(newSession()); + await store.deleteSession("sess-1"); + expect(await store.getSession("sess-1")).toBeNull(); + }); + }); +} + +function msg(id: string, role: "user" | "assistant", content: string, ts: number): MessageEntry { + return { + id, + sessionId: "sess-1", + threadId: "th-1", + parentId: null, + type: "message", + role, + content, + createdAt: ts, + }; +} +``` + +- [ ] **Step 2: Typecheck** + +```bash +pnpm typecheck +``` + +Expected: clean. 
+ +- [ ] **Step 3: Commit** + +```bash +git add packages/engine/test/store-contract.ts +git commit -m "test(engine): add SessionStore contract test suite" +``` + +--- + +### Task 5: Run contract suite against `InMemorySessionStore` (regression check) + +**Files:** +- Create: `packages/engine/test/in-memory-store.test.ts` + +- [ ] **Step 1: Write the test** + +```ts +// packages/engine/test/in-memory-store.test.ts +import { InMemorySessionStore } from "../src/index.js"; +import { runSessionStoreContract } from "./store-contract.js"; + +runSessionStoreContract("InMemorySessionStore", { + factory: () => new InMemorySessionStore(), +}); +``` + +- [ ] **Step 2: Run it** + +```bash +pnpm test -- in-memory-store +``` + +Expected: 10 tests passing. If any fail, the existing in-memory store has a bug — fix it in `packages/engine/src/providers/in-memory-store.ts` before proceeding. (Common likely issue: `updateDecisionGateEntry` not preserving entries; the existing impl handles this — verify.) + +- [ ] **Step 3: Commit** + +```bash +git add packages/engine/test/in-memory-store.test.ts +git commit -m "test(engine): run contract suite against InMemorySessionStore" +``` + +--- + +### Task 6: Implement `SqliteSessionStore` + +**Files:** +- Create: `packages/engine/src/providers/sqlite-store-helpers.ts` +- Create: `packages/engine/src/providers/sqlite-store.ts` + +The store uses `drizzle-orm/better-sqlite3` for in-process SQLite. The `D1` adapter (Cloudflare) wires in differently and is out of scope here, but the same Drizzle queries will run against either via the Cloudflare adapter package. + +- [ ] **Step 1: Write encoding helpers** + +```ts +// packages/engine/src/providers/sqlite-store-helpers.ts +import type { + CompactionEntry, + DecisionGate, + DecisionGateEntry, + MessageEntry, + BranchSummaryEntry, + SessionEntry, +} from "../types.js"; + +export function jsonOrNull(value: T | undefined | null): string | null { + return value === undefined || value === null ? 
null : JSON.stringify(value); +} + +export function parseJson(value: string | null): T | undefined { + if (value === null || value === undefined) return undefined; + return JSON.parse(value) as T; +} + +export interface EntryRow { + id: string; + sessionId: string; + threadId: string; + parentId: string | null; + entryType: string; + role: string | null; + content: string | null; + parts: string | null; + author: string | null; + channel: string | null; + model: string | null; + summary: string | null; + coveredEntryIds: string | null; + tokenCountBefore: number | null; + tokenCountAfter: number | null; + fileContext: string | null; + branchRootId: string | null; + branchLeafId: string | null; + gateId: string | null; + resolvedAt: string | null; + resolution: string | null; + withdrawnReason: string | null; + metadata: string | null; + createdAt: number; +} + +export function entryToRow(entry: SessionEntry): Omit & { entryType: string } { + const base = { + id: entry.id, + sessionId: entry.sessionId, + threadId: entry.threadId, + parentId: entry.parentId, + metadata: jsonOrNull(entry.metadata), + createdAt: entry.createdAt, + role: null, + content: null, + parts: null, + author: null, + channel: null, + model: null, + summary: null, + coveredEntryIds: null, + tokenCountBefore: null, + tokenCountAfter: null, + fileContext: null, + branchRootId: null, + branchLeafId: null, + gateId: null, + resolvedAt: null, + resolution: null, + withdrawnReason: null, + }; + switch (entry.type) { + case "message": + return { + ...base, + entryType: "message", + role: entry.role, + content: entry.content, + parts: jsonOrNull(entry.parts), + author: jsonOrNull(entry.author), + channel: jsonOrNull(entry.channel), + model: entry.model ?? 
null, + }; + case "compaction": + return { + ...base, + entryType: "compaction", + summary: entry.summary, + coveredEntryIds: JSON.stringify(entry.coveredEntryIds), + tokenCountBefore: entry.tokenCountBefore, + tokenCountAfter: entry.tokenCountAfter, + fileContext: jsonOrNull(entry.fileContext), + }; + case "branch_summary": + return { + ...base, + entryType: "branch_summary", + branchRootId: entry.branchRootId, + branchLeafId: entry.branchLeafId, + summary: entry.summary, + }; + case "decision_gate": + return { + ...base, + entryType: "decision_gate", + gateId: entry.gate.id, + // store the gate JSON in `parts` field (reusing) — simpler: a dedicated column + // We'll use `metadata` to store the gate snapshot for the entry. + metadata: JSON.stringify({ gate: entry.gate, ...(entry.metadata ?? {}) }), + resolvedAt: entry.resolvedAt ?? null, + resolution: jsonOrNull(entry.resolution), + withdrawnReason: entry.withdrawnReason ?? null, + }; + } +} + +export function rowToEntry(row: EntryRow): SessionEntry { + switch (row.entryType) { + case "message": { + const e: MessageEntry = { + id: row.id, + sessionId: row.sessionId, + threadId: row.threadId, + parentId: row.parentId, + type: "message", + role: (row.role as MessageEntry["role"]) ?? "user", + content: row.content ?? "", + parts: parseJson(row.parts), + author: parseJson(row.author), + channel: parseJson(row.channel), + model: row.model ?? undefined, + metadata: parseJson(row.metadata), + createdAt: row.createdAt, + }; + return e; + } + case "compaction": { + const e: CompactionEntry = { + id: row.id, + sessionId: row.sessionId, + threadId: row.threadId, + parentId: row.parentId, + type: "compaction", + summary: row.summary ?? "", + coveredEntryIds: parseJson(row.coveredEntryIds) ?? [], + tokenCountBefore: row.tokenCountBefore ?? 0, + tokenCountAfter: row.tokenCountAfter ?? 
0, + fileContext: parseJson(row.fileContext), + metadata: parseJson(row.metadata), + createdAt: row.createdAt, + }; + return e; + } + case "branch_summary": { + const e: BranchSummaryEntry = { + id: row.id, + sessionId: row.sessionId, + threadId: row.threadId, + parentId: row.parentId, + type: "branch_summary", + branchRootId: row.branchRootId ?? "", + branchLeafId: row.branchLeafId ?? "", + summary: row.summary ?? "", + metadata: parseJson(row.metadata), + createdAt: row.createdAt, + }; + return e; + } + case "decision_gate": { + const meta = parseJson<{ gate: DecisionGate } & Record>(row.metadata); + const gate = meta?.gate; + if (!gate) throw new Error(`decision_gate entry ${row.id} missing gate snapshot`); + // Strip our internal `gate` key from metadata before re-exposing. + const { gate: _unused, ...userMeta } = meta ?? { gate }; + const e: DecisionGateEntry = { + id: row.id, + sessionId: row.sessionId, + threadId: row.threadId, + parentId: row.parentId, + type: "decision_gate", + gate, + resolvedAt: row.resolvedAt ?? undefined, + resolution: parseJson(row.resolution), + withdrawnReason: (row.withdrawnReason as DecisionGateEntry["withdrawnReason"]) ?? undefined, + metadata: Object.keys(userMeta).length > 0 ? 
(userMeta as Record) : undefined, + createdAt: row.createdAt, + }; + return e; + } + default: + throw new Error(`unknown entry type: ${row.entryType}`); + } +} +``` + +- [ ] **Step 2: Write the store** + +```ts +// packages/engine/src/providers/sqlite-store.ts +import type { BetterSQLite3Database } from "drizzle-orm/better-sqlite3"; +import { and, eq, desc, asc } from "drizzle-orm"; +import { + engineSessions, + engineThreads, + engineEntries, + engineQueueItems, + engineQueueState, + engineDecisionGates, + engineDecisionGateRefs, + engineSuspendedTurns, +} from "../schema/sqlite.js"; +import type { + DecisionGate, + DecisionGateEntry, + DecisionGateRef, + ListOpts, + MessageQuery, + QueueState, + SessionData, + SessionEntry, + SessionStatus, + SessionStore, + SuspendedTurnState, + ThreadData, +} from "../types.js"; +import { entryToRow, jsonOrNull, parseJson, rowToEntry, type EntryRow } from "./sqlite-store-helpers.js"; + +export class SqliteSessionStore implements SessionStore { + constructor(private readonly db: BetterSQLite3Database) {} + + async saveSession(session: SessionData): Promise { + this.db + .insert(engineSessions) + .values({ + id: session.id, + userId: session.userId, + orgId: session.orgId, + workspace: session.workspace, + purpose: session.purpose, + status: session.status, + sandboxId: session.sandboxId ?? null, + snapshotId: session.snapshotId ?? null, + parentSessionId: session.parentSessionId ?? null, + metadata: jsonOrNull(session.metadata), + createdAt: session.createdAt, + updatedAt: session.updatedAt, + }) + .onConflictDoUpdate({ + target: engineSessions.id, + set: { + status: session.status, + sandboxId: session.sandboxId ?? null, + snapshotId: session.snapshotId ?? 
null, + metadata: jsonOrNull(session.metadata), + updatedAt: session.updatedAt, + }, + }) + .run(); + } + + async saveThread(sessionId: string, thread: ThreadData): Promise { + this.db + .insert(engineThreads) + .values({ + id: thread.id, + sessionId, + key: thread.key, + status: thread.status, + activeLeafEntryId: thread.activeLeafEntryId ?? null, + queueMode: thread.queueMode, + model: thread.model ?? null, + summary: thread.summary ?? null, + metadata: jsonOrNull(thread.metadata), + createdAt: thread.createdAt, + updatedAt: thread.updatedAt, + }) + .onConflictDoUpdate({ + target: engineThreads.id, + set: { + status: thread.status, + activeLeafEntryId: thread.activeLeafEntryId ?? null, + queueMode: thread.queueMode, + model: thread.model ?? null, + summary: thread.summary ?? null, + updatedAt: thread.updatedAt, + }, + }) + .run(); + } + + async appendEntries(sessionId: string, threadId: string, entries: SessionEntry[]): Promise { + for (const e of entries) { + const row = entryToRow(e); + this.db.insert(engineEntries).values(row).run(); + } + if (entries.length > 0) { + const lastId = entries[entries.length - 1].id; + this.db + .update(engineThreads) + .set({ activeLeafEntryId: lastId, updatedAt: Date.now() }) + .where(eq(engineThreads.id, threadId)) + .run(); + } + } + + async saveQueueState(sessionId: string, threadId: string, queue: QueueState): Promise { + this.db + .insert(engineQueueState) + .values({ + sessionId, + threadId, + mode: queue.mode, + status: queue.status, + activeItemId: queue.activeItemId ?? null, + pending: JSON.stringify(queue.pending), + collectBuffer: queue.collectBuffer ? JSON.stringify(queue.collectBuffer) : null, + blockedGateId: queue.blockedGateId ?? null, + updatedAt: Date.now(), + }) + .onConflictDoUpdate({ + target: [engineQueueState.sessionId, engineQueueState.threadId], + set: { + mode: queue.mode, + status: queue.status, + activeItemId: queue.activeItemId ?? 
null, + pending: JSON.stringify(queue.pending), + collectBuffer: queue.collectBuffer ? JSON.stringify(queue.collectBuffer) : null, + blockedGateId: queue.blockedGateId ?? null, + updatedAt: Date.now(), + }, + }) + .run(); + } + + async saveDecisionGate(sessionId: string, threadId: string, gate: DecisionGate): Promise<void> { + this.db + .insert(engineDecisionGates) + .values({ + id: gate.id, + sessionId, + threadId, + type: gate.type, + status: gate.status, + title: gate.title, + body: gate.body ?? null, + actions: JSON.stringify(gate.actions), + origin: jsonOrNull(gate.origin), + context: jsonOrNull(gate.context), + resolution: null, + expiresAt: gate.expiresAt ?? null, + createdAt: gate.createdAt, + updatedAt: gate.updatedAt, + }) + .onConflictDoUpdate({ + target: engineDecisionGates.id, + set: { + status: gate.status, + title: gate.title, + body: gate.body ?? null, + actions: JSON.stringify(gate.actions), + context: jsonOrNull(gate.context), + updatedAt: gate.updatedAt, + }, + }) + .run(); + } + + async saveDecisionGateRef( + sessionId: string, + threadId: string, + gateId: string, + ref: { channelType: string; ref: DecisionGateRef }, + ): Promise<void> { + this.db + .insert(engineDecisionGateRefs) + .values({ + id: `${gateId}:${ref.channelType}:${ref.ref.messageId}`, + gateId, + channelType: ref.channelType, + ref: JSON.stringify(ref.ref), + createdAt: Date.now(), + updatedAt: Date.now(), + }) + .run(); + } + + async updateDecisionGateEntry( + sessionId: string, + threadId: string, + gateId: string, + patch: Partial<DecisionGateEntry>, + ): Promise<void> { + // Find the entry row by gateId in the thread. 
+ const rows = this.db + .select() + .from(engineEntries) + .where(and(eq(engineEntries.sessionId, sessionId), eq(engineEntries.threadId, threadId), eq(engineEntries.gateId, gateId))) + .all() as EntryRow[]; + for (const row of rows) { + const current = rowToEntry(row); + if (current.type !== "decision_gate") continue; + const merged: DecisionGateEntry = { + ...current, + ...patch, + gate: patch.gate ?? current.gate, + }; + const newRow = entryToRow(merged); + this.db + .update(engineEntries) + .set({ + metadata: newRow.metadata, + resolvedAt: newRow.resolvedAt, + resolution: newRow.resolution, + withdrawnReason: newRow.withdrawnReason, + }) + .where(eq(engineEntries.id, row.id)) + .run(); + } + } + + async saveSuspendedTurn( + sessionId: string, + threadId: string, + s: SuspendedTurnState, + ): Promise<void> { + this.db + .insert(engineSuspendedTurns) + .values({ + sessionId, + threadId, + queueItemId: s.queueItemId, + gateId: s.gateId, + model: s.model, + leafEntryId: s.leafMessageId ?? null, + toolCallId: s.toolCallId, + toolName: s.toolName, + toolArgs: JSON.stringify(s.toolArgs), + resumeKey: s.resumeKey, + attempt: s.attempt, + createdAt: s.createdAt, + }) + .onConflictDoUpdate({ + target: [engineSuspendedTurns.sessionId, engineSuspendedTurns.threadId], + set: { + queueItemId: s.queueItemId, + gateId: s.gateId, + model: s.model, + leafEntryId: s.leafMessageId ?? null, + toolCallId: s.toolCallId, + toolName: s.toolName, + toolArgs: JSON.stringify(s.toolArgs), + resumeKey: s.resumeKey, + attempt: s.attempt, + }, + }) + .run(); + } + + async clearSuspendedTurn(sessionId: string, threadId: string): Promise<void> { + this.db + .delete(engineSuspendedTurns) + .where(and(eq(engineSuspendedTurns.sessionId, sessionId), eq(engineSuspendedTurns.threadId, threadId))) + .run(); + } + + async updateSessionStatus( + id: string, + status: SessionStatus, + metadata?: Partial<SessionData>, + ): Promise<void> { + this.db + .update(engineSessions) + .set({ + status, + sandboxId: metadata?.sandboxId ?? 
undefined, + snapshotId: metadata?.snapshotId ?? undefined, + updatedAt: Date.now(), + }) + .where(eq(engineSessions.id, id)) + .run(); + } + + async getSession(id: string): Promise<SessionData | null> { + const row = this.db.select().from(engineSessions).where(eq(engineSessions.id, id)).get(); + if (!row) return null; + return { + id: row.id, + userId: row.userId, + orgId: row.orgId, + workspace: row.workspace, + purpose: row.purpose as SessionData["purpose"], + status: row.status as SessionData["status"], + sandboxId: row.sandboxId ?? undefined, + snapshotId: row.snapshotId ?? undefined, + parentSessionId: row.parentSessionId ?? undefined, + metadata: parseJson(row.metadata), + createdAt: row.createdAt, + updatedAt: row.updatedAt, + }; + } + + async listSessions(userId: string, opts?: ListOpts): Promise<SessionData[]> { + const query = this.db.select().from(engineSessions).where(eq(engineSessions.userId, userId)); + const rows = query.all(); + let result: SessionData[] = rows.map((r) => ({ + id: r.id, + userId: r.userId, + orgId: r.orgId, + workspace: r.workspace, + purpose: r.purpose as SessionData["purpose"], + status: r.status as SessionData["status"], + sandboxId: r.sandboxId ?? undefined, + snapshotId: r.snapshotId ?? undefined, + parentSessionId: r.parentSessionId ?? undefined, + metadata: parseJson(r.metadata), + createdAt: r.createdAt, + updatedAt: r.updatedAt, + })); + if (opts?.status) result = result.filter((s) => s.status === opts.status); + return result; + } + + async getThread(sessionId: string, threadId: string): Promise<ThreadData | null> { + const row = this.db + .select() + .from(engineThreads) + .where(and(eq(engineThreads.sessionId, sessionId), eq(engineThreads.id, threadId))) + .get(); + if (!row) return null; + return { + id: row.id, + sessionId: row.sessionId, + key: row.key, + status: row.status as ThreadData["status"], + activeLeafEntryId: row.activeLeafEntryId ?? undefined, + queueMode: row.queueMode as ThreadData["queueMode"], + model: row.model ?? undefined, + summary: row.summary ?? 
undefined, + metadata: parseJson(row.metadata), + createdAt: row.createdAt, + updatedAt: row.updatedAt, + }; + } + + async listThreads(sessionId: string): Promise<ThreadData[]> { + const rows = this.db.select().from(engineThreads).where(eq(engineThreads.sessionId, sessionId)).all(); + return rows.map((r) => ({ + id: r.id, + sessionId: r.sessionId, + key: r.key, + status: r.status as ThreadData["status"], + activeLeafEntryId: r.activeLeafEntryId ?? undefined, + queueMode: r.queueMode as ThreadData["queueMode"], + model: r.model ?? undefined, + summary: r.summary ?? undefined, + metadata: parseJson(r.metadata), + createdAt: r.createdAt, + updatedAt: r.updatedAt, + })); + } + + async getEntries( + sessionId: string, + threadId: string, + opts?: MessageQuery, + ): Promise<SessionEntry[]> { + let rows = this.db + .select() + .from(engineEntries) + .where(and(eq(engineEntries.sessionId, sessionId), eq(engineEntries.threadId, threadId))) + .orderBy(asc(engineEntries.createdAt)) + .all() as EntryRow[]; + if (opts?.includeCompacted === false) rows = rows.filter((r) => r.entryType !== "compaction"); + if (opts?.limit && opts.limit > 0) rows = rows.slice(-opts.limit); + return rows.map(rowToEntry); + } + + async getQueueState(sessionId: string, threadId: string): Promise<QueueState | null> { + const row = this.db + .select() + .from(engineQueueState) + .where(and(eq(engineQueueState.sessionId, sessionId), eq(engineQueueState.threadId, threadId))) + .get(); + if (!row) return null; + return { + threadId: row.threadId, + mode: row.mode as QueueState["mode"], + status: row.status as QueueState["status"], + activeItemId: row.activeItemId ?? undefined, + pending: parseJson(row.pending) ?? [], + collectBuffer: parseJson(row.collectBuffer), + blockedGateId: row.blockedGateId ?? 
undefined, + }; + } + + async listDecisionGates(sessionId: string, threadId?: string): Promise<DecisionGate[]> { + let rows; + if (threadId) { + rows = this.db + .select() + .from(engineDecisionGates) + .where(and(eq(engineDecisionGates.sessionId, sessionId), eq(engineDecisionGates.threadId, threadId))) + .all(); + } else { + rows = this.db + .select() + .from(engineDecisionGates) + .where(eq(engineDecisionGates.sessionId, sessionId)) + .all(); + } + return rows.map(rowToGate); + } + + async getDecisionGate(sessionId: string, gateId: string): Promise<DecisionGate | null> { + const row = this.db + .select() + .from(engineDecisionGates) + .where(and(eq(engineDecisionGates.sessionId, sessionId), eq(engineDecisionGates.id, gateId))) + .get(); + return row ? rowToGate(row) : null; + } + + async getSuspendedTurn( + sessionId: string, + threadId: string, + ): Promise<SuspendedTurnState | null> { + const row = this.db + .select() + .from(engineSuspendedTurns) + .where(and(eq(engineSuspendedTurns.sessionId, sessionId), eq(engineSuspendedTurns.threadId, threadId))) + .get(); + if (!row) return null; + return { + sessionId: row.sessionId, + threadId: row.threadId, + queueItemId: row.queueItemId, + gateId: row.gateId, + model: row.model, + leafMessageId: row.leafEntryId ?? undefined, + toolCallId: row.toolCallId, + toolName: row.toolName, + toolArgs: parseJson(row.toolArgs) ?? 
{}, + resumeKey: row.resumeKey, + attempt: row.attempt, + createdAt: row.createdAt, + }; + } + + async deleteSession(id: string): Promise<void> { + this.db.delete(engineEntries).where(eq(engineEntries.sessionId, id)).run(); + this.db.delete(engineQueueItems).where(eq(engineQueueItems.sessionId, id)).run(); + this.db.delete(engineQueueState).where(eq(engineQueueState.sessionId, id)).run(); + this.db.delete(engineDecisionGates).where(eq(engineDecisionGates.sessionId, id)).run(); + this.db.delete(engineSuspendedTurns).where(eq(engineSuspendedTurns.sessionId, id)).run(); + this.db.delete(engineThreads).where(eq(engineThreads.sessionId, id)).run(); + this.db.delete(engineSessions).where(eq(engineSessions.id, id)).run(); + } +} + +function rowToGate(row: typeof engineDecisionGates.$inferSelect): DecisionGate { + return { + id: row.id, + sessionId: row.sessionId, + threadId: row.threadId, + type: row.type as DecisionGate["type"], + status: row.status as DecisionGate["status"], + title: row.title, + body: row.body ?? undefined, + actions: parseJson(row.actions) ?? [], + origin: parseJson(row.origin), + context: parseJson(row.context), + expiresAt: row.expiresAt ?? undefined, + createdAt: row.createdAt, + updatedAt: row.updatedAt, + }; +} +``` + +- [ ] **Step 3: Re-export from index** + +Modify `packages/engine/src/index.ts` to add: + +```ts +export { SqliteSessionStore } from "./providers/sqlite-store.js"; +``` + +(append after the existing `InMemoryCredentialStore` export) + +- [ ] **Step 4: Typecheck** + +```bash +pnpm typecheck +``` + +Expected: clean. 
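Every JSON column in the store (`metadata`, `actions`, `context`, `pending`, ...) funnels through the `jsonOrNull`/`parseJson` helpers imported from `sqlite-store-helpers.ts`. A standalone sketch of the round-trip contract those helpers are assumed to satisfy (hypothetical re-statement for illustration; the real definitions live in the Task 6 helpers module):

```typescript
// Hypothetical re-statement of the JSON-column helpers, for illustration only.
function jsonOrNull(value: unknown): string | null {
  // Absent fields persist as SQL NULL rather than the string "null"/"undefined".
  return value === undefined || value === null ? null : JSON.stringify(value);
}
function parseJson<T>(raw: string | null): T | undefined {
  return raw === null ? undefined : (JSON.parse(raw) as T);
}

// Round-trip: what goes into a metadata/actions/context column comes back equal.
const meta = { retries: 2, tags: ["a", "b"] };
const stored = jsonOrNull(meta);
const back = parseJson<typeof meta>(stored);
console.log(stored); // {"retries":2,"tags":["a","b"]}
console.log(JSON.stringify(back) === JSON.stringify(meta)); // true
console.log(jsonOrNull(undefined)); // null
```

The contract tests in Task 7 exercise exactly this symmetry through the real column mappers.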
+ +- [ ] **Step 5: Commit** + +```bash +git add packages/engine/src/providers/sqlite-store.ts packages/engine/src/providers/sqlite-store-helpers.ts packages/engine/src/index.ts +git commit -m "feat(engine): SqliteSessionStore implementation" +``` + +--- + +### Task 7: Run contract suite against `SqliteSessionStore` + +**Files:** +- Create: `packages/engine/test/sqlite-store.test.ts` + +- [ ] **Step 1: Write the test** + +```ts +// packages/engine/test/sqlite-store.test.ts +import Database from "better-sqlite3"; +import { drizzle } from "drizzle-orm/better-sqlite3"; +import { readFileSync, readdirSync } from "node:fs"; +import { join } from "node:path"; +import { fileURLToPath } from "node:url"; +import { dirname } from "node:path"; +import { SqliteSessionStore } from "../src/index.js"; +import { runSessionStoreContract } from "./store-contract.js"; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const MIGRATIONS_DIR = join(__dirname, "..", "migrations", "sqlite"); + +function applyMigrations(db: Database.Database): void { + const files = readdirSync(MIGRATIONS_DIR) + .filter((f) => f.endsWith(".sql")) + .sort(); + for (const file of files) { + const sql = readFileSync(join(MIGRATIONS_DIR, file), "utf8"); + // drizzle-kit emits statements separated by `--> statement-breakpoint` + const statements = sql.split(/-->\s*statement-breakpoint/); + for (const stmt of statements) { + const trimmed = stmt.trim(); + if (trimmed) db.exec(trimmed); + } + } +} + +runSessionStoreContract("SqliteSessionStore", { + factory: () => { + const sqlite = new Database(":memory:"); + applyMigrations(sqlite); + const db = drizzle(sqlite); + return new SqliteSessionStore(db); + }, +}); +``` + +- [ ] **Step 2: Run it** + +```bash +pnpm test -- sqlite-store +``` + +Expected: 10 tests passing. Most likely failures: +- "no such table" — migration didn't apply. Inspect `migrations/sqlite/0001_*.sql` and ensure the splitter handles its statement separator. 
+- JSON column round-trip mismatches — fix `entryToRow`/`rowToEntry`. + +- [ ] **Step 3: Commit** + +```bash +git add packages/engine/test/sqlite-store.test.ts +git commit -m "test(engine): run contract suite against SqliteSessionStore" +``` + +--- + +## Phase 3: Restart-safe gate primitives + +### Task 8: Make gate IDs deterministic from `(session, thread, queueItem, resumeKey)` + +The current `fromRequest()` (in `packages/engine/src/decision-gate.ts:117-141`) generates a random ID when `resumeKey` is missing, and only uses `resumeKey` directly otherwise. Both behaviors are wrong for restart-safety: replays must compute the same ID. + +**Files:** +- Modify: `packages/engine/src/decision-gate.ts` +- Modify: `packages/engine/src/thread.ts` + +- [ ] **Step 1: Replace `fromRequest` with a deterministic builder** + +Replace the existing `fromRequest` function (the version with the random fallback) with: + +```ts +export interface GateContext { + sessionId: string; + threadId: string; + queueItemId: string; + resumeKey: string; +} + +export function deterministicGateId(ctx: GateContext): string { + return `gate:${ctx.sessionId}:${ctx.threadId}:${ctx.queueItemId}:${ctx.resumeKey}`; +} + +export function fromRequest(req: DecisionGateRequest, gateCtx: GateContext): DecisionGate { + if (!req.resumeKey) { + throw new Error( + "DecisionGateRequest.resumeKey is required for restart-safe gates. " + + "Tools must supply a stable key per suspension point.", + ); + } + const now = Date.now(); + return { + id: deterministicGateId(gateCtx), + sessionId: gateCtx.sessionId, + threadId: gateCtx.threadId, + type: req.type, + title: req.title, + body: req.body, + actions: + req.actions ?? + (req.type === "approval" + ? 
[ + { id: "approve", label: "Approve", style: "primary" }, + { id: "deny", label: "Deny", style: "danger" }, + ] + : []), + expiresAt: req.expiresAt, + status: "pending", + context: req.context, + origin: req.origin, + createdAt: now, + updatedAt: now, + }; +} +``` + +- [ ] **Step 2: Update Thread.requestDecision call site** + +In `packages/engine/src/thread.ts`, change the `fromRequest(req, session.id, this.id)` call to pass the new GateContext. Locate the `requestDecision` async function in `buildToolContext()` and change: + +```ts +const gate = fromRequest(req, session.id, this.id); +``` + +to: + +```ts +const gate = fromRequest(req, { + sessionId: session.id, + threadId: this.id, + queueItemId: this.activeItem?.id ?? "", + resumeKey: req.resumeKey ?? "", +}); +``` + +(The new `fromRequest` will throw if `resumeKey` is empty — that's the contract.) + +- [ ] **Step 3: Update existing tests that use `requestDecision` without `resumeKey`** + +Run: + +```bash +pnpm test 2>&1 | head -50 +``` + +Any test that fails with "resumeKey is required" needs to add a `resumeKey` to the `requestDecision` call. The `decision-gate.test.ts` should already pass `resumeKey` for the approval cases — verify and add it for the expiring-tool case (`packages/engine/test/decision-gate.test.ts` around line 142): + +Old: +```ts +await ctx.requestDecision({ + type: "approval", + title: "expire me", + expiresAt: Date.now() + 30, +}); +``` + +New: +```ts +await ctx.requestDecision({ + type: "approval", + title: "expire me", + expiresAt: Date.now() + 30, + resumeKey: "expire-me-1", +}); +``` + +- [ ] **Step 4: Run tests** + +```bash +pnpm test +``` + +Expected: 14 tests still passing. 
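For intuition, the determinism contract can be exercised in isolation. The following is a standalone re-statement of the ID scheme (the real builder is `deterministicGateId` in `decision-gate.ts`; the literal values here are made up):

```typescript
// Standalone re-statement of the deterministic gate-ID scheme, for illustration.
interface GateContext {
  sessionId: string;
  threadId: string;
  queueItemId: string;
  resumeKey: string;
}
const gateId = (c: GateContext): string =>
  `gate:${c.sessionId}:${c.threadId}:${c.queueItemId}:${c.resumeKey}`;

const first = gateId({ sessionId: "s1", threadId: "t1", queueItemId: "q1", resumeKey: "deploy:prod" });
const replay = gateId({ sessionId: "s1", threadId: "t1", queueItemId: "q1", resumeKey: "deploy:prod" });
console.log(first === replay); // true: a replayed tool computes the identical gate ID
console.log(first); // gate:s1:t1:q1:deploy:prod

// The queue item ID is part of the key, so the same resumeKey under a
// different queue item opens a fresh gate instead of reusing a stale resolution.
const other = gateId({ sessionId: "s1", threadId: "t1", queueItemId: "q2", resumeKey: "deploy:prod" });
console.log(first === other); // false
```

Task 11 pins down both properties with unit tests against the real `deterministicGateId`.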
+ +- [ ] **Step 5: Commit** + +```bash +git add packages/engine/src/decision-gate.ts packages/engine/src/thread.ts packages/engine/test/decision-gate.test.ts +git commit -m "feat(engine): deterministic gate IDs derived from resumeKey" +``` + +--- + +### Task 9: `requestDecision` short-circuits when `ctx.suspendedDecision` matches + +**Files:** +- Modify: `packages/engine/src/thread.ts` + +- [ ] **Step 1: Add the short-circuit at the top of `requestDecision`** + +In `packages/engine/src/thread.ts`, at the start of the `requestDecision` async function inside `buildToolContext()`, before constructing the gate, add: + +```ts +requestDecision: async (req: DecisionGateRequest): Promise<DecisionResolution> => { + const gateCtx = { + sessionId: session.id, + threadId: this.id, + queueItemId: this.activeItem?.id ?? "", + resumeKey: req.resumeKey ?? "", + }; + // Restart-safe replay: if we are running with a suspendedDecision and the + // gate ID matches, return the stored resolution without re-persisting. + const expectedId = req.resumeKey + ? deterministicGateId(gateCtx) + : null; + if ( + this.suspendedDecisionForReplay && + expectedId && + this.suspendedDecisionForReplay.gateId === expectedId + ) { + const resolution = this.suspendedDecisionForReplay.resolution; + if (!resolution) { + throw new Error(`replay: suspendedDecision for ${expectedId} has no resolution`); + } + // One-shot: clear so a subsequent requestDecision in the same turn opens normally. 
+ this.suspendedDecisionForReplay = undefined; + return resolution; + } + + const gate = fromRequest(req, gateCtx); + // …rest of existing implementation +``` + +Add a private field at the top of the `Thread` class for the replay context: + +```ts +private suspendedDecisionForReplay: { gateId: string; resolution?: DecisionResolution } | undefined; +``` + +Add an import for `deterministicGateId` next to the existing `fromRequest, GateManager` import: + +```ts +import { fromRequest, GateManager, deterministicGateId } from "./decision-gate.js"; +``` + +- [ ] **Step 2: Wire `suspendedDecisionForReplay` into ToolContext** + +In the same `buildToolContext` method, set `suspendedDecision` on the context: + +```ts +suspendedDecision: this.suspendedDecisionForReplay, +``` + +(Replace the existing `suspendedDecision: undefined` if present, or add to the returned ctx if missing.) + +- [ ] **Step 3: Add a method to set the replay context** + +In the `Thread` class, add a public method: + +```ts +/** Used by Engine.restoreSession to seed replay state before re-running a blocked tool. */ +setReplayContext(ctx: { gateId: string; resolution?: DecisionResolution } | undefined): void { + this.suspendedDecisionForReplay = ctx; +} +``` + +- [ ] **Step 4: Typecheck** + +```bash +pnpm typecheck +``` + +Expected: clean. + +- [ ] **Step 5: Run tests** + +```bash +pnpm test +``` + +Expected: still 14 passing. + +- [ ] **Step 6: Commit** + +```bash +git add packages/engine/src/thread.ts +git commit -m "feat(engine): requestDecision short-circuits on suspendedDecision replay" +``` + +--- + +### Task 10: Persist real `toolCallId` and `toolArgs` on suspension + +The current `requestDecision` saves placeholder values for `toolCallId` and empty `toolArgs`. Replay needs the real values. 
+ +**Files:** +- Modify: `packages/engine/src/tool-bridge.ts` +- Modify: `packages/engine/src/types.ts` +- Modify: `packages/engine/src/thread.ts` + +- [ ] **Step 1: Add `toolCallId`, `toolName`, `toolArgs` to the closure in `tool-bridge.ts`** + +Replace `toAgentTool` in `packages/engine/src/tool-bridge.ts`: + +```ts +export function toAgentTool( + def: ToolDef, + buildContext: (args: { + signal: AbortSignal; + toolCallId: string; + toolName: string; + toolArgs: Record<string, unknown>; + }) => ToolContext, +): AgentTool { + return { + name: def.name, + label: def.name, + description: def.description, + parameters: def.parameters, + execute: async (toolCallId, params, signal) => { + const ctx = buildContext({ + signal: signal ?? new AbortController().signal, + toolCallId, + toolName: def.name, + toolArgs: params as Record<string, unknown>, + }); + const result = await def.execute(params as never, ctx); + return toAgentToolResult(result); + }, + }; +} +``` + +- [ ] **Step 2: Update Thread.buildTools / buildToolContext to use the new shape** + +In `packages/engine/src/thread.ts`, update `buildTools`: + +```ts +private buildTools(): AgentTool[] { + const all: ToolDef[] = [...this.session.builtinTools, ...(this.session.options.tools ?? [])]; + return all.map((def) => + toAgentTool(def, ({ signal, toolCallId, toolName, toolArgs }) => + this.buildToolContext({ signal, toolCallId, toolName, toolArgs }), + ), + ); +} +``` + +Update `buildToolContext` signature: + +```ts +private buildToolContext(args: { + signal: AbortSignal; + toolCallId: string; + toolName: string; + toolArgs: Record<string, unknown>; +}): ToolContext { + const { signal, toolCallId, toolName, toolArgs } = args; + // ... rest of existing implementation +``` + +In the `requestDecision` body, use the captured `toolCallId`, `toolName`, `toolArgs` for the SuspendedTurnState save: + +```ts +await session.providers.store.saveSuspendedTurn(session.id, this.id, { + sessionId: session.id, + threadId: this.id, + queueItemId: this.activeItem?.id ?? 
"", + gateId: gate.id, + model: session.options.model.id, + toolCallId, + toolName, + toolArgs, + resumeKey: req.resumeKey ?? gate.id, + attempt: 1, + createdAt: Date.now(), +}); +``` + +- [ ] **Step 3: Run tests** + +```bash +pnpm test +``` + +Expected: still 14 passing. + +- [ ] **Step 4: Commit** + +```bash +git add packages/engine/src/tool-bridge.ts packages/engine/src/thread.ts +git commit -m "feat(engine): persist real tool call id and args on gate suspension" +``` + +--- + +### Task 11: Pure-function unit test for the short-circuit predicate + +The short-circuit decision is deterministic given `(resumeKey, gateCtx, suspendedDecision)`. Extract the predicate into a pure function and unit test it directly. This avoids any race against the agent loop and makes the integration test in Task 15 the single end-to-end validation. + +**Files:** +- Modify: `packages/engine/src/decision-gate.ts` (add `shouldShortCircuit`) +- Modify: `packages/engine/src/thread.ts` (use the new helper) +- Create: `packages/engine/test/short-circuit.test.ts` + +- [ ] **Step 1: Add `shouldShortCircuit` to `decision-gate.ts`** + +Append after `deterministicGateId`: + +```ts +export function shouldShortCircuit(args: { + ctx: GateContext; + suspendedDecision: { gateId: string; resolution?: DecisionResolution } | undefined; +}): { match: true; resolution: DecisionResolution } | { match: false } { + const { ctx, suspendedDecision } = args; + if (!suspendedDecision) return { match: false }; + const expectedId = deterministicGateId(ctx); + if (suspendedDecision.gateId !== expectedId) return { match: false }; + if (!suspendedDecision.resolution) return { match: false }; + return { match: true, resolution: suspendedDecision.resolution }; +} +``` + +(Add `import type { DecisionResolution } from "./types.js";` if not already imported in decision-gate.ts.) 
+ +- [ ] **Step 2: Use it in `Thread.requestDecision`** + +In `packages/engine/src/thread.ts`, replace the inline short-circuit you added in Task 9 with a call to `shouldShortCircuit`. The block at the top of `requestDecision` becomes: + +```ts +requestDecision: async (req: DecisionGateRequest): Promise<DecisionResolution> => { + if (!req.resumeKey) { + throw new Error("DecisionGateRequest.resumeKey is required for restart-safe gates."); + } + const gateCtx = { + sessionId: session.id, + threadId: this.id, + queueItemId: this.activeItem?.id ?? "", + resumeKey: req.resumeKey, + }; + const sc = shouldShortCircuit({ + ctx: gateCtx, + suspendedDecision: this.suspendedDecisionForReplay, + }); + if (sc.match) { + this.suspendedDecisionForReplay = undefined; // one-shot + return sc.resolution; + } + const gate = fromRequest(req, gateCtx); + // …rest of existing implementation +``` + +Add `shouldShortCircuit` to the import from `./decision-gate.js`: + +```ts +import { fromRequest, GateManager, deterministicGateId, shouldShortCircuit } from "./decision-gate.js"; +``` + +- [ ] **Step 3: Write the unit test** + +```ts +// packages/engine/test/short-circuit.test.ts +import { describe, it, expect } from "vitest"; +import { shouldShortCircuit, deterministicGateId } from "../src/decision-gate.js"; + +const ctx = { sessionId: "s1", threadId: "t1", queueItemId: "q1", resumeKey: "do:x" }; +const gateId = deterministicGateId(ctx); +const resolution = { actionId: "approve", resolvedBy: "u", resolvedAt: 1 }; + +describe("shouldShortCircuit", () => { + it("returns no match when no suspendedDecision", () => { + expect(shouldShortCircuit({ ctx, suspendedDecision: undefined }).match).toBe(false); + }); + + it("returns no match when gateId differs", () => { + expect( + shouldShortCircuit({ + ctx, + suspendedDecision: { gateId: "gate:other", resolution }, + }).match, + ).toBe(false); + }); + + it("returns no match when resolution is missing", () => { + expect( + shouldShortCircuit({ ctx, suspendedDecision: { 
gateId } }).match, + ).toBe(false); + }); + + it("returns match + resolution when gateId and resolution are present", () => { + const result = shouldShortCircuit({ + ctx, + suspendedDecision: { gateId, resolution }, + }); + expect(result.match).toBe(true); + if (result.match) expect(result.resolution).toEqual(resolution); + }); + + it("two ctx with same fields produce the same gateId", () => { + const a = deterministicGateId({ sessionId: "s", threadId: "t", queueItemId: "q", resumeKey: "k" }); + const b = deterministicGateId({ sessionId: "s", threadId: "t", queueItemId: "q", resumeKey: "k" }); + expect(a).toBe(b); + }); + + it("differing resumeKey changes gateId", () => { + const a = deterministicGateId({ sessionId: "s", threadId: "t", queueItemId: "q", resumeKey: "k1" }); + const b = deterministicGateId({ sessionId: "s", threadId: "t", queueItemId: "q", resumeKey: "k2" }); + expect(a).not.toBe(b); + }); +}); +``` + +- [ ] **Step 4: Run it** + +```bash +pnpm test -- short-circuit +``` + +Expected: 6 tests passing. + +- [ ] **Step 5: Run full suite** + +```bash +pnpm test +``` + +Expected: still all green; the existing 14 engine tests continue to pass with the refactored short-circuit predicate. 
+ +- [ ] **Step 6: Commit** + +```bash +git add packages/engine/src/decision-gate.ts packages/engine/src/thread.ts packages/engine/test/short-circuit.test.ts +git commit -m "test(engine): unit-test the gate short-circuit predicate" +``` + +--- + +## Phase 4: `Engine.restoreSession` + +### Task 12: Restore session and threads from store + +**Files:** +- Modify: `packages/engine/src/engine.ts` +- Modify: `packages/engine/src/session.ts` + +- [ ] **Step 1: Add a `Session.rehydrate` static path** + +In `packages/engine/src/session.ts`, add a static helper that builds a Session from store data without re-saving: + +```ts +static async rehydrate( + data: SessionData, + options: CreateSessionOptions, + providers: ProviderBundle, + sandbox: Sandbox, +): Promise<Session> { + const session = new Session(data.id, options, providers, sandbox); + // Rebuild threads from store + const threadDatas = await providers.store.listThreads(data.id); + for (const td of threadDatas) { + const thread = new Thread(session, td); + session.threads.set(thread.id, thread); + session.threadsByKey.set(thread.key, thread); + // Rehydrate agent transcript from entries + const entries = await providers.store.getEntries(data.id, td.id); + thread.rehydrateTranscript(entries); + } + return session; +} +``` + +(This needs `threads` and `threadsByKey` to be accessible within the file. They are `private` — either keep them `private` and add a mutator method, or use a `Session["threads"]` type assertion. Cleanest: add a `Session.attachThread(thread)` method.) + +Add to `Session`: + +```ts +private attachThread(thread: Thread): void { + this.threads.set(thread.id, thread); + this.threadsByKey.set(thread.key, thread); +} +``` + +And use it in `rehydrate`: + +```ts +session.attachThread(thread); +``` + +- [ ] **Step 2: Add `Thread.rehydrateTranscript`** + +In `packages/engine/src/thread.ts`, add the method below. 
The crucial detail (per the spec's "LLM-faithful entry persistence" contract): for assistant messages that issued tool calls, we MUST reconstruct the `ToolCall` blocks from `MessageEntry.parts`. Without this, after replay we'd push a `toolResult` after a text-only assistant message, which providers reject. + +```ts +rehydrateTranscript(entries: SessionEntry[]): void { + const agentMessages: AgentMessage[] = []; + for (const e of entries) { + if (e.type !== "message") continue; // CompactionEntry/DecisionGateEntry filtered + + if (e.role === "user") { + agentMessages.push({ + role: "user", + content: [{ type: "text", text: e.content }], + timestamp: e.createdAt, + }); + continue; + } + + if (e.role === "assistant") { + // Reconstruct content blocks from parts so tool calls survive rehydration. + const blocks: Array<TextContent | ThinkingContent | ToolCall> = []; + const parts = e.parts ?? []; + const hadStructuredParts = parts.length > 0; + for (const p of parts) { + if (p.type === "text") blocks.push({ type: "text", text: p.text }); + else if (p.type === "thinking") blocks.push({ type: "thinking", thinking: p.text }); + else if (p.type === "tool_call") { + blocks.push({ + type: "toolCall", + id: p.callId, + name: p.toolName, + arguments: (p.args as Record<string, unknown>) ?? {}, + }); + } + } + if (!hadStructuredParts && e.content) { + blocks.push({ type: "text", text: e.content }); + } + agentMessages.push({ + role: "assistant", + content: blocks, + api: this.session.options.model.api, + provider: this.session.options.model.provider, + model: e.model ?? this.session.options.model.id, + usage: { + input: 0, output: 0, cacheRead: 0, cacheWrite: 0, totalTokens: 0, + cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 }, + }, + stopReason: "stop", + timestamp: e.createdAt, + }); + continue; + } + + // tool/system roles dropped from the LLM transcript here — toolResult + // messages are re-derived by replayBlocked() when it runs the suspended + // tool and pushes its result before agent.continue(). 
+ } + this.agent.state.messages = agentMessages; +} +``` + +Imports needed at top of `thread.ts`: + +```ts +import type { TextContent, ThinkingContent, ToolCall } from "@mariozechner/pi-ai"; +``` + +(`AgentMessage` should already be imported.) + +- [ ] **Step 3: Implement Engine.restoreSession** + +Replace the throwing stub in `packages/engine/src/engine.ts`. Per the spec, `restoreSession` takes a `RestoreSessionOptions` argument (`{ sessionId, options }`) — the caller re-supplies tools/sandbox/model: + +```ts +async restoreSession(args: { + sessionId: string; + options: Omit<CreateSessionOptions, "id">; +}): Promise<Session> { + const cached = this.sessions.get(args.sessionId); + if (cached) return cached; + const data = await this.opts.providers.store.getSession(args.sessionId); + if (!data) throw new Error(`session not found: ${args.sessionId}`); + const sandbox = await this.materializeSandbox(args.options.sandbox); + const session = await Session.rehydrate( + data, + { ...args.options, id: args.sessionId }, + this.opts.providers, + sandbox, + ); + this.sessions.set(args.sessionId, session); + return session; +} +``` + +Also add a `RestoreSessionOptions` type to `packages/engine/src/types.ts`: + +```ts +export interface RestoreSessionOptions { + sessionId: string; + options: Omit<CreateSessionOptions, "id">; +} +``` + +- [ ] **Step 4: Re-export `RestoreSessionOptions`** + +In `packages/engine/src/index.ts`, the existing `export * from "./types.js"` already covers it. `Engine` is already exported. + +- [ ] **Step 5: Typecheck** + +```bash +pnpm typecheck +``` + +Expected: clean. + +- [ ] **Step 6: Commit** + +```bash +git add packages/engine/src/engine.ts packages/engine/src/session.ts packages/engine/src/thread.ts +git commit -m "feat(engine): restoreSession rehydrates session, threads, and transcripts" +``` + +--- + +### Task 13: Replay blocked turns + +If a thread has a suspended turn, restoration must either wait for the gate to resolve (still pending) or replay the tool immediately (gate already resolved). 
+ +**Files:** +- Modify: `packages/engine/src/session.ts` +- Modify: `packages/engine/src/thread.ts` + +- [ ] **Step 1: Add `Thread.replayBlocked` that runs a single suspended tool call** + +In `packages/engine/src/thread.ts`: + +```ts +async replayBlocked(args: { + suspended: SuspendedTurnState; + resolution: DecisionResolution; +}): Promise<void> { + const { suspended, resolution } = args; + // Build tools and find the one we need to replay + const tools = this.buildTools(); + const tool = tools.find((t) => t.name === suspended.toolName); + if (!tool) { + this.emitError( + "replay_tool_missing", + `cannot replay: tool ${suspended.toolName} not registered`, + ); + return; + } + // Seed replay context so requestDecision short-circuits on the first call. + this.setReplayContext({ gateId: suspended.gateId, resolution }); + // Run the tool to get the same result the original turn would have produced. + // We bypass the agent loop for this one call; the result will be appended + // as a synthetic toolResult message and we then call agent.continue(). + const fakeAbort = new AbortController(); + let toolResult; + try { + toolResult = await tool.execute(suspended.toolCallId, suspended.toolArgs, fakeAbort.signal); + } catch (err) { + this.emitError("replay_tool_failed", err instanceof Error ? err.message : String(err)); + return; + } + // Push as toolResult and continue the agent. + this.agent.state.messages = [ + ...this.agent.state.messages, + { + role: "toolResult", + toolCallId: suspended.toolCallId, + toolName: suspended.toolName, + content: toolResult.content, + details: toolResult.details, + isError: false, + timestamp: Date.now(), + }, + ]; + // Clear suspended turn from store before continuing. + await this.session.providers.store.clearSuspendedTurn(this.session.id, this.id); + this.setStatus("running"); + try { + await this.agent.continue(); + await this.agent.waitForIdle(); + } catch (err) { + this.emitError("replay_continue_failed", err instanceof Error ? 
err.message : String(err)); + } + if (this.readStatus() === "running") this.setStatus("idle"); +} +``` + +(`AgentMessage` import may need updating to include `ToolResultMessage` shape — it's already part of the `Message` union from pi-ai.) + +- [ ] **Step 2: Add `Session.replayBlocked` orchestrator** + +In `packages/engine/src/session.ts`, add a method that, for a given thread, looks up suspension state and a possibly-resolved gate, and either kicks off the replay or re-registers a waiter: + +```ts +async resumeBlockedThreadIfReady(threadId: string): Promise { + const thread = this.threads.get(threadId); + if (!thread) return; + const suspended = await this.providers.store.getSuspendedTurn(this.id, threadId); + if (!suspended) return; + const gate = await this.providers.store.getDecisionGate(this.id, suspended.gateId); + if (!gate) { + // Lost gate; clear suspended and abort the queue item + await this.providers.store.clearSuspendedTurn(this.id, threadId); + return; + } + if (gate.status === "resolved") { + // We need the resolution. Read it from the gate's DAG entry. + const entries = await this.providers.store.getEntries(this.id, threadId); + const entry = entries.find((e) => e.type === "decision_gate" && e.gate.id === gate.id); + const resolution = + entry && entry.type === "decision_gate" ? entry.resolution : undefined; + if (!resolution) { + throw new Error(`gate ${gate.id} resolved but no resolution stored`); + } + void thread.replayBlocked({ suspended, resolution }); + } else if (gate.status === "pending") { + // Re-register a waiter so resolveDecision will wake replay. + thread.armPendingGateForRestart(gate, suspended); + } + // expired/withdrawn: nothing to do; the run already terminated. 
+}
+```
+
+- [ ] **Step 3: Add `Thread.armPendingGateForRestart`**
+
+```ts
+armPendingGateForRestart(gate: DecisionGate, suspended: SuspendedTurnState): void {
+  this.blockedGateId = gate.id;
+  this.setStatus("blocked_on_decision_gate");
+  // Register the GateManager so resolveDecision/withdraw works as before.
+  // Once resolved, run replayBlocked.
+  this.gates
+    .register(gate, async (gateId) => {
+      // expiry handling: nothing more to do for replay
+      void gateId;
+    })
+    .then((resolution) => {
+      void this.replayBlocked({ suspended, resolution });
+    })
+    .catch((err) => {
+      this.emitError(
+        "replay_after_pending_gate_failed",
+        err instanceof Error ? err.message : String(err),
+      );
+    });
+}
+```
+
+- [ ] **Step 4: Call resumeBlockedThreadIfReady from Session.rehydrate**
+
+In `Session.rehydrate`, after attaching all threads, kick off resumption for any blocked thread:
+
+```ts
+for (const td of threadDatas) {
+  void session.resumeBlockedThreadIfReady(td.id);
+}
+```
+
+- [ ] **Step 5: Typecheck**
+
+```bash
+pnpm typecheck
+```
+
+Expected: clean.
+
+- [ ] **Step 6: Commit**
+
+```bash
+git add packages/engine/src/session.ts packages/engine/src/thread.ts
+git commit -m "feat(engine): replay blocked tool turns on session restore"
+```
+
+---
+
+### Task 14: Persist queue items as well as queue state
+
+Right now we save `QueueState` (the whole snapshot) but the per-queue-item rows in `engine_queue_items` aren't written. For restart, we need those rows so the engine knows what to re-submit.
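To make the persisted statuses concrete, here is a small sketch of the lifecycle this task records. It assumes `QueueStatus` includes at least these three states (the real union in `types.ts` may carry more), and the transition table is an illustration, not engine code:

```typescript
// Assumed subset of the engine's QueueStatus union.
type QueueStatus = "queued" | "running" | "blocked_on_decision_gate";

// submitPrompt writes "queued", tickQueue flips it to "running", a decision
// gate suspension marks "blocked_on_decision_gate", and replay returns it to
// "running". Completion deletes the row instead of writing a terminal status.
const transitions: Record<QueueStatus, QueueStatus[]> = {
  queued: ["running"],
  running: ["blocked_on_decision_gate"],
  blocked_on_decision_gate: ["running"],
};

function canTransition(from: QueueStatus, to: QueueStatus): boolean {
  return transitions[from].includes(to);
}
```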
+
+**Files:**
+- Modify: `packages/engine/src/types.ts`
+- Modify: `packages/engine/src/providers/in-memory-store.ts`
+- Modify: `packages/engine/src/providers/sqlite-store.ts`
+- Modify: `packages/engine/src/thread.ts`
+
+- [ ] **Step 1: Add `saveQueueItem` and `getQueueItems` to the SessionStore interface**
+
+In `packages/engine/src/types.ts`, extend `SessionStore`:
+
+```ts
+saveQueueItem(sessionId: string, item: QueueItem & { status: QueueStatus }): Promise<void>;
+getQueueItems(sessionId: string, threadId: string, opts?: { status?: QueueStatus }): Promise<Array<QueueItem & { status: QueueStatus }>>;
+deleteQueueItem(sessionId: string, threadId: string, itemId: string): Promise<void>;
+```
+
+- [ ] **Step 2: Implement on InMemorySessionStore**
+
+In `packages/engine/src/providers/in-memory-store.ts`:
+
+```ts
+async saveQueueItem(sessionId: string, item: QueueItem & { status: QueueStatus }): Promise<void> {
+  const r = this.row(sessionId);
+  // Use a per-row map; keep it simple via a property on row.
+  const list = r.queueItems?.get(item.threadId) ?? [];
+  const idx = list.findIndex((i) => i.id === item.id);
+  if (idx >= 0) list[idx] = item; else list.push(item);
+  if (!r.queueItems) r.queueItems = new Map();
+  r.queueItems.set(item.threadId, list);
+}
+
+async getQueueItems(sessionId: string, threadId: string, opts?: { status?: QueueStatus }) {
+  const r = this.row(sessionId);
+  const list = r.queueItems?.get(threadId) ?? [];
+  return opts?.status ? list.filter((i) => i.status === opts.status) : [...list];
+}
+
+async deleteQueueItem(sessionId: string, threadId: string, itemId: string): Promise<void> {
+  const r = this.row(sessionId);
+  const list = r.queueItems?.get(threadId);
+  if (!list) return;
+  r.queueItems!.set(threadId, list.filter((i) => i.id !== itemId));
+}
+```
+
+Add `queueItems?: Map<...>` to the `SessionRow` interface at the top of the file.
+
+- [ ] **Step 3: Implement on SqliteSessionStore**
+
+In `packages/engine/src/providers/sqlite-store.ts`:
+
+```ts
+async saveQueueItem(
+  sessionId: string,
+  item: QueueItem & { status: QueueStatus },
+): Promise<void> {
+  this.db
+    .insert(engineQueueItems)
+    .values({
+      id: item.id,
+      sessionId,
+      threadId: item.threadId,
+      status: item.status,
+      mode: "followup", // could be tracked separately; for now default
+      content: JSON.stringify(item.content),
+      author: jsonOrNull(item.author),
+      channel: jsonOrNull(item.channel),
+      replyTarget: jsonOrNull(item.replyTarget),
+      model: item.model ?? null,
+      metadata: jsonOrNull(item.metadata),
+      createdAt: item.createdAt,
+      updatedAt: Date.now(),
+    })
+    .onConflictDoUpdate({
+      target: engineQueueItems.id,
+      set: {
+        status: item.status,
+        updatedAt: Date.now(),
+      },
+    })
+    .run();
+}
+
+async getQueueItems(
+  sessionId: string,
+  threadId: string,
+  opts?: { status?: QueueStatus },
+): Promise<Array<QueueItem & { status: QueueStatus }>> {
+  let rows;
+  if (opts?.status) {
+    rows = this.db
+      .select()
+      .from(engineQueueItems)
+      .where(and(eq(engineQueueItems.sessionId, sessionId), eq(engineQueueItems.threadId, threadId), eq(engineQueueItems.status, opts.status)))
+      .all();
+  } else {
+    rows = this.db
+      .select()
+      .from(engineQueueItems)
+      .where(and(eq(engineQueueItems.sessionId, sessionId), eq(engineQueueItems.threadId, threadId)))
+      .all();
+  }
+  return rows.map((r) => ({
+    id: r.id,
+    threadId: r.threadId,
+    status: r.status as QueueStatus,
+    content: parseJson(r.content) ?? "",
+    author: parseJson(r.author),
+    channel: parseJson(r.channel),
+    replyTarget: parseJson(r.replyTarget),
+    model: r.model ?? undefined,
+    metadata: parseJson(r.metadata),
+    createdAt: r.createdAt,
+  }));
+}
+
+async deleteQueueItem(sessionId: string, threadId: string, itemId: string): Promise<void> {
+  this.db
+    .delete(engineQueueItems)
+    .where(and(eq(engineQueueItems.sessionId, sessionId), eq(engineQueueItems.threadId, threadId), eq(engineQueueItems.id, itemId)))
+    .run();
+}
+```
+
+- [ ] **Step 4: Have Thread save queue items as they progress**
+
+In `packages/engine/src/thread.ts`, in `submitPrompt` after building the `QueueItem`, save it:
+
+```ts
+await this.session.providers.store.saveQueueItem(this.session.id, {
+  ...item,
+  status: "queued",
+});
+```
+
+In `tickQueue`, when an item starts running:
+
+```ts
+await this.session.providers.store.saveQueueItem(this.session.id, {
+  ...next,
+  status: "running",
+});
+```
+
+When an item finishes (after `runItem`):
+
+```ts
+await this.session.providers.store.deleteQueueItem(this.session.id, this.id, next.id);
+```
+
+When the gate suspends, mark the active item as blocked, in the existing `requestDecision` body:
+
+```ts
+if (this.activeItem) {
+  await session.providers.store.saveQueueItem(session.id, {
+    ...this.activeItem,
+    status: "blocked_on_decision_gate",
+  });
+}
+```
+
+- [ ] **Step 5: Add the new contract tests**
+
+Add to `packages/engine/test/store-contract.ts` inside the describe block:
+
+```ts
+it("saveQueueItem + getQueueItems round-trips", async () => {
+  await store.saveSession(newSession());
+  await store.saveThread("sess-1", newThread("sess-1"));
+  await store.saveQueueItem("sess-1", {
+    id: "q-1",
+    threadId: "th-1",
+    content: "hi",
+    createdAt: 1,
+    status: "queued",
+  });
+  const items = await store.getQueueItems("sess-1", "th-1");
+  expect(items).toHaveLength(1);
+  expect(items[0]).toMatchObject({ id: "q-1", status: "queued" });
+});
+
+it("deleteQueueItem removes the item", async () => {
+  await store.saveSession(newSession());
+  await store.saveThread("sess-1", newThread("sess-1"));
+  await store.saveQueueItem("sess-1", {
+    id: "q-1",
+    threadId: "th-1",
+    content: "hi",
+    createdAt: 1,
+    status: "queued",
+  });
+  await store.deleteQueueItem("sess-1", "th-1", "q-1");
+  expect(await store.getQueueItems("sess-1", "th-1")).toHaveLength(0);
+});
+```
+
+- [ ] **Step 6: Run tests**
+
+```bash
+pnpm test
+```
+
+Expected: 34 tests passing (10 contract × 2 store backends, plus the existing 14 engine tests).
+
+- [ ] **Step 7: Commit**
+
+```bash
+git add packages/engine/src/types.ts packages/engine/src/providers packages/engine/src/thread.ts packages/engine/test/store-contract.ts
+git commit -m "feat(engine): persist queue items per-status for restart visibility"
+```
+
+---
+
+### Task 15: End-to-end restart cycle test
+
+The plan's whole purpose: open a gate → throw away the engine → build a new engine with the same SqliteSessionStore → restoreSession → resolveDecision → verify the turn completes and the assistant's final text is persisted.
+
+**Files:**
+- Create: `packages/engine/test/restart-safe-gates.test.ts`
+
+- [ ] **Step 1: Write the test**
+
+```ts
+// packages/engine/test/restart-safe-gates.test.ts
+import { describe, it, expect } from "vitest";
+import Database from "better-sqlite3";
+import { drizzle } from "drizzle-orm/better-sqlite3";
+import { readFileSync, readdirSync } from "node:fs";
+import { dirname, join } from "node:path";
+import { fileURLToPath } from "node:url";
+import { fauxAssistantMessage, fauxToolCall, registerFauxProvider, Type } from "@mariozechner/pi-ai";
+import {
+  Engine,
+  InMemoryEventBus,
+  SqliteSessionStore,
+  VirtualSandboxProvider,
+  type ToolDef,
+  type BusEvent,
+  type DecisionGate,
+} from "../src/index.js";
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const MIGRATIONS_DIR = join(__dirname, "..", "migrations", "sqlite");
+
+function applyMigrations(db: Database.Database): void {
+  const files = readdirSync(MIGRATIONS_DIR).filter((f) => f.endsWith(".sql")).sort();
+  for (const file of files) {
+    const sql = readFileSync(join(MIGRATIONS_DIR, file), "utf8");
+    const statements = sql.split(/-->\s*statement-breakpoint/);
+    for (const stmt of statements) {
+      const trimmed = stmt.trim();
+      if (trimmed) db.exec(trimmed);
+    }
+  }
+}
+
+const approvalTool: ToolDef = {
+  name: "do_thing",
+  description: "approval-gated",
+  parameters: Type.Object({ arg: Type.String() }),
+  execute: async (args, ctx) => {
+    const r = await ctx.requestDecision({
+      type: "approval",
+      title: "ok?",
+      resumeKey: `do_thing:${args.arg}`,
+    });
+    return { text: `did with ${r.actionId}` };
+  },
+};
+
+describe("restart-safe gates: full restart cycle", () => {
+  it("survives engine teardown and restoreSession resumes", async () => {
+    // Shared SQLite DB (in-process, persistent across both engine instances)
+    const sqlite = new Database(":memory:");
+    applyMigrations(sqlite);
+    const db = drizzle(sqlite);
+    const store = new SqliteSessionStore(db);
+    const sandboxProvider = new VirtualSandboxProvider();
+
+    // Engine v1: prompt, get gate, then "crash"
+    const faux1 = registerFauxProvider({ provider: "restart" });
+    faux1.setResponses([
+      fauxAssistantMessage([fauxToolCall("do_thing", { arg: "x" }, { id: "tc1" })], {
+        stopReason: "toolUse",
+      }),
+      // Won't be consumed by engine v1 — engine v2 will use a fresh provider.
+    ]);
+
+    const bus1 = new InMemoryEventBus();
+    const events1: BusEvent[] = [];
+    bus1.subscribe({}, (e) => events1.push(e));
+    const engine1 = new Engine({ providers: { store, bus: bus1, sandboxProvider } });
+    const SESSION_ID = "sess-restart";
+    const session1 = await engine1.createSession({
+      id: SESSION_ID,
+      userId: "u1",
+      orgId: "o1",
+      workspace: "/",
+      sandbox: {},
+      model: faux1.getModel(),
+      tools: [approvalTool],
+    });
+    void session1.prompt("please do");
+
+    // Wait for the gate
+    await new Promise((resolve, reject) => {
+      const t = setTimeout(() => reject(new Error("gate timeout")), 2000);
+      const unsub = bus1.subscribe({}, (e) => {
+        if (e.event.type === "decision_gate") {
+          clearTimeout(t);
+          unsub();
+          resolve(e.event.gate);
+        }
+      });
+    });
+
+    // Confirm gate is persisted
+    const gates = await store.listDecisionGates(SESSION_ID);
+    expect(gates).toHaveLength(1);
+    const gate = gates[0];
+    expect(gate.status).toBe("pending");
+
+    // Confirm SuspendedTurnState was written
+    const suspended = await store.getSuspendedTurn(SESSION_ID, gate.threadId);
+    expect(suspended?.toolName).toBe("do_thing");
+
+    // "Crash" the engine: discard everything except the store.
+    faux1.unregister();
+
+    // Engine v2: restore from store, then resolve
+    const faux2 = registerFauxProvider({ provider: "restart-v2" });
+    // After replay completes the suspended tool, the agent.continue() call
+    // makes one more LLM request. Provide its response.
+    faux2.setResponses([fauxAssistantMessage("all done after restart")]);
+
+    const bus2 = new InMemoryEventBus();
+    const events2: BusEvent[] = [];
+    bus2.subscribe({}, (e) => events2.push(e));
+    const engine2 = new Engine({ providers: { store, bus: bus2, sandboxProvider } });
+    const session2 = await engine2.restoreSession({
+      sessionId: SESSION_ID,
+      options: {
+        userId: "u1",
+        orgId: "o1",
+        workspace: "/",
+        sandbox: {},
+        model: faux2.getModel(),
+        tools: [approvalTool],
+      },
+    });
+
+    // Resolve the gate via session2 — should trigger replay
+    await session2.resolveDecision(gate.id, {
+      actionId: "approve",
+      resolvedBy: "u1",
+      resolvedAt: Date.now(),
+    });
+
+    // Wait for the replayed turn to land "all done after restart"
+    await new Promise<void>((resolve, reject) => {
+      const t = setTimeout(() => reject(new Error("post-restart turn timeout")), 3000);
+      const unsub = bus2.subscribe({}, (e) => {
+        if (e.event.type === "message_end" && "messageId" in e.event) {
+          // We don't know the new id; check the store afterwards.
+          clearTimeout(t);
+          unsub();
+          resolve();
+        }
+      });
+    });
+
+    const finalEntries = await session2.readEntries("web:default");
+    const lastAssistant = finalEntries
+      .filter((e) => e.type === "message" && e.role === "assistant")
+      .at(-1);
+    expect(
+      lastAssistant && lastAssistant.type === "message" && lastAssistant.content,
+    ).toBe("all done after restart");
+
+    // SuspendedTurnState was cleared
+    const sus = await store.getSuspendedTurn(SESSION_ID, gate.threadId);
+    expect(sus).toBeNull();
+
+    // Gate is now resolved
+    const finalGate = await store.getDecisionGate(SESSION_ID, gate.id);
+    expect(finalGate?.status).toBe("resolved");
+
+    faux2.unregister();
+  });
+});
+```
+
+- [ ] **Step 2: Run it**
+
+```bash
+pnpm test -- restart-safe-gates
+```
+
+Expected: 1 test passing. Likely failure modes and fixes:
+
+- "session not found" on restoreSession — verify `engine_sessions` row was written and `getSession` returns it.
+- "tool not registered" on replay — `replayBlocked` looks up tools from the rehydrated session; ensure `restoreSession` passes `options.tools`.
+- "gate ID mismatch" — the deterministic ID derivation must match between original run and replay. Both use `(sessionId, threadId, queueItemId, resumeKey)`. The queueItemId is persisted in `SuspendedTurnState`; the replay uses it.
+- Agent rejects continue because last message is assistant — `replayBlocked` pushes a `toolResult` then calls `agent.continue()`, which requires the last message to be `user` or `toolResult`. If it fails, verify the toolResult message shape matches `Message` from pi-ai.
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add packages/engine/test/restart-safe-gates.test.ts
+git commit -m "test(engine): full restart cycle restoreSession + resolve resumes turn"
+```
+
+---
+
+### Task 16: Final regression sweep
+
+- [ ] **Step 1: Run all tests**
+
+```bash
+cd packages/engine && pnpm typecheck && pnpm test
+```
+
+Expected: typecheck clean; all tests passing (originally 14 + 10 contract × 2 backends + 1 short-circuit + 1 restart cycle = 36 tests).
+
+- [ ] **Step 2: Update README**
+
+Modify `packages/engine/README.md` "What works" / "What's deferred" sections:
+
+In "What works in this prototype", add:
+- SqliteSessionStore + Drizzle schema + migrations
+- Restart-safe re-entrant decision gates (deterministic IDs, `ctx.suspendedDecision` short-circuit)
+- `Engine.restoreSession` rehydrates session, threads, transcripts, queue, suspended turns; resumes blocked threads on resolve
+
+In "What's deferred", remove the "Restart-safe re-entrant decision gates" item, and add a new note:
+- Postgres-dialect schema mirror for the K8s adapter (sqlite schema works today; pg-core mirror is a thin port)
+- Hot/cold tiering (DO SQLite write-through cache → D1) — implementation detail of the Cloudflare adapter
+
+- [ ] **Step 3: Commit**
+
+```bash
+git add packages/engine/README.md
+git commit -m "docs(engine): document persistent store and restart-safe gates"
+```
+
+---
+
+## What this plan does NOT cover (deferred)
+
+- **Postgres dialect mirror.** The schema is sqlite-only here. Mirroring to `drizzle-orm/pg-core` is mechanical (same logical tables, different column helpers) and can run against `pg-mem` for tests. Worth a separate small plan.
+- **Cloudflare D1 wiring.** The `SqliteSessionStore` uses `better-sqlite3`. A `D1SessionStore` reusing the same Drizzle queries through `drizzle-orm/d1` is a thin adapter task — separate plan.
+- **Hibernation.** Engine restoration is invoked manually in tests; in production the SessionHostDO will call it on wake. That's adapter-layer work.
+- **Compaction, role/skill loading, model failover.** Independent of persistence.
+
+## Self-review
+
+**Spec coverage:** Restart-safe re-entrant gates (spec line 722) ✓, SuspendedTurnState persistence (line 858) ✓, deterministic gate identity (line 986) ✓, schema/migrations contract (line 1545) ✓, "engine_*" tables required by spec (line 1338) ✓. Postgres mirror (line 1544) — explicitly deferred with a note.
+
+**Placeholder scan:** No "TBD" / "TODO" / "similar to" steps. Code blocks are complete in every step.
+
+**Type consistency:** `deterministicGateId` / `fromRequest` / `GateContext` consistent across decision-gate.ts and thread.ts. `setReplayContext` / `armPendingGateForRestart` / `replayBlocked` / `resumeBlockedThreadIfReady` defined exactly once each, called by name elsewhere. `restoreSession` is the single public restoration entry, replacing the previous throwing stub. `saveQueueItem`/`getQueueItems`/`deleteQueueItem` added to SessionStore in Task 14 and implemented on both backends in the same task.
diff --git a/docs/specs/2026-05-02-portable-runtime-engine-design.md b/docs/specs/2026-05-02-portable-runtime-engine-design.md
new file mode 100644
index 00000000..6c5c1a48
--- /dev/null
+++ b/docs/specs/2026-05-02-portable-runtime-engine-design.md
@@ -0,0 +1,2013 @@
+# Portable Runtime Engine
+
+> Defines the portable agent runtime engine that replaces OpenCode, the Runner, and the SessionAgentDO's orchestration logic with a single, platform-agnostic TypeScript library deployable on Cloudflare Workers or Kubernetes.
+
+## Scope
+
+This spec covers:
+
+- Engine library architecture and abstraction boundaries
+- The V1 feature superset required beyond Flue-style agent harness behavior
+- Session, thread, and message hierarchy
+- Agent loop, tool system, compaction, and event emission
+- Per-thread prompt queue with modes
+- Decision-gated execution (approvals, credential requests, questions)
+- Provider interfaces (SessionStore, SandboxProvider, EventBus, BlobStore, CredentialStore)
+- Schema ownership and migration strategy
+- Platform adapter contracts (Cloudflare and Kubernetes)
+- Channel transport contracts, with Slack as the required reference transport for V1
+- Shared API route layer
+- Tool implementation and integration framework (ToolContext, ToolResult, credentials, OAuth)
+- LLM provider layer (pi-ai and pi-agent-core adoption)
+- Package structure
+
+### Boundary Rules
+
+- This spec does NOT cover individual tool implementations (GitHub, Slack, Linear, etc.) — those are ported separately against the ToolDef interface.
+- This spec does NOT cover frontend component implementation details, but it DOES define the API and event contracts the frontend consumes.
+- This spec does NOT cover sandbox image building (Dockerfiles, Modal image definitions, warm pools) — the sandbox image gets simpler but that's a separate concern.
+- This spec does NOT cover auth, users, orgs, or billing — those stay in the API layer.
+- This spec does NOT cover workflow execution details — a workflow step is "create a session, prompt it, read the result" and uses the engine's session API.
+- This spec does NOT cover orchestrator persona or long-term memory product behavior — those are application-level concerns built on top of the engine.
+
+## Relationship to Flue
+
+This design is informed by Flue's runtime architecture and may reuse implementation ideas heavily, but V1 is specified as a Valet-owned engine built in-repo rather than a direct dependency on `@flue/sdk`.
+
+Flue is the baseline reference for:
+
+- a portable session runtime over `pi-ai` and `pi-agent-core`
+- sandbox abstraction
+- built-in file/shell/task tools
+- DAG-style history with compaction
+- Cloudflare-hosted session persistence and SSE streaming
+
+Valet V1 intentionally goes beyond that baseline in a few core areas:
+
+- multi-threaded sessions with concurrent per-thread queues
+- channel-aware routing between web, Slack, Telegram, and child-session threads
+- decision-gated execution via approvals, questions, and credential acquisition
+- richer tool context (identity, credentials, sandbox, thread/session metadata, channel metadata)
+- adapter-facing event contracts suitable for multiplayer clients and external channel transports
+
+Where Flue and this spec differ, this spec is authoritative for Valet V1.
+
+## Why: Contrast with Current Architecture
+
+### What Exists Today
+
+```
+Client
+  ↓ WebSocket
+Cloudflare Worker (Hono, 50+ routes)
+  ↓ DO binding
+SessionAgentDO (~3000 lines)
+  ├── Prompt queue (SQLite, alarm-based flush)
+  ├── Channel session routing (web/slack/telegram multiplexing)
+  ├── Decision gates (approvals, questions, expiry alarms)
+  ├── Model selection & credential resolution
+  ├── Message persistence (SQLite hot → D1 cold, debounced)
+  ├── Connected user tracking
+  ├── Health monitoring
+  ├── Hibernation/restore orchestration
+  ├── Analytics event buffering
+  ├── Child session coordination
+  ├── Tunnel URL management
+  ↓ WebSocket (custom protocol, ~680 lines of type defs)
+Runner (~6000 lines across 4 files, runs inside Modal sandbox)
+  ├── WebSocket client to DO (reconnection, buffering, request/response tracking)
+  ├── ChannelSession state machine (per-channel OpenCode session isolation)
+  ├── OpenCode lifecycle management (spawn, health poll, crash recovery, restart)
+  ├── SSE event stream consumption & parsing
+  ├── Model failover chain (15+ retriable error patterns)
+  ├── Audio transcription
+  ├── Memory pre-compaction flush
+  ├── Auth gateway (JWT, proxying to 5 services, tunnel system)
+  ↓ HTTP + SSE
+OpenCode (external dependency, runs inside Modal sandbox)
+  ├── LLM provider connections
+  ├── 73 registered tools
+  ├── Session state & context management
+  ├── Plugin system (personas, skills, tools)
+  └── Config hot-reload via filesystem watch
+```
+
+Total moving parts: 4 processes (Worker, DO, Runner, OpenCode), 3 transport protocols (HTTP, WebSocket, SSE), 2 custom message protocols (DO-to-Runner, Runner-to-OpenCode), ~10,000 lines of orchestration code.
+
+### What's Wrong With It
+
+**The DO is a god object.** SessionAgentDO does prompt queuing, channel routing, message persistence, credential resolution, health monitoring, alarm scheduling, WebSocket multiplexing, analytics buffering, and hibernation orchestration. These responsibilities accumulated because the DO is the only stateful coordination point, so everything that needs state ends up there. The result is 3000 lines of deeply coupled code where a change to prompt queuing can break alarm scheduling.
+
+**Three hops to execute a tool call.** When the LLM decides to read a file: LLM (in OpenCode) invokes tool handler, which hits the filesystem directly. Fine. But the prompt that led to that tool call traveled: Client, Worker, DO, WebSocket, Runner, HTTP, OpenCode. And the result travels back the same path. Six network hops round-trip for every user message. Each hop is a failure point, a latency penalty, and a protocol translation.
+
+**The Runner exists to bridge two things that shouldn't be separate.** The Runner's entire purpose is to translate between the DO's WebSocket protocol and OpenCode's HTTP/SSE protocol. It manages OpenCode's process lifecycle, consumes its event stream, tracks per-channel state, handles model failover, and reports back to the DO. It's 6000 lines of glue code. If the agent runtime talked directly to the sandbox, the Runner wouldn't need to exist.
+
+**Two sources of truth for session state.** The DO holds prompt queue state, channel mappings, and decision gates in SQLite. The Runner holds per-channel OpenCode session IDs, streaming state, tool call tracking, and model failover state in memory. D1 holds the canonical message history. When the Runner disconnects and reconnects, there's a complex resync protocol to reconcile these three state locations. This is fragile: the 60-second grace period, the session recreation logic, the "resync if busy, abort if stuck" flow all exist because state is scattered.
+
+**OpenCode is an opaque dependency.** We can't fix bugs in its agent loop or change how it handles tool calls, compaction, or context management. When it crashes, the Runner has to detect the crash, track crash counts, apply exponential backoff, and eventually declare a fatal state. We work around its limitations rather than fixing them: the memory pre-compaction flush at 70% context exists because we can't modify OpenCode's compaction behavior directly.
+
+**Platform lock-in is structural, not incidental.** The architecture doesn't just run on Cloudflare; it's shaped by Cloudflare. The DO's single-writer guarantee shapes the prompt queue design. Hibernatable WebSockets shape the connection model. DO alarms shape the timer system. SQLite in the DO shapes the hot storage pattern. To port to Kubernetes, you wouldn't just swap implementations; you'd have to redesign every subsystem that was shaped by a DO capability.
+
+**The prompt queue is session-wide, blocking cross-channel work.** A Slack conversation blocks web UI prompts. An orchestrator can't research in one thread while coding in another. This isn't a fundamental limitation; it's an artifact of the DO processing one prompt at a time because that's simpler in the single-writer model.
+
+### What Replaces It
+
+```
+Client
+  ↓ WebSocket / SSE
+Platform Adapter (thin: ~200-400 lines)
+  ├── CF: Worker routes + SessionHostDO (just hosts engine)
+  └── K8s: Hono service + SessionPool (just hosts engine)
+  ↓ function call
+Engine (portable, ~2000-3000 lines)
+  ├── Agent loop (pi-agent-core: prompt → LLM → tools → response)
+  ├── Thread management (per-thread queues, cross-visibility)
+  ├── Tool execution (built-in + custom ToolDef[])
+  ├── Session state (DAG history, compaction)
+  ├── Model resolution & failover (pi-ai)
+  ├── Event emission
+  ↓ SandboxProvider interface
+Sandbox (Modal / K8s Pod / Docker / Virtual)
+  └── filesystem + shell (no agent logic)
+```
+
+Total moving parts: 2 processes (adapter + sandbox), 1 transport protocol (HTTP to sandbox API), 0 custom message protocols, ~3000 lines of orchestration code.
+
+### Why It's Better
+
+**The engine is a library, not a distributed system.** Session state, prompt queuing, thread management, tool execution, and event emission all live in one process with one call stack. No WebSocket protocols, no message serialization, no reconnection logic, no state reconciliation. A prompt goes in, events come out.
+
+**One hop to execute a tool call.** Engine calls `sandbox.exec()` or `sandbox.readFile()`. The sandbox is just a filesystem and shell behind an interface.
+
+**Single source of truth for session state.** The engine holds all session state in memory during execution and persists through SessionStore. No split between DO SQLite, Runner memory, and D1. No resync protocol. No grace periods. If the engine process restarts, it rehydrates from SessionStore: one load, complete state.
+
+**Per-thread concurrency is natural.** Each thread has its own queue and executes independently. The engine manages concurrent threads within a session because it's just concurrent async operations in one process, not distributed coordination.
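The per-thread queues described above reduce to ordinary promise chaining once everything lives in one process. A minimal sketch (an assumed pattern for illustration, not the engine's actual queue implementation):

```typescript
// Each thread serializes its own work on a promise chain; different threads
// interleave freely because their chains are independent.
class ThreadQueues {
  private tails = new Map<string, Promise<void>>();

  enqueue(threadId: string, task: () => Promise<void>): Promise<void> {
    const tail = this.tails.get(threadId) ?? Promise.resolve();
    // Start the task only after the previous one settles, success or failure.
    const run = tail.then(task, task);
    // The stored tail swallows errors so one failed task cannot poison the chain.
    this.tails.set(threadId, run.catch(() => {}));
    return run;
  }
}
```

Two prompts submitted to the same thread run in order; prompts on different threads overlap, which is exactly the cross-channel concurrency the old session-wide queue prevented.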
+
+**We own the agent loop.** Compaction behavior, tool call handling, context management, model failover: all modifiable. No working around an opaque dependency.
+
+**Platform is a configuration choice, not an architectural commitment.** The engine doesn't know about DOs, Workers, pods, or containers. It knows about SessionStore, SandboxProvider, EventBus, BlobStore, and CredentialStore. Porting to a new platform means implementing provider interfaces, not redesigning the session model.
+
+**The sandbox becomes simpler.** The sandbox runs only dev tools (code-server, VNC, TTYD) and a lightweight auth gateway. The agent brain is elsewhere. Sandbox boot time decreases. Sandbox crashes don't kill the agent; they just make tool calls fail temporarily until the sandbox recovers.
+
+**Testing becomes trivial.** The engine is a TypeScript library with injected interfaces. Test it with InMemorySessionStore, VirtualSandbox (just-bash), and InMemoryEventBus. No containers, no DOs, no network. Full integration tests run in milliseconds.
+
+## Architecture
+
+### Three Layers
+
+**1. Engine (`packages/engine/`)** — Portable TypeScript library, zero platform dependencies. Owns the agent loop, session/thread state, tool execution, prompt queuing, compaction, model failover, event emission, roles, and skills.
+
+**2. Provider interfaces** — Contracts defined by the engine, implemented per-platform. Five interfaces: SessionStore, SandboxProvider, EventBus, BlobStore, CredentialStore.
+
+**3. Platform adapters (`packages/adapter-cloudflare/`, `packages/adapter-k8s/`)** — Thin packages (~200-400 lines each) that implement the provider interfaces for a specific deployment target and host the engine process.
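The dependency-injection seam between layers 1 and 2 can be pictured as a single bundle of five interfaces. The shapes below are deliberately reduced illustrations; the method names and signatures are assumptions, and the real contracts in `packages/engine` carry many more methods:

```typescript
// Reduced, illustrative provider shapes; the engine sees only these seams.
interface SessionStore {
  getSession(id: string): Promise<object | null>;
}
interface SandboxProvider {
  exec(cmd: string): Promise<{ stdout: string }>;
}
interface EventBus {
  publish(event: object): void;
}
interface BlobStore {
  put(key: string, data: Uint8Array): Promise<void>;
}
interface CredentialStore {
  get(userId: string, key: string): Promise<string | null>;
}

interface Providers {
  store: SessionStore;
  sandboxProvider: SandboxProvider;
  bus: EventBus;
  blobs: BlobStore;
  credentials: CredentialStore;
}

// A test "adapter" is just five in-memory implementations; no containers,
// DOs, or network are needed to exercise the engine.
function testProviders(): Providers {
  return {
    store: { getSession: async () => null },
    sandboxProvider: { exec: async () => ({ stdout: "" }) },
    bus: { publish: () => {} },
    blobs: { put: async () => {} },
    credentials: { get: async () => null },
  };
}
```

Swapping Cloudflare for Kubernetes means swapping which concrete objects fill this bundle; the engine code that consumes it never changes.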
+ +``` +┌─────────────────────────────────────────────────────┐ +│ packages/engine/ │ +│ │ +│ ┌───────────┐ ┌──────────┐ ┌───────────────┐ │ +│ │ AgentLoop │ │ Session │ │ ToolRegistry │ │ +│ │(pi-agent- │ │ Manager │ │ │ │ +│ │ core) │ │ │ │ │ │ +│ └─────┬─────┘ └────┬─────┘ └───────┬───────┘ │ +│ │ │ │ │ +│ ┌─────▼─────────────▼───────────────▼───────────┐ │ +│ │ Provider Interfaces │ │ +│ │ SessionStore | SandboxProvider | EventBus │ │ +│ │ BlobStore | CredentialStore │ │ +│ └───────────────────────────────────────────────┘ │ +└─────────────────────────────────────────────────────┘ + │ │ +┌────────▼────────┐ ┌───────▼─────────┐ +│ adapter-cf/ │ │ adapter-k8s/ │ +│ D1, DO, R2, │ │ PG, Redis, S3, │ +│ Modal │ │ Modal/K8s Pods │ +└─────────────────┘ └─────────────────┘ +``` + +### Package Structure + +``` +packages/ + engine/ ← portable core (agent loop, tools, interfaces, schema) + src/ + schema/ ← Drizzle schema definitions (source of truth) + tools/ ← built-in tool implementations + session.ts ← session management + thread.ts ← thread lifecycle, cross-visibility + queue.ts ← per-thread prompt queue + agent-loop.ts ← pi-agent-core wrapper + compaction.ts ← context compression + events.ts ← typed event system + roles.ts ← role loading and resolution + skills.ts ← skill discovery and invocation + result.ts ← structured result extraction + types.ts ← all public types and interfaces + migrations/ + sqlite/ ← generated by drizzle-kit for D1 + postgresql/ ← generated by drizzle-kit for PG + api/ ← shared Hono route handlers, parameterized by store impls + adapter-cloudflare/ ← CF-specific wiring (DO host, D1/R2/DO providers) + adapter-k8s/ ← K8s-specific wiring (session pool, PG/Redis/S3 providers) +``` + +## V1 Completeness Contract + +V1 is complete when the engine can replace OpenCode, the Runner, and the SessionAgentDO orchestration path for normal interactive sessions on the Cloudflare adapter, while preserving the product-facing API/event behavior required by the 
web client and Slack reference transport. + +The V1 implementation must define and implement these contracts: + +| Contract | Owner | Required for V1 | +|---|---|---| +| Engine public API | `packages/engine` | Session creation/restoration, thread lookup, prompt submission, abort/pause/resume, decision resolution, event subscription | +| Session/thread/message model | `packages/engine` | DAG entries, thread metadata, queue state, compaction entries, decision gate entries, suspended turn checkpoints | +| Agent loop contract | `packages/engine` | pi-agent-core integration, model resolution, tool execution, failover, abort propagation, structured results | +| Tool contract | `packages/engine` + plugin packages | Built-in tools, plugin `ToolDef`s, command tools, action-policy wrapping, attachment handling | +| Decision gate contract | `packages/engine` + adapters | Approval, question, and credential-request gates, delivery refs, resolution, expiry, withdrawal, restart-safe resume | +| Provider contracts | adapters | SessionStore, SandboxProvider, EventBus, BlobStore, CredentialStore | +| Sandbox RPC contract | sandbox runtime + adapters | File operations, process execution, snapshots, tunnels, health, auth, request limits | +| Channel transport contract | SDK + adapters | Outbound messages, decision gate delivery/update, inbound action parsing, free-text gate resolution | +| API route contract | `packages/api` + adapters | Shared session/thread/prompt/history/decision/control routes | +| Client event contract | adapters | WebSocket/SSE event names and payloads for web UI consumption | +| Schema/migration contract | `packages/engine` | Drizzle schema, SQLite and PostgreSQL migrations, coexistence with current app tables during rollout | +| Observability contract | `packages/engine` + adapters | Audit events, analytics events, logs, status events, recoverable vs fatal errors | + +### V1 Exclusions + +The following are explicitly post-V1 unless needed to preserve an 
existing production workflow:

- User-facing branch/replay controls beyond preserving DAG metadata.
- Kubernetes production deployment. The contract must exist, but Cloudflare is the V1 shipping adapter.
- Rewriting every plugin package by hand. V1 may use an `ActionSource` to `ToolDef` bridge.
- Replacing workflow execution internals. Workflows may continue to call the session API.
- Removing old tables immediately. V1 may run side-by-side with current tables while the migration completes.

## Engine Public API

The engine is a library. Platform adapters host it and expose HTTP/WebSocket entrypoints, but all session execution flows through this API.

```typescript
interface Engine {
  createSession(opts: CreateSessionOptions): Promise<SessionHandle>;
  restoreSession(opts: RestoreSessionOptions): Promise<SessionHandle>;
  getSession(sessionId: string): Promise<SessionHandle>;
  deleteSession(sessionId: string): Promise<void>;
  onEvent(listener: (event: BusEvent) => void): Unsubscribe;
}

interface RestoreSessionOptions {
  sessionId: string;
  // Same shape as CreateSessionOptions minus `id` — the caller re-supplies
  // tools, sandbox, model, system prompt, etc. The engine does not maintain
  // a registry of session-creation options across restarts; the host (DO,
  // pod, CLI) is responsible for reconstructing them from its own config.
  options: Omit<CreateSessionOptions, 'id'>;
}

interface CreateSessionOptions {
  id?: string;
  userId: string;
  orgId: string;
  workspace: string;
  purpose?: 'interactive' | 'orchestrator' | 'workflow' | 'child';
  parentSessionId?: string;
  parentThreadId?: string;
  sandbox: Sandbox | SandboxCreateOpts;
  tools?: ToolDef[];
  commandTools?: CommandToolDef[];
  roles?: RoleSpec[];
  skills?: SkillSource[];
  model: string;
  modelFailover?: string[];
  queueMode?: QueueMode;
  metadata?: Record<string, unknown>;
}

interface SessionHandle {
  id: string;
  thread(key?: string): ThreadHandle;
  prompt(content: PromptContent, opts?: PromptOptions): Promise<PromptReceipt>;
  resolveDecision(gateId: string, resolution: DecisionResolution): Promise<void>;
  withdrawDecision(gateId: string, reason: DecisionWithdrawReason): Promise<void>;
  abort(opts?: { threadId?: string }): Promise<void>;
  pause(opts?: { threadId?: string }): Promise<void>;
  resume(opts?: { threadId?: string }): Promise<void>;
  snapshot(): Promise<unknown>;
  destroy(): Promise<void>;
}

interface ThreadHandle {
  id: string;
  prompt(content: PromptContent, opts?: PromptOptions): Promise<PromptReceipt>;
  skill(name: string, opts?: SkillInvokeOptions): Promise<unknown>;
  shell(command: string, opts?: ExecOpts): Promise<unknown>;
  readThread(key: string, opts?: MessageQuery): Promise<MessageEntry[]>;
  abort(): Promise<void>;
  pause(): Promise<void>;
  resume(): Promise<void>;
}

type QueueMode = 'followup' | 'steer' | 'collect';

type PromptContent =
  | string
  | {
      text?: string;
      attachments?: PromptAttachment[];
    };

interface PromptOptions {
  author?: PromptAuthor;
  channel?: ChannelTarget;
  replyTarget?: ChannelTarget;
  queueMode?: QueueMode;
  model?: string;
  role?: string;
  resultSchema?: TSchema;
  metadata?: Record<string, unknown>;
}

interface PromptAuthor {
  id: string;
  email?: string;
  name?: string;
  avatarUrl?: string;
  externalId?: string;
}

type PromptAttachment =
  | { type: 'image'; url?: string; data?: Uint8Array; mimeType: string; name?: string }
  | { type: 'file'; url?: string; data?: Uint8Array; mimeType: string; name: string }
  | { type: 'audio'; url?: string; data?: Uint8Array; mimeType: string; name?: string };

interface PromptReceipt {
  sessionId: string;
  threadId: string;
  queueItemId: string;
  status: 'queued' | 'running' | 'blocked_on_decision_gate';
}

interface MessageQuery {
  limit?: number;
  cursor?: string;
  afterEntryId?: string;
  beforeEntryId?: string;
  includeCompacted?: boolean;
  includeSystemEntries?: boolean;
}

interface ListOpts {
  limit?: number;
  cursor?: string;
  status?: string;
  createdAfter?: Date;
  createdBefore?: Date;
}
```

The API is idempotent where identifiers are supplied by the caller. `createSession({ id })` must return the existing session if it has already been created with the same ID and compatible immutable fields. `resolveDecision()` must be safe to retry: resolving an already resolved gate with the same resolution is a no-op; resolving it with a different resolution returns a conflict error.

`TSchema` refers to the TypeBox schema type used by pi-ai for structured parameters and results. API-layer adapters must serialize schemas as JSON Schema and preserve the original TypeScript type only inside package boundaries.

## Data Model: Sessions, Threads, and Messages

### Hierarchy

```
Session (sandbox, tools, roles, config)
  ├── Thread 'web:default'   ─── Messages (DAG)
  ├── Thread 'slack:C123'    ─── Messages (DAG)
  ├── Thread 'task:research' ─── Messages (DAG)
  │
  │   Threads can read from siblings (cross-thread visibility).
  │   Threads execute concurrently (independent queues).
  │
  └── Child Session (own or shared sandbox)
        ├── Thread 'default' ─── Messages (DAG)
        │     Can read from parent threads.
        └── Parent can read child thread summaries.
```

### Session

A session owns a sandbox instance, registered tools, roles, and configuration. It is the container for all agent work.
+ +- Created via the engine's API: `engine.createSession(opts)` +- Has a unique ID, a sandbox, a set of tools, optional roles and skills +- Can spawn child sessions (single-threaded or multi-threaded) +- Owns shared decision state used by its threads: pending decision gates, credentials, and child-session registry +- Session-wide controls: `abort()` aborts all threads, `pause()`/`resume()` freeze/unfreeze all thread queues + +### Thread + +A named conversation within a session. Each thread has its own message history (DAG-based), its own prompt queue, its own compaction state, and its own active model. Threads share the sandbox, tools, and roles from the parent session. + +- Created or retrieved via `session.thread(key)` +- `session.prompt()` is sugar for `session.thread('default').prompt()` +- Each channel target naturally maps to a thread key: `web:default`, `slack:C123`, `telegram:456`, `thread:` +- Threads can also be created explicitly for focused work: `task:research`, `review:pr-42` + +**Channel-aware thread identity:** A thread is the engine's concurrency and history boundary. Channel metadata is attached to prompts and messages, but channel transports do not define execution boundaries on their own. Multiple external channel targets may point at the same logical thread when the application intentionally converges them (for example, a Slack thread and the web UI both steering the same orchestrator thread). + +**Cross-thread visibility:** Threads can read messages from sibling threads via a built-in `thread_read` tool. The LLM can pull in context from another thread when it needs it, without paying the token cost of having it in context permanently. Cross-visibility also works across the session boundary: child session threads can read from parent threads, and parent threads can read child thread summaries. 
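The thread-identity rules above can be sketched with a toy in-memory session. `SketchSession` and `SketchThread` are illustrative names, not engine types; the sketch only shows that `thread(key)` retrieves a stable per-key handle and that `session.prompt()` routes to the `default` thread.

```typescript
// Toy model of thread identity (NOT the engine): one handle per key,
// and session-level prompt() is sugar for thread('default').prompt().
type RecordedPrompt = { threadKey: string; text: string };

class SketchThread {
  constructor(readonly key: string, private log: RecordedPrompt[]) {}
  prompt(text: string): void {
    this.log.push({ threadKey: this.key, text });
  }
}

class SketchSession {
  private threads = new Map<string, SketchThread>();
  readonly log: RecordedPrompt[] = [];

  // Created or retrieved: the same key always yields the same handle.
  thread(key = 'default'): SketchThread {
    let t = this.threads.get(key);
    if (!t) {
      t = new SketchThread(key, this.log);
      this.threads.set(key, t);
    }
    return t;
  }

  // session.prompt() is sugar for session.thread('default').prompt().
  prompt(text: string): void {
    this.thread().prompt(text);
  }
}
```

In the real engine the handle is `ThreadHandle` and prompts return a `PromptReceipt`; the memoization-by-key behavior is what channel targets like `slack:C123` rely on to converge on one history.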
**Thread controls:**
- `thread.prompt(text, opts)` — submit a prompt
- `thread.abort()` — abort current prompt, clear this thread's queue
- `thread.pause()` / `thread.resume()` — freeze/unfreeze this thread's queue
- `thread.skill(name, opts)` — invoke a named skill
- `thread.shell(command)` — execute a shell command (recorded in history)
- `thread.readThread(key)` — read messages from a sibling thread

### Messages

Messages within a thread form a DAG (directed acyclic graph). Each message entry has a `parentId` pointing to its predecessor, enabling branching and replay.

**Entry types:**
- `MessageEntry` — LLM or engine-authored messages (user, assistant, toolResult, system) with content, attachments, and source metadata
- `DecisionGateEntry` — a persisted decision point in the conversation DAG, including its status and any eventual resolution
- `CompactionEntry` — summarized context checkpoint inserted by the compaction system
- `BranchSummaryEntry` — summary of a branched conversation

```typescript
interface BaseEntry {
  id: string;
  sessionId: string;
  threadId: string;
  parentId: string | null;
  createdAt: number;
  metadata?: Record<string, unknown>;
}

interface MessageEntry extends BaseEntry {
  type: 'message';
  role: 'user' | 'assistant' | 'tool' | 'system';
  content: string;
  parts?: MessagePart[];
  author?: PromptAuthor;
  channel?: ChannelTarget;
  model?: string;
}

type MessagePart =
  | { type: 'text'; text: string }
  | { type: 'thinking'; text: string }
  | { type: 'tool_call'; callId: string; toolName: string; status: 'running' | 'completed' | 'error'; args?: unknown; result?: unknown; error?: string }
  | { type: 'attachment'; attachment: ToolAttachment }
  | { type: 'error'; message: string; code?: string };

interface CompactionEntry extends BaseEntry {
  type: 'compaction';
  summary: string;
  coveredEntryIds: string[];
  tokenCountBefore: number;
  tokenCountAfter: number;
  fileContext?: {
    read: string[];
    modified: string[];
  };
}

interface BranchSummaryEntry extends BaseEntry {
  type: 'branch_summary';
  branchRootId: string;
  branchLeafId: string;
  summary: string;
}
```

The active conversation path is reconstructed by following `parentId` pointers from the leaf back to the root. Compaction inserts a summary without rewriting history.

**LLM-faithful entry persistence (rehydration contract):** the engine must persist enough information in `MessageEntry.parts` to reconstruct LLM-compatible content blocks on restore. Specifically:

- An assistant entry that issued tool calls MUST persist one `MessagePart` of type `tool_call` per call, with `callId`, `toolName`, and `args`. Without this, a restored transcript would show the assistant's text but lose the tool calls, producing a malformed `[user, assistant(text), toolResult]` sequence that LLM providers reject.
- A tool-result entry (role `tool`) MUST persist `callId` so the LLM provider can match it to the assistant's tool call.
- Thinking content, if recorded at all, persists with provider-specific signatures intact when available, so cross-provider handoff and replay produce valid context.

`MessageEntry.content` is the human-readable text rendering; `MessageEntry.parts` is the structured source of truth used during rehydration.

**Suspension history rules:** Decision-gated turns are represented in the DAG by a first-class `DecisionGateEntry`, not by synthetic system messages. The entry is created when the gate is opened and then updated in place as it moves through `pending`, `resolved`, `expired`, or `withdrawn` states. This keeps the history model explicit and replayable: gates are decision artifacts, not conversation utterances.

**V1 branching stance:** The storage model remains DAG-based so future replay and alternate branches are possible without schema redesign, but V1 does not require exposing full user-facing branch/replay controls in the API.
V1 must preserve enough metadata for later branching support without forcing branching UX to ship in the first implementation batch. + +## Engine Internals + +### Agent Loop + +The engine uses `@mariozechner/pi-agent-core` for the inner agent loop and `@mariozechner/pi-ai` for the LLM provider layer. The engine wraps these with session/thread management, tool context injection, and event routing. + +**Per-thread agent instance:** Each thread gets its own `Agent` instance (from pi-agent-core). The agent manages the LLM streaming, parallel tool execution, and turn lifecycle. The engine subscribes to the agent's events and translates them to `EngineEvent` emissions. + +**Loop flow:** + +``` +prompt received on thread + → compose context (system prompt + thread history + role instructions) + → build tool list (built-in + custom, with ToolContext injection) + → create/update Agent instance with context and tools + → agent runs: call LLM (streaming via pi-ai) + → for each tool call in response: + → execute tool via ToolDef.execute(args, ctx) + → if tool requests a decision gate: + → persist DecisionGate + SuspendedTurnState + → append DecisionGateEntry(status='pending') to the DAG + → emit decision_gate event + → stop only this thread's active turn + → when a decision gate is resolved: + → update the existing DecisionGateEntry with resolution metadata and status='resolved' + → reconstruct the suspended turn from persisted state + → re-run the suspended tool/turn from the checkpoint + → when a decision gate expires or is withdrawn: + → update the existing DecisionGateEntry with status='expired' or status='withdrawn' + → fail or cancel the suspended turn + → if tool returns attachments, handle per type: + → image attachments → route to LLM as vision content + → text attachments → include inline in tool result + → file attachments → store via BlobStore, reference in history + → append tool result to thread history + → if LLM wants to continue (more tool calls): loop + → if 
LLM emits end_turn: done + → check compaction threshold, compact if needed + → persist thread state via SessionStore + → emit events throughout +``` + +### LLM Provider Layer + +The engine adopts `@mariozechner/pi-ai` for model abstraction. pi-ai provides a unified streaming interface across 20+ providers (Anthropic, OpenAI, Google, Mistral, Bedrock, etc.), typed streaming events, tool type definitions, vision support detection, context serialization, and cross-provider handoffs. + +The engine adopts `@mariozechner/pi-agent-core` for the inner agent loop. pi-agent-core provides the `Agent` class that handles the LLM streaming, parallel tool execution, abort handling, and event emission cycle. + +**What pi-ai gives us:** +- Model discovery and provider configuration (`getModel('anthropic', 'claude-sonnet-4-6')`) +- Streaming with typed events (`text_delta`, `toolcall_start/delta/end`, `thinking_start/delta/end`) +- Token and cost tracking per call +- Context serialization for persistence +- Cross-provider context handoffs (enables model failover with automatic thinking-to-text conversion) +- Faux provider for deterministic testing (`registerFauxProvider()`) + +**What pi-agent-core gives us:** +- The `Agent` class: prompt → LLM → tool calls → execute → feed results → loop until end_turn +- Parallel tool execution (`toolExecution: 'parallel'`) +- Typed event subscription (`agent_start`, `message_update`, `tool_execution_start/end`, `turn_end`) +- Abort signal propagation +- State management (messages, model, tools) + +**What the engine adds on top:** +- Sessions and threads (pi-agent-core has no concept of persistence or multi-conversation) +- Per-thread prompt queue with modes +- Cross-thread visibility +- Decision gates and resumable user-interaction points +- Compaction (using pi-ai's token counts to decide when, pi-ai's streaming to generate summaries) +- Tool context injection (credentials, sandbox, user identity) +- Event routing from pi-agent-core events to 
EngineEvent emissions
- Model failover (catch retriable errors, hand off context to next model via pi-ai)
- Structured result extraction with schema validation

**Model resolution:** Uses `provider/model` string convention (same as pi-ai and OpenRouter). Provider instances are registered at startup by the platform adapter. Model failover is configured per-session as an ordered list; on retriable errors, the engine advances to the next model and hands off the context using pi-ai's cross-provider serialization.

#### Model Registry Contract

Adapters register model providers before restoring or creating sessions.

```typescript
interface ModelRegistry {
  registerProvider(provider: ModelProviderConfig): void;
  get(model: string): Promise<ModelHandle>;
  list(opts?: { userId?: string; orgId?: string }): Promise<ModelDescriptor[]>;
}

interface ModelProviderConfig {
  id: string;
  displayName: string;
  apiKey?: string;
  baseUrl?: string;
  models?: ModelDescriptor[];
}

interface ModelDescriptor {
  id: string; // provider/model
  providerId: string;
  modelId: string;
  displayName?: string;
  contextWindow?: number;
  outputLimit?: number;
  input: Array<'text' | 'image' | 'audio'>;
  output: Array<'text' | 'tool_call'>;
}

interface ModelHandle {
  descriptor: ModelDescriptor;
  provider: unknown; // pi-ai provider instance, hidden behind engine package boundaries
}
```

Model selection order is prompt override, role override, thread model, session model, then platform default. Failover never crosses into a model the user or org is not authorized to use.
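The selection order reduces to a fallback chain. `ModelLayers` and its field names are illustrative (the engine resolves these from prompt options, role spec, thread, and session state, not from one object):

```typescript
// Precedence from the spec: prompt override → role override → thread model
// → session model → platform default.
interface ModelLayers {
  promptModel?: string;
  roleModel?: string;
  threadModel?: string;
  sessionModel?: string;
  platformDefault: string;
}

function resolveModel(layers: ModelLayers): string {
  return (
    layers.promptModel ??
    layers.roleModel ??
    layers.threadModel ??
    layers.sessionModel ??
    layers.platformDefault
  );
}
```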
### Tool System

Three categories of tools, merged at prompt time:

**Built-in tools** (provided by the engine, always available):
- `read` — read file contents via SandboxProvider
- `write` — create/overwrite files via SandboxProvider
- `edit` — exact text replacement via SandboxProvider
- `bash` — shell execution via SandboxProvider
- `grep` — pattern search via SandboxProvider
- `glob` — file pattern matching via SandboxProvider
- `thread_read` — read messages from a sibling, parent, or child thread
- `task` — spawn a child session for delegated work (depth-limited)

**Plugin tools** (`ToolDef[]`, registered at session creation):
- Custom tools from plugin packages (GitHub, Slack, Linear, memory, browser, etc.)
- Each is a `{ name, description, parameters, execute }` object
- Registered per-session or per-thread (thread-level overrides session-level on name conflict)

**Command tools** (privileged CLI wrappers):
- Shell commands with injected environment variables
- Secrets are injected at the host level, never visible to the LLM
- Scoped per-prompt or per-session

```typescript
interface CommandToolDef {
  name: string;
  description: string;
  command: string;
  args?: string[];
  env?: Record<string, string>;
  cwd?: string;
  riskLevel?: 'low' | 'medium' | 'high' | 'critical';
  requiresApproval?: boolean;
  timeoutMs?: number;
}
```

Command tools execute through `Sandbox.exec`. The engine injects configured environment variables into the process environment and never serializes secret values into message history, tool arguments visible to the model, or events.
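The secret-handling rule can be sketched as a split between what the sandbox sees and what history records. `prepareExec` is a hypothetical helper, not engine API; the point is that the audit side carries env *names* only:

```typescript
// Sketch: env values go to Sandbox.exec; history/events only ever see the keys.
interface CommandToolSketch {
  name: string;
  command: string;
  args?: string[];
  env?: Record<string, string>;
}

function prepareExec(def: CommandToolSketch) {
  return {
    // Passed to the sandbox process environment — secret values included.
    exec: { command: def.command, args: def.args ?? [], env: def.env ?? {} },
    // Recorded in message history and events — variable names only.
    audit: { tool: def.name, command: def.command, envKeys: Object.keys(def.env ?? {}) },
  };
}
```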
#### ToolDef Interface

```typescript
interface ToolDef {
  name: string;
  description: string;
  parameters: TSchema; // TypeBox schema (pi-ai native)
  riskLevel?: 'low' | 'medium' | 'high' | 'critical';
  requiresApproval?: boolean | ((args: Record<string, unknown>, ctx: ToolContext) => Promise<boolean> | boolean);
  execute: (args: Record<string, unknown>, ctx: ToolContext) => Promise<ToolResult>;
}
```

Tool names are globally unique within a session after registration. Built-in tools use short names (`read`, `bash`); plugin tools use service-qualified names (`github.create_pr`, `linear.create_issue`). If two tools register the same name at the same scope, session creation fails unless a thread-level override intentionally replaces a session-level tool.

#### ToolContext

Every tool execution receives a context object from the engine:

```typescript
interface ToolContext {
  // Identity
  userId: string;
  orgId: string;
  sessionId: string;
  threadId: string;
  sessionPurpose?: string;
  actor?: {
    id: string;
    name?: string;
    email?: string;
  };

  // Prompt/message routing
  channelType?: string;
  channelId?: string;
  decisionGateId?: string;
  replyChannelType?: string;
  replyChannelId?: string;

  // Repo / workspace context
  cwd?: string;
  repo?: {
    url?: string;
    branch?: string;
    ref?: string;
    provider?: string;
  };

  // Credentials
  credentials: CredentialProvider;

  // Sandbox (for tools that need file/shell access)
  sandbox: Sandbox;

  // Structured runtime interactions
  requestDecision: (req: DecisionGateRequest) => Promise<DecisionResolution>;
  emitArtifact?: (artifact: ToolArtifact) => Promise<void>;
  /**
   * Set by the engine ONLY on a replayed tool execution after restart.
   * When `gateId` matches the deterministic ID derived from this call's
   * `req.resumeKey`, the engine returns the stored `resolution` immediately
   * instead of opening a new gate. Tools never set this themselves.
   */
  suspendedDecision?: SuspendedDecisionContext;

  // Abort
  signal: AbortSignal;
}

interface CredentialProvider {
  get(service: string): Promise<Credential | null>;
  request(service: string, reason: string): Promise<Credential>;
}

interface Credential {
  accessToken: string;
  refreshToken?: string;
  expiresAt?: number;
  scopes?: string[];
  metadata?: Record<string, unknown>;
}

type ToolArtifact =
  | { type: 'file'; path?: string; blobKey?: string; title?: string }
  | { type: 'link'; url: string; title: string }
  | { type: 'diff'; path?: string; content: string };

interface SuspendedDecisionContext {
  gateId: string;
  resolution?: DecisionResolution;
}
```

When a tool calls `credentials.request()` for a credential that doesn't exist, the engine pauses tool execution and emits a `decision_gate` event to the user. Execution resumes when the credential is provided. If the user does not respond within a configurable timeout (default 10 minutes), the request fails and the tool receives a structured credential error. Same pattern as tool approvals.

Approval-gated tools follow the same suspension model. A tool can return or throw a structured `approval_required` signal, which the engine converts into a `DecisionGate`, persists, emits, and resumes on resolution.

**Restart-safe tool suspension contract:** The engine does not rely on preserving an in-memory JavaScript continuation across restarts. Tools that call `requestDecision(...)` must therefore be re-entrant up to their decision points. On first execution, `requestDecision(...)` persists the gate and suspends the turn. On resumed execution, the engine re-runs the tool from the start with `suspendedDecision` populated for the matching gate ID, and the same `requestDecision(...)` call returns the stored resolution instead of creating a new gate.
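A minimal sketch of this contract, assuming a SHA-256-based id scheme. The plan only requires determinism over `(sessionId, threadId, queueItemId, resumeKey)`; the exact hash, the `Resolution` shape, and the `openGate` helper are illustrative:

```typescript
import { createHash } from 'node:crypto';

type Resolution = { kind: 'approved' | 'denied'; note?: string };

// Deterministic gate id: the same (sessionId, threadId, queueItemId,
// resumeKey) tuple always hashes to the same id, so a replayed tool call
// re-derives the id of the gate it suspended on.
function gateId(sessionId: string, threadId: string, queueItemId: string, resumeKey: string): string {
  return 'gate_' + createHash('sha256')
    .update([sessionId, threadId, queueItemId, resumeKey].join('\u0000'))
    .digest('hex')
    .slice(0, 16);
}

interface SketchCtx {
  ids: { sessionId: string; threadId: string; queueItemId: string };
  // Populated by the engine ONLY on a replayed execution after restart.
  suspendedDecision?: { gateId: string; resolution?: Resolution };
  // First-execution path: persist the gate and suspend the turn.
  openGate: (id: string) => Promise<Resolution>;
}

async function requestDecision(ctx: SketchCtx, req: { resumeKey: string }): Promise<Resolution> {
  const id = gateId(ctx.ids.sessionId, ctx.ids.threadId, ctx.ids.queueItemId, req.resumeKey);
  const suspended = ctx.suspendedDecision;
  if (suspended && suspended.gateId === id && suspended.resolution) {
    // Replay: short-circuit with the stored resolution, no new gate.
    return suspended.resolution;
  }
  return ctx.openGate(id);
}
```

The two load-bearing pieces are the deterministic id and the short-circuit: everything else (persistence, suspension, expiry) hangs off the engine side of `openGate`.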
+ +**What "re-entrant up to the decision point" means in practice:** any work the tool does *before* `requestDecision(...)` will run twice — once on the original execution (lost when the engine restarts), once on replay. Side effects in that prefix must be idempotent or read-only. Work *after* `requestDecision(...)` returns runs once on replay only. Tools that need to do non-idempotent work before a gate should split into two tools (one to do the work and persist a result, another to gate-and-act on it) or move the work to after the gate. + +**How the engine populates `ctx.suspendedDecision`:** on `restoreSession`, for every thread whose persisted queue status is `blocked_on_decision_gate`, the engine loads the corresponding `DecisionGate` and `SuspendedTurnState`. If the gate is still `pending`, the engine re-arms its in-memory wait so a future `resolveDecision(...)` call delivers the resolution. If the gate is already `resolved` (the user resolved it while the engine was down) or becomes resolved later, the engine invokes the persisted tool by name with the persisted args, sets `ctx.suspendedDecision = { gateId, resolution }` for that one execution, and feeds the returned `ToolResult` back into the agent loop as if the original turn had completed — then calls the agent's continuation to produce the next assistant turn. + +**Replay event guarantees:** the replayed tool execution does not need to emit the same per-call `tool_start` / `tool_end` event pair as the original turn (the original pair was already emitted before the engine went down). The engine MUST emit the post-replay `text_delta` / `message_end` / `turn_end` events for the continuation turn so that connected clients see the agent finish the work. Adapters re-deliver pending gates on client (re)connection through the `init` event payload. + +#### Plugin Action Bridge + +V1 keeps using existing plugin action packages through an adapter, but the bridge does NOT register one LLM-visible tool per action. 
With dozens of plugins each exporting dozens of actions, direct registration would (a) blow past LLM tool-catalog size budgets, (b) collide with provider tool-name regexes (Anthropic requires `^[a-zA-Z0-9_-]{1,128}$`, so dotted ids like `github.create_issue` are rejected), and (c) force every session to pay the prompt cost of every action even when only a few are relevant.

Instead, plugin actions are surfaced through two engine-built-in indirection tools — `list_tools` and `call_tool` — that expose a searchable catalog the agent consults on demand.

```typescript
interface ActionSource {
  listActions(ctx?: { credentials?: Record<string, string> }): ActionDefinition[] | Promise<ActionDefinition[]>;
  execute(actionId: string, params: unknown, ctx: ActionContext): Promise<ActionResult>;
}

interface ActionDefinition {
  id: string; // fully-qualified, e.g. "github.create_issue"
  name: string;
  description: string;
  riskLevel: RiskLevel;
  params?: unknown; // Zod schema from current SDK packages
  inputSchema?: Record<string, unknown>;
}

interface ActionContext {
  credentials: Record<string, string>;
  userId: string;
  orgId?: string;
  callerIdentity?: { name: string; avatar?: string };
  analytics?: unknown;
  attribution?: { name: string; email: string };
  guardConfig?: Record<string, unknown>;
}

interface ActionResult {
  success: boolean;
  data?: unknown;
  error?: string;
  images?: Array<{ data: string; mimeType: string; description: string }>;
}

interface ActionSourceConfig {
  service: string; // routing key + default credential service
  actions: ActionSource;
  credentialService?: string; // override service for credential lookup
  defaultApprovalMode?: 'allow' | 'require_approval' | 'deny';
}

interface ActionBridgeOptions {
  sources: ActionSourceConfig[];
}

/**
 * Returns exactly two ToolDefs: `list_tools` and `call_tool`. Internally the
 * bridge holds a catalog assembled from every ActionSource passed in.
 */
function actionBridgeTools(opts: ActionBridgeOptions): Promise<ToolDef[]>;
```

`list_tools` accepts:

- `service?: string` — filter by service name.
- `query?: string` — match against action name, id, and description (case-insensitive substring).
- `limit?: number` — cap results (default 50, max 200).

It returns a structured payload: `{ service, id, name, description, riskLevel, params }` per action, plus per-service auth/availability warnings when credentials are missing or expired.

`call_tool` accepts:

- `tool_id: string` — the fully-qualified action id (e.g. `github.create_issue`).
- `params: object` — the action arguments, validated against the action's parameter schema before dispatch.
- `summary: string` — one-line human-readable description used in approval gates and audit logs.

Bridge behavior:

- Action ids stay unchanged inside the catalog and as `tool_id` arguments. Provider tool-name regexes never apply because action ids ride as string args, not tool names.
- Zod parameters are converted to TypeBox/JSON Schema at registration time and exposed verbatim through `list_tools`.
- `call_tool` validates `params` against the action's schema. Validation errors return a structured tool error, not an exception.
- `riskLevel` is reported in `list_tools` and consulted in `call_tool` to decide whether to open a `DecisionGate` (`high`/`critical` default to `require_approval` unless the per-source `defaultApprovalMode` overrides). The action's `summary` arg is the gate body.
- Credentials are resolved through `CredentialProvider` per call, scoped to the action's `credentialService`. Missing credentials surface as a structured "auth required" tool error and as a warning in subsequent `list_tools` responses.
- Action analytics events are forwarded to the engine observability sink.
- Action images are converted to `ToolAttachment` objects and handled by the engine attachment pipeline.
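The `list_tools` filtering rules reduce to a small pure function. This is a sketch over a trimmed catalog shape, not the bridge implementation; only the service filter, case-insensitive substring query, and the 50-default/200-max limit come from the contract above:

```typescript
interface CatalogAction {
  service: string;
  id: string;
  name: string;
  description: string;
}

function listTools(
  catalog: CatalogAction[],
  args: { service?: string; query?: string; limit?: number },
): CatalogAction[] {
  const limit = Math.min(args.limit ?? 50, 200); // default 50, hard cap 200
  const q = args.query?.toLowerCase();
  return catalog
    .filter((a) => !args.service || a.service === args.service)
    // query matches name, id, or description, case-insensitively
    .filter((a) => !q || [a.name, a.id, a.description].some((s) => s.toLowerCase().includes(q)))
    .slice(0, limit);
}
```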
+ +The bridge is a migration layer, not a permanent engine dependency. New plugins may either (a) keep emitting `ActionSource`s and let the bridge expose them, or (b) export `ToolDef[]` directly when they want to be registered as first-class engine tools (e.g. coding-loop primitives where the per-call indirection is unwanted overhead). Engine adapters compose both paths in the same session. + +#### ToolResult + +```typescript +type ToolResult = { + text: string; + attachments?: ToolAttachment[]; +}; + +type ToolAttachment = + | { type: 'image'; data: Uint8Array; mimeType: string; name?: string } + | { type: 'file'; data: Uint8Array; mimeType: string; name: string } + | { type: 'text'; content: string; name?: string; language?: string }; +``` + +**Attachment handling by the engine:** +- `image` attachments are routed to the LLM as vision content (if the model supports it via `model.input.includes('image')`). +- `file` attachments are stored via BlobStore and referenced in the message history. Available to the LLM if requested but not injected into context automatically. +- `text` attachments are included inline in the tool result message. The `language` field enables syntax-aware formatting. + +### Compaction + +Token-aware context compression with two complementary techniques. When a thread approaches the model's context window, the engine **prunes** stale tool outputs cheaply (no LLM) and, if more space is needed, **compacts** older messages into a structured summary (one LLM call). The DAG is preserved verbatim — pruning marks tool-output strings as elided, compaction inserts a `CompactionEntry`. Both transformations apply only when assembling the LLM-visible context; the engine's history record never loses anything. + +This design is informed by OpenCode's compaction module (which itself iterates on prior tools like Aider's repo-summarization). Where this spec and that implementation differ, this spec is authoritative for Valet V1. 
+ +#### Triggers + +- **Proactive (auto)** — after each turn, if `tokens.total >= usable(model, cfg)` where + ``` + usable = contextWindow − reserved + reserved = cfg.reserveTokens ?? min(20_000, model.maxOutputTokens) + ``` + the engine queues a compaction pass to run before the next user turn would otherwise execute. Token usage comes from pi-ai's per-call `Usage`; we do not estimate independently in this path. +- **Reactive (overflow)** — if a turn's assistant message returns `stopReason === 'error'` and pi-ai's `isContextOverflow(message)` matches the error, the engine compacts and retries the same turn. Reactive compaction strips media attachments from history before summarizing (some overflow is media-bytes, not token-count, so dropping images can be enough on its own). + +#### Tail preservation + +Compaction never touches the most recent turns. A "turn" is the segment from one user message up to (but not including) the next user message, including the assistant's tool calls and tool results. + +- Default keep: the last `cfg.tailTurns ?? 2` turns. +- Tail token budget: `clamp(usable * 0.25, cfg.minPreserveRecentTokens ?? 2_000, cfg.maxPreserveRecentTokens ?? 8_000)`. +- If the last `tailTurns` turns exceed the budget, the engine walks them oldest → newest and drops whole turns from the head of that window until the rest fits. If a single turn alone exceeds the budget, the engine splits it at the first message boundary that fits, summarizing the prefix into the compaction and keeping the suffix in the tail. + +#### Pruning (cheap path, no LLM) + +Walk messages newest → oldest. Track cumulative tool-output token estimate. Once the cumulative count exceeds `cfg.pruneProtectTokens ?? 40_000`, mark every older `tool_call`-result text as `elided`. Skip protected tools (the engine ships with `skill` and `thread_read` protected by default; per-tool opt-in via `ToolDef.protectedFromPruning`). 
+ +The DAG entry is updated in place via `SessionStore.updateEntry` — `MessagePart` of type `tool_call` keeps `callId`, `toolName`, `args`, and `status`, but its `result` field is replaced with a placeholder `{ elided: true, reason: 'pruned' }` and `elided: true` is set on the part. LLM-context assembly skips elided results. The persistence is atomic per entry, not per part: the entire `MessageEntry` row is rewritten with the same id. Pruning only commits if it'd save at least `cfg.pruneMinimumTokens ?? 20_000` tokens; otherwise it's a no-op. + +Pruning runs before compaction on the proactive path. Often pruning alone is enough. + +#### Compaction (LLM path) + +When pruning isn't enough (or after `cfg.pruneMinimumTokens` worth of tool output has already been elided), the engine summarizes the messages before the tail. + +1. Compute the cut point per the tail-preservation rules above. +2. Assemble the head: the messages before the cut, with tool outputs truncated to `cfg.toolOutputMaxChars ?? 2_000` chars and image content stripped. +3. If the thread already has a `CompactionEntry`, load its `summary` as `previousSummary`. The new summarization is iterative — the prompt asks the summarizer to *update* the prior summary with new facts rather than write a fresh one. +4. Call a summarizer model (`cfg.summarizerModel ?? sessionModel`; typically a smaller cheaper model like Haiku) with a structured-markdown prompt: + ``` + ## Goal · ## Constraints & Preferences + ## Progress (Done / In Progress / Blocked) · ## Key Decisions + ## Next Steps · ## Critical Context · ## Relevant Files + ``` + This template is required, not advisory. The summary text is the source of truth for the LLM's view of pre-cut history; using a structured form prevents the summary from drifting into prose that crowds out specific facts (paths, error strings, identifiers). +5. Persist a `CompactionEntry` in the DAG with: + - `summary`: the markdown produced by step 4. 
+ - `coveredEntryIds`: every entry id from the DAG head that this summary represents. + - `tokenCountBefore` / `tokenCountAfter`: token counts of the head before and the summary after, for observability. + - `fileContext`: extracted paths from `read`/`write`/`edit` tool calls in the head, classified `read` vs `modified` (helps the agent re-orient on resume). +6. Emit `compaction_start` then `compaction_end` events with the entry id. + +The `CompactionEntry` is positioned at the cut point in the DAG; `parentId` links it to the last covered entry. Subsequent `MessageEntry`s parent to the `CompactionEntry`. Branching/replay still works: walking from leaf via `parentId` produces a valid history, with the summary standing in for everything older. + +#### Applying compaction to LLM context + +The engine's `convertToLlm` pipeline (the function fed to pi-agent-core's `Agent` to translate persisted DAG entries into LLM messages) does the rewrite at request time: + +1. Load DAG entries for the thread. +2. Find the most recent `CompactionEntry`. If none, pass entries through unchanged. +3. Drop every entry whose id is in the active compaction's `coveredEntryIds`. +4. Replace them with a single user message containing the summary text, framed as `{summary}`. +5. Apply pruning's elision: any kept entry's tool-call parts whose `result.elided === true` get a placeholder `[output elided to save context]` in the LLM-visible content. +6. Yield the resulting `Message[]` to the agent loop. + +This is also the rehydration path on `restoreSession` — there is no separate "rebuild context after compaction" code path. 
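A minimal sketch of that rewrite, assuming simplified entry and message shapes (the real pipeline walks `SessionEntry` rows and yields pi-ai `Message` objects):

```typescript
// Simplified, illustrative shapes — not the engine's SessionEntry types.
type SimEntry =
  | { kind: 'message'; id: string; role: 'user' | 'assistant'; text: string; elided?: boolean }
  | { kind: 'compaction'; id: string; summary: string; coveredEntryIds: string[] };

type LlmMessage = { role: 'user' | 'assistant'; content: string };

function convertToLlm(entries: SimEntry[]): LlmMessage[] {
  // Step 2: the most recent CompactionEntry is the active one.
  const active = [...entries]
    .reverse()
    .find((e): e is Extract<SimEntry, { kind: 'compaction' }> => e.kind === 'compaction');
  const covered = new Set(active?.coveredEntryIds ?? []);
  const out: LlmMessage[] = [];
  for (const e of entries) {
    if (e.kind === 'compaction') {
      // Step 4: the active summary stands in for the covered head.
      if (e === active) out.push({ role: 'user', content: e.summary });
      continue;
    }
    if (covered.has(e.id)) continue; // step 3: drop covered entries
    // Step 5: pruned tool output becomes a placeholder.
    out.push({
      role: e.role,
      content: e.elided ? '[output elided to save context]' : e.text,
    });
  }
  return out;
}
```

With no `CompactionEntry` present, `covered` is empty and every entry passes through unchanged, which matches the "pass entries through" case in step 2.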
+

#### Auto-continue after compaction

After a successful proactive compaction (i.e., one triggered by the token threshold, not a reactive retry after overflow), if the thread is mid-task the engine injects a synthetic user message before yielding back to the next queue item:

> "Continue if you have next steps, or stop and ask for clarification if you are unsure how to proceed."

The synthetic message is tagged with `metadata: { compaction_continue: true }` so client UIs can render it differently or hide it. Reactive (overflow) compactions don't auto-continue — they just retry the original turn that triggered the overflow.

#### Configuration

| Key | Default | Notes |
|---|---|---|
| `cfg.compactionEnabled` | `true` | per-thread switch |
| `cfg.reserveTokens` | `min(20_000, maxOutput)` | head-room subtracted from contextWindow |
| `cfg.tailTurns` | `2` | last N turns never touched |
| `cfg.minPreserveRecentTokens` | `2_000` | floor on tail token budget |
| `cfg.maxPreserveRecentTokens` | `8_000` | ceiling on tail token budget |
| `cfg.pruneProtectTokens` | `40_000` | recent tool-output tokens never pruned |
| `cfg.pruneMinimumTokens` | `20_000` | only commit prune if it saves ≥ this much |
| `cfg.toolOutputMaxChars` | `2_000` | when feeding head to summarizer |
| `cfg.summarizerModel` | `sessionModel` | dedicated summarizer is cheaper |
| `cfg.protectedTools` | `['skill', 'thread_read']` | per-tool opt-out from pruning; `ToolDef.protectedFromPruning` adds to this set |
| `cfg.autoContinue` | `true` | inject the auto-continue prompt after proactive compaction |

### Per-Thread Prompt Queue

Each thread owns its own prompt queue. Threads execute independently and concurrently within a session. 
+ +**Concurrency model:** +- Each thread processes one prompt at a time (serialized within a thread) +- Multiple threads can be active simultaneously (parallel across threads) +- Sandbox access is shared: concurrent file ops and shell commands from different threads hit the same filesystem +- Tool execution is thread-safe by contract: tool authors handle their own concurrency if needed + +**Queue modes** (per-thread, switchable at runtime): + +- **Followup** (default) — prompts queue in FIFO order. When the current prompt completes, the next one starts. If the thread is idle, the prompt executes immediately. +- **Steer** — new prompt aborts the in-flight prompt and starts immediately. Previous prompt's partial work remains in the thread history. +- **Collect** — prompts buffer for a configurable window (default 5 seconds). When the window closes, all buffered prompts are concatenated into a single prompt and dispatched. If the thread is busy, the collected prompt enters the FIFO queue as normal. + +**Prompt metadata:** Each prompt carries `threadId`, `channelType`, `channelId`, `authorId`, optional attachments, and optional model override. + +**Routing semantics:** Queueing is keyed by thread, not by transport. `channelType` / `channelId` are routing metadata used for attribution, reply delivery, and decision gate resolution. They do not create extra isolation beyond the owning thread. + +**Steer semantics:** `steer` aborts only the current turn on the targeted thread. It must not affect other active threads in the session. Partial work already emitted by the aborted turn remains in history. + +**Collect semantics:** `collect` buffers by thread. Adapters may additionally preserve origin-channel metadata for each buffered prompt so the merged prompt can still attribute its constituent messages correctly. + +**Pending decision semantics:** When a thread is blocked on a pending decision gate, it is considered busy but interruptible. 
Behavior by mode: + +- `followup` — new prompts queue behind the blocked turn. +- `collect` — new prompts continue buffering and later queue behind the blocked turn. +- `steer` — new prompt cancels the blocked turn and expires or withdraws the outstanding decision gate before starting immediately. + +The engine must never allow an old gate resolution to resume a turn that was already superseded by `steer`. + +**Persisted runtime state:** A thread with a pending decision gate remains the active processing item in queue state, but with a distinct suspended status. V1 queue persistence must distinguish at least: + +- `queued` +- `running` +- `blocked_on_decision_gate` +- `paused` + +When a thread enters `blocked_on_decision_gate`, the engine persists a `SuspendedTurnState` checkpoint containing enough information to safely resume after restart: + +- session ID / thread ID / active queue item ID +- current model +- active leaf message ID +- pending gate ID (derived from `gate:${sessionId}:${threadId}:${queueItemId}:${resumeKey}`) +- pending tool call ID, tool name, and original tool args (used to invoke the tool by name during replay) +- the `resumeKey` the tool supplied (used to recompute the gate ID on replay and confirm a match) + +On restore, the engine reloads the blocked thread, reloads the decision gate, and waits for either resolution, expiry, or cancellation. Once resolved, the engine reconstructs the turn from the checkpoint and re-drives execution. + +**Persistence:** Queue state is persisted via SessionStore so it survives process restarts. On engine startup, pending queue entries are restored and dispatched. 
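A sketch of assembling that checkpoint (the helper name and call shape are illustrative assumptions, not the engine API; the field set follows the `SuspendedTurnState` checkpoint described above):

```typescript
// Illustrative only: the checkpoint the engine persists when a turn
// enters `blocked_on_decision_gate`. Fields mirror SuspendedTurnState.
function buildCheckpoint(p: {
  sessionId: string;
  threadId: string;
  queueItemId: string;
  model: string;
  leafMessageId?: string;
  toolCallId: string;
  toolName: string;
  toolArgs: Record<string, unknown>;
  resumeKey: string;
}) {
  return {
    ...p,
    // Deterministic: replaying the tool recomputes the same gate ID.
    gateId: `gate:${p.sessionId}:${p.threadId}:${p.queueItemId}:${p.resumeKey}`,
    attempt: 1,
    createdAt: Date.now(),
  };
}
```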
+

**Controls:**
- `thread.abort()` — abort current prompt on this thread, clear this thread's queue
- `thread.pause()` / `thread.resume()` — freeze/unfreeze this thread's queue
- `session.abort()` — abort all threads
- `session.pause()` / `session.resume()` — freeze/unfreeze all thread queues
- Session-wide idle = all threads idle

### Decision Gates

Decision gates are first-class engine primitives for "pause here and wait for an external human decision". Engine, adapter, SDK, API, client, and channel contracts use `DecisionGate` naming and payloads consistently.

V1 uses one unified mechanism for:

- tool approvals
- agent questions
- credential acquisition / re-authorization

This replaces ad hoc transport- or adapter-specific waiting behavior. A gate is persisted, emits events, may be delivered to external channels by the adapter, and resumes or fails the waiting operation when resolved, expired, or withdrawn.

**Gate model:**

```typescript
interface DecisionGate {
  id: string;
  sessionId: string;
  threadId: string;
  type: 'approval' | 'question' | 'credential_request';
  title: string;
  body?: string;
  actions: DecisionAction[];
  expiresAt?: number;
  status: 'pending' | 'resolved' | 'expired' | 'withdrawn';
  context?: Record<string, unknown>;
  origin?: {
    channelType?: string;
    channelId?: string;
    messageId?: string;
  };
  refs?: Array<{
    channelType: string;
    ref: DecisionGateRef;
  }>;
}

interface DecisionAction {
  id: string;
  label: string;
  style?: 'primary' | 'danger';
}

interface DecisionResolution {
  actionId?: string;
  value?: string;
  resolvedBy: string;
  resolvedAt: number;
  source?: {
    channelType?: string;
    channelId?: string;
    messageId?: string;
  };
}

type DecisionWithdrawReason = 'steer' | 'abort' | 'cancel';

interface DecisionGateRef {
  messageId: string;
  channelId: string;
  threadId?: string;
  [key: string]: unknown;
}

interface DecisionGateEntry {
  type: 'decision_gate';
  id: string;
  parentId: string | null;
  timestamp: string;
  gate: DecisionGate;
  resolvedAt?: string;
  resolution?: DecisionResolution;
  withdrawnReason?: DecisionWithdrawReason;
}
```

**Gate types:**

- `approval`: asks whether a tool or command may proceed. Required actions are `approve` and `deny` unless a custom action list is supplied.
- `question`: asks the user for an answer. May include option actions or accept free text when `actions` is empty.
- `credential_request`: asks the user to connect or re-authorize a service. Required context fields are `service`, `reason`, and optional `scopes`.

**Gate delivery contract:**

1. Engine creates and persists the gate with `status = 'pending'`.
2. Engine appends or updates the corresponding `DecisionGateEntry` in the thread DAG.
3. Engine publishes `decision_gate`.
4. Adapter delivers the gate to web clients and any matching channel targets.
5. Each channel delivery returns a `DecisionGateRef`; the adapter persists refs back through `SessionStore.saveDecisionGateRef`.
6. The first valid resolution wins.
7. Adapter calls `session.resolveDecision(gateId, resolution)`.
8. Engine updates gate status, updates the DAG entry, clears suspended state, and resumes or fails the blocked turn.
9. Adapter updates delivered channel messages via stored refs.

The engine must treat missing channel delivery as non-fatal. A gate that cannot be delivered externally remains visible through the web/client event stream and API.

**Execution semantics:**

- A tool or agent loop may create a gate and suspend the waiting operation.
- Suspension is scoped to the waiting thread/turn, not the whole session.
- Other threads in the same session may continue running while one thread is blocked on a gate.
- Resolution resumes the suspended operation with typed input.
- Expiry fails the suspended operation with a structured error.
+- Withdrawal cancels the suspended operation without permitting later resolution to resume it. + +The `DecisionGateEntry.id` should be the canonical DAG entry ID for the gate, while `DecisionGate.id` is the stable runtime identity used by transports, queue state, and suspended-turn checkpoints. In V1 these may be the same value for simplicity. + +**Deterministic gate identity:** A gate created from a tool execution must use a stable ID for that suspension point within the active turn. This is what allows the engine to re-run the tool after restart and have `requestDecision(...)` match the existing persisted gate instead of creating a duplicate. + +The V1 derivation is: + +``` +gateId = `gate:${sessionId}:${threadId}:${queueItemId}:${resumeKey}` +``` + +`resumeKey` is **required** on `DecisionGateRequest` (not optional). Tool authors choose a key that uniquely identifies the suspension point given the tool's inputs — typically a function of the tool's args (e.g. `"github.create_pr:owner/repo:head→base"`). Two `requestDecision(...)` calls in the same active queue item with the same `resumeKey` open the same gate. Two calls with different `resumeKey`s open different gates. A replayed tool execution that reaches the same `requestDecision(...)` call site with the same args produces the same `resumeKey` and therefore the same `gateId`, which is how the short-circuit works. 
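The short-circuit described above can be sketched as follows (names, shapes, and the `open` callback are illustrative assumptions, not the engine's actual `requestDecision` signature):

```typescript
type Resolution = { actionId?: string; value?: string };

// Illustrative tool context: the engine populates suspendedDecision only
// when replaying a tool call whose gate has already been resolved.
interface ToolCtx {
  sessionId: string;
  threadId: string;
  queueItemId: string;
  suspendedDecision?: { gateId: string; resolution: Resolution };
}

function deriveGateId(ctx: ToolCtx, resumeKey: string): string {
  return `gate:${ctx.sessionId}:${ctx.threadId}:${ctx.queueItemId}:${resumeKey}`;
}

async function requestDecision(
  ctx: ToolCtx,
  resumeKey: string,
  open: (gateId: string) => Promise<Resolution>, // stub for "persist gate + suspend turn"
): Promise<Resolution> {
  const gateId = deriveGateId(ctx, resumeKey);
  // Replay path: the gate ID recomputes to the same value, so the stored
  // resolution is returned instead of opening a duplicate gate.
  if (ctx.suspendedDecision?.gateId === gateId) {
    return ctx.suspendedDecision.resolution;
  }
  // First execution: open (or look up) the gate and suspend.
  return open(gateId);
}
```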
+

```typescript
interface DecisionGateRequest {
  type: 'approval' | 'question' | 'credential_request';
  title: string;
  body?: string;
  actions?: DecisionAction[];
  expiresAt?: number;
  context?: Record<string, unknown>;
  origin?: { channelType?: string; channelId?: string; messageId?: string };
  resumeKey: string; // REQUIRED for restart-safe gates
}
```

**Resolution paths:**

- explicit action selection (`approve`, `deny`, option buttons)
- free-text reply from the web UI
- free-text reply from an external channel thread when the adapter matches the stored origin target

The engine owns the gate lifecycle and persistence; adapters own delivery details for Slack, Telegram, web, etc.

**Conflict handling:**

- Resolving a non-pending gate returns `decision_gate_conflict` unless the supplied resolution exactly matches the stored resolution.
- Expiry and withdrawal are terminal states.
- A `steer` prompt on the same thread withdraws pending gates created by the superseded turn with reason `steer`.
- `thread.abort()` withdraws pending gates on that thread with reason `abort`.
- `session.abort()` withdraws all pending gates in the session with reason `abort`.
- Resolutions received after withdrawal or expiry must be acknowledged to the transport but must not resume execution.

### Roles and Skills

**Roles** — Markdown files with optional YAML frontmatter (`name`, `description`, `model`). Applied as system prompt overlays. Precedence: prompt-level > thread-level > session-level. If a role declares a `model`, it overrides the session's default model for that prompt.

**Skills** — Markdown files discovered from the sandbox filesystem or a configured directory. Invoked explicitly via `thread.skill(name, { args })`. The skill's instructions become a focused prompt with the given arguments. Skill files use frontmatter (`name`, `description`) and support `{{variable}}` template syntax for argument injection. 
+

Both are loaded at runtime, not baked into the engine build.

```typescript
interface RoleSpec {
  name: string;
  description?: string;
  model?: string;
  content: string;
  source?: 'session' | 'thread' | 'prompt' | 'plugin' | 'sandbox';
}

interface SkillSource {
  name: string;
  description?: string;
  content: string;
  argsSchema?: TSchema;
  source?: 'plugin' | 'sandbox' | 'repo' | 'user';
}

interface SkillInvokeOptions {
  args?: Record<string, unknown>;
  model?: string;
  author?: PromptAuthor;
  channel?: ChannelTarget;
  resultSchema?: TSchema;
}
```

Role and skill loading errors are non-fatal at session creation only when the source is optional. Prompt-level role or skill resolution errors fail the prompt before model invocation.

### Event System

The engine emits typed events through a callback. Platform adapters subscribe and relay events to clients via their transport (WebSocket, SSE, etc.).

```typescript
type EngineEvent =
  | { type: 'message_start'; threadId: string; messageId: string; role: 'assistant' | 'system' }
  | { type: 'text_delta'; threadId: string; text: string }
  | { type: 'message_update'; threadId: string; messageId: string; parts: MessagePart[]; content?: string }
  | { type: 'message_end'; threadId: string; messageId: string; reason: 'end_turn' | 'error' | 'abort' }
  | { type: 'tool_start'; threadId: string; tool: string; args: Record<string, unknown> }
  | { type: 'tool_end'; threadId: string; tool: string; result: string; isError: boolean }
  | { type: 'turn_end'; threadId: string; reason: 'end_turn' | 'error' | 'abort' }
  | { type: 'thread_start'; threadId: string; parentThreadId?: string }
  | { type: 'queue_state'; threadId: string; state: QueueState }
  | { type: 'compaction_start' | 'compaction_end'; threadId: string }
  | { type: 'task_start' | 'task_end'; childSessionId: string; threadId: string }
  | { type: 'status'; threadId: string; status: 'idle' | 'queued' | 'thinking' | 'tool_calling' | 'streaming' | 'blocked_on_decision_gate' }
  | { type: 'error'; threadId?: string; code: string; error: string; recoverable: boolean }
  | { type: 'decision_gate'; threadId: string; gate: DecisionGate }
  | { type: 'decision_gate_resolved'; threadId: string; gateId: string; resolution: DecisionResolution }
  | { type: 'decision_gate_expired'; threadId: string; gateId: string }
  | { type: 'decision_gate_withdrawn'; threadId: string; gateId: string; reason: 'steer' | 'abort' | 'cancel' }
  | { type: 'model_switched'; threadId: string; fromModel: string; toModel: string; reason: string }
```

The engine does not know about WebSockets, SSE, or any transport. It emits events; the adapter decides delivery.

### Client Event Contract

Clients consume decision-gate events directly. Adapters may deliver these events over WebSocket or SSE, but payloads are identical.

```typescript
type ClientEvent =
  | { type: 'init'; session: SessionData; threads: ThreadData[]; queue: QueueState[]; pendingDecisionGates: DecisionGate[] }
  | { type: 'message'; sessionId: string; threadId: string; entry: MessageEntry }
  | { type: 'message.updated'; sessionId: string; threadId: string; entryId: string; patch: Partial<MessageEntry> }
  | { type: 'chunk'; sessionId: string; threadId: string; messageId: string; content: string }
  | { type: 'agentStatus'; sessionId: string; threadId: string; status: EngineEventStatus; detail?: string }
  | { type: 'queue.state'; sessionId: string; threadId: string; queue: QueueState }
  | { type: 'decision_gate'; sessionId: string; threadId: string; gate: DecisionGate }
  | { type: 'decision_gate_resolved'; sessionId: string; threadId: string; gateId: string; resolution: DecisionResolution }
  | { type: 'decision_gate_expired'; sessionId: string; threadId: string; gateId: string }
  | { type: 'decision_gate_withdrawn'; sessionId: string; threadId: string; gateId: string; reason: DecisionWithdrawReason }
  | { type: 'error'; sessionId?: string; threadId?: string; code: string; message: string; recoverable: boolean };

type EngineEventStatus =
  | 'idle'
  | 'queued'
  | 'thinking'
  | 'tool_calling'
  | 'streaming'
  | 'blocked_on_decision_gate'
  | 'error';
```

Clients resolve a gate by calling the decision API route, not by sending transport-specific answer messages:

```http
POST /api/sessions/:sessionId/decision-gates/:gateId/resolve
POST /api/sessions/:sessionId/decision-gates/:gateId/withdraw
```

Adapters must include all pending decision gates in the initial connection payload so reconnecting clients can render outstanding approvals, questions, and credential requests without waiting for a replayed event.

### Structured Results

Optional schema-validated output extraction. Any prompt or skill invocation can pass a result schema (Valibot or TypeBox). The engine instructs the LLM to emit a result in a delimited block, extracts it, and validates against the schema.

- Delimiters: `---RESULT_START---` and `---RESULT_END---`
- If validation fails and no delimiters found: auto-retry with a follow-up prompt
- Returns typed data matching the schema

## Provider Interfaces

These are the contracts that platform adapters implement. The engine depends only on these interfaces.

### SandboxProvider

Creates and manages sandbox compute. The engine calls this to get a Sandbox handle, then uses it for all file and process operations. 
+

```typescript
interface SandboxProvider {
  create(opts: SandboxCreateOpts): Promise<Sandbox>;
  restore(id: string): Promise<Sandbox>;
  destroy(id: string): Promise<void>;
  status(id: string): Promise<SandboxStatus>;
}

interface SandboxCreateOpts {
  image?: string;
  workspace?: string;
  env?: Record<string, string>;
  timeout?: number;
  resources?: { cpu?: number; memory?: string };
  metadata?: Record<string, unknown>;
}

interface Sandbox {
  id: string;

  // Filesystem
  readFile(path: string): Promise<string>;
  readBinary(path: string): Promise<Uint8Array>;
  writeFile(path: string, content: string): Promise<void>;
  writeBinary(path: string, data: Uint8Array): Promise<void>;
  readdir(path: string): Promise<string[]>;
  stat(path: string): Promise<{ isFile: boolean; isDirectory: boolean; size: number }>;
  mkdir(path: string): Promise<void>;
  rm(path: string, opts?: { recursive?: boolean }): Promise<void>;

  // Process execution
  exec(command: string, opts?: ExecOpts): Promise<ExecResult>;

  // Lifecycle
  snapshot(): Promise<string>;
  tunnels(): Promise<Record<number, string>>;
  destroy(): Promise<void>;
}

interface ExecOpts {
  cwd?: string;
  env?: Record<string, string>;
  timeout?: number;
  signal?: AbortSignal;
  stdin?: string;
  maxOutputBytes?: number;
}

interface ExecResult {
  stdout: string;
  stderr: string;
  exitCode: number;
  timedOut?: boolean;
  truncated?: boolean;
}

interface SandboxStatus {
  id: string;
  state: 'creating' | 'running' | 'stopped' | 'error';
  startedAt?: number;
  error?: string;
}
```

**Implementations:**
- `ModalSandbox` — wraps Modal's Python SDK (called via HTTP to the Modal backend)
- `K8sPodSandbox` — creates a K8s pod, exec via K8s API
- `DockerSandbox` — local Docker container (dev/testing)
- `LocalSandbox` — host filesystem + child_process (CI, local dev)
- `VirtualSandbox` — in-memory filesystem + just-bash (lightweight agents, no container)

#### Sandbox RPC Contract

Remote sandbox implementations expose an authenticated HTTP RPC surface to the adapter. The engine still calls the `Sandbox` TypeScript interface; this RPC is the required adapter-to-sandbox protocol for Modal and Kubernetes implementations.

All requests include `Authorization: Bearer <token>`. Tokens are scoped to one session and one sandbox ID. Paths are relative to the sandbox workspace unless explicitly absolute and allowed by adapter policy.

| Method | Path | Request | Response |
|---|---|---|---|
| `GET` | `/health` | none | `{ ok: true, sandboxId, version }` |
| `GET` | `/files/stat?path=` | none | `{ isFile, isDirectory, size, mtimeMs }` |
| `GET` | `/files/read?path=&encoding=utf8` | none | `{ content, encoding }` |
| `GET` | `/files/read-binary?path=` | none | binary stream |
| `PUT` | `/files/write` | `{ path, content, encoding?: 'utf8' }` | `{ ok: true }` |
| `PUT` | `/files/write-binary?path=` | binary body | `{ ok: true }` |
| `GET` | `/files/list?path=` | none | `{ entries: Array<{ name, type, size }> }` |
| `POST` | `/files/mkdir` | `{ path, recursive?: boolean }` | `{ ok: true }` |
| `DELETE` | `/files` | `{ path, recursive?: boolean }` | `{ ok: true }` |
| `POST` | `/exec` | `{ command, cwd?, env?, stdin?, timeout?, maxOutputBytes? }` | `ExecResult` |
| `POST` | `/snapshot` | none | `{ snapshotId }` |
| `GET` | `/tunnels` | none | `{ tunnels: Record<number, string> }` |

RPC implementations must enforce output limits, command timeouts, workspace path policy, and token validation. `exec` is non-interactive in V1; long-running interactive terminal sessions remain a sandbox UI concern exposed through tunnels, not an engine tool protocol.

### SessionStore

Persists session state, thread state, message history, and queue state. Used by both the engine (writes) and the API layer (reads). One implementation per database backend, shared by engine and API. 
+

```typescript
interface SessionStore {
  // === Engine writes ===
  saveSession(session: SessionData): Promise<void>;
  saveThread(sessionId: string, thread: ThreadData): Promise<void>;
  appendEntries(sessionId: string, threadId: string, entries: SessionEntry[]): Promise<void>;
  /**
   * Replace an existing entry in place. Required so pruning during
   * compaction can persist tool-result elision; also useful for any
   * other in-place mutation (gate refs, attachment updates).
   * Throws NotFoundError if no entry with this id exists in (sessionId, threadId).
   */
  updateEntry(sessionId: string, threadId: string, entry: SessionEntry): Promise<void>;
  saveQueueState(sessionId: string, threadId: string, queue: QueueState): Promise<void>;
  saveDecisionGate(sessionId: string, threadId: string, gate: DecisionGate): Promise<void>;
  saveDecisionGateRef(sessionId: string, threadId: string, gateId: string, ref: { channelType: string; ref: DecisionGateRef }): Promise<void>;
  updateDecisionGateEntry(sessionId: string, threadId: string, gateId: string, patch: Partial<DecisionGateEntry>): Promise<void>;
  saveSuspendedTurn(sessionId: string, threadId: string, suspended: SuspendedTurnState): Promise<void>;
  clearSuspendedTurn(sessionId: string, threadId: string): Promise<void>;
  updateSessionStatus(id: string, status: string, metadata?: Partial<SessionData>): Promise<void>;
  flush?(): Promise<void>;

  // === API reads ===
  getSession(id: string): Promise<SessionData | null>;
  listSessions(userId: string, opts?: ListOpts): Promise<SessionData[]>;
  getThread(sessionId: string, threadId: string): Promise<ThreadData | null>;
  listThreads(sessionId: string): Promise<ThreadData[]>;
  getEntries(sessionId: string, threadId: string, opts?: MessageQuery): Promise<SessionEntry[]>;
  listDecisionGates(sessionId: string, threadId?: string): Promise<DecisionGate[]>;
  getSuspendedTurn(sessionId: string, threadId: string): Promise<SuspendedTurnState | null>;

  // === Shared ===
  deleteSession(id: string): Promise<void>;
}
```

```typescript
interface SuspendedTurnState {
  sessionId: string;
  threadId: string;
  queueItemId: string;
  gateId: string;
  model: string;
  leafMessageId?: string;
  toolCallId: string;
  toolName: string;
  toolArgs: Record<string, unknown>;
  resumeKey: string;
  attempt: number;
  createdAt: number;
}
```

```typescript
interface SessionData {
  id: string;
  userId: string;
  orgId: string;
  workspace: string;
  purpose: 'interactive' | 'orchestrator' | 'workflow' | 'child';
  status: 'initializing' | 'running' | 'paused' | 'hibernated' | 'terminated' | 'error';
  sandboxId?: string;
  snapshotId?: string;
  parentSessionId?: string;
  metadata?: Record<string, unknown>;
  createdAt: number;
  updatedAt: number;
}

interface ThreadData {
  id: string;
  sessionId: string;
  key: string;
  status: 'active' | 'paused' | 'archived';
  activeLeafEntryId?: string;
  queueMode: QueueMode;
  model?: string;
  summary?: string;
  metadata?: Record<string, unknown>;
  createdAt: number;
  updatedAt: number;
}

interface QueueState {
  threadId: string;
  mode: QueueMode;
  status: 'idle' | 'queued' | 'running' | 'blocked_on_decision_gate' | 'paused';
  activeItemId?: string;
  pending: QueueItem[];
  collectBuffer?: QueueItem[];
  blockedGateId?: string;
}

interface QueueItem {
  id: string;
  threadId: string;
  content: PromptContent;
  author?: PromptAuthor;
  channel?: ChannelTarget;
  replyTarget?: ChannelTarget;
  model?: string;
  metadata?: Record<string, unknown>;
  createdAt: number;
}

type SessionEntry =
  | MessageEntry
  | DecisionGateEntry
  | CompactionEntry
  | BranchSummaryEntry;
```

**Data flow:** The engine writes through SessionStore during execution. The API layer reads through SessionStore for client queries (session lists, message history, etc.). Both hit the same underlying database. The engine is the writer, the API is the reader, the database is the shared state.

Hot/cold storage tiering (e.g., DO SQLite as write-through cache for D1) is an implementation detail of the SessionStore, not a concern of the engine or API layer. The `flush()` method is called by the engine on session shutdown, giving the store a chance to drain any internal buffers. 
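A toy illustration of that shared write/read contract, including the replace-in-place `updateEntry` semantics pruning relies on (hypothetical `TinyStore` with simplified rows, not the real `InMemorySessionStore`):

```typescript
// Illustrative row shape — the real store persists full SessionEntry rows.
type StoredEntry = { id: string; text: string };

class TinyStore {
  // key: `${sessionId}:${threadId}`
  private entries = new Map<string, StoredEntry[]>();

  // Engine write path: append new history rows.
  async appendEntries(sessionId: string, threadId: string, batch: StoredEntry[]): Promise<void> {
    const key = `${sessionId}:${threadId}`;
    this.entries.set(key, [...(this.entries.get(key) ?? []), ...batch]);
  }

  // Engine write path: rewrite an existing row in place (e.g. elided tool
  // output after pruning). Missing rows are an error, not an upsert.
  async updateEntry(sessionId: string, threadId: string, entry: StoredEntry): Promise<void> {
    const list = this.entries.get(`${sessionId}:${threadId}`) ?? [];
    const i = list.findIndex((e) => e.id === entry.id);
    if (i < 0) throw new Error('NotFoundError');
    list[i] = entry;
  }

  // API read path: same rows the engine wrote.
  async getEntries(sessionId: string, threadId: string): Promise<StoredEntry[]> {
    return this.entries.get(`${sessionId}:${threadId}`) ?? [];
  }
}
```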
+

**Implementations:**
- `D1SessionStore` — Cloudflare D1 via Drizzle
- `PostgresSessionStore` — PostgreSQL via Drizzle
- `InMemorySessionStore` — for tests and ephemeral agents

#### Required Tables

The engine schema owns these tables. Existing application tables may mirror selected fields during rollout, but the engine must not depend on current `messages`, `session_threads`, or DO-local queue tables for correctness.

| Table | Purpose | Key fields |
|---|---|---|
| `engine_sessions` | Canonical engine session state | `id`, `user_id`, `org_id`, `workspace`, `purpose`, `status`, `sandbox_id`, `snapshot_id`, `metadata`, timestamps |
| `engine_threads` | Thread metadata and active leaf | `id`, `session_id`, `key`, `status`, `active_leaf_entry_id`, `queue_mode`, `model`, `summary`, `metadata` |
| `engine_entries` | DAG history | `id`, `session_id`, `thread_id`, `parent_id`, `entry_type`, `role`, `content`, `parts`, `metadata`, `created_at` |
| `engine_queue_items` | Persisted per-thread queue | `id`, `session_id`, `thread_id`, `status`, `mode`, `content`, `author`, `channel`, `reply_target`, `model`, `metadata`, timestamps |
| `engine_decision_gates` | Pending and terminal gate state | `id`, `session_id`, `thread_id`, `type`, `status`, `title`, `body`, `actions`, `origin`, `context`, `resolution`, `expires_at`, timestamps |
| `engine_decision_gate_refs` | Delivered channel refs | `id`, `gate_id`, `channel_type`, `ref`, `created_at`, `updated_at` |
| `engine_suspended_turns` | Restart-safe blocked turn checkpoints | `session_id`, `thread_id`, `queue_item_id`, `gate_id`, `model`, `leaf_entry_id`, `tool_call_id`, `tool_name`, `tool_args`, `resume_key`, `attempt`, `created_at` |
| `engine_credentials` | Stored credentials when adapter uses engine schema | `id`, `owner_type`, `owner_id`, `service`, `credential_type`, `encrypted_data`, `scopes`, `expires_at`, timestamps |
| `engine_oauth_states` | OAuth handshake state | `state`, `user_id`, `service`, `redirect_uri`, `code_verifier`, `metadata`, `expires_at` |

Indexes are required on `(session_id, thread_id, created_at)` for entries, `(session_id, thread_id, status)` for queue items and gates, and `(owner_type, owner_id, service)` for credentials.

### EventBus

Broadcasts engine events to external subscribers (clients, other services). The engine pushes events; the adapter subscribes and relays to clients.

```typescript
interface EventBus {
  publish(event: BusEvent): Promise<void>;
  subscribe(filter: EventFilter, callback: (event: BusEvent) => void): Unsubscribe;
}

interface BusEvent {
  sessionId: string;
  threadId?: string;
  userId?: string;
  event: EngineEvent;
  timestamp: number;
}

interface EventFilter {
  sessionId?: string;
  userId?: string;
  eventTypes?: string[];
}

type Unsubscribe = () => void;
```

**Implementations:**
- `DOEventBus` — posts to a thin EventBus Durable Object
- `RedisEventBus` — Redis pub/sub channels per session/user
- `InMemoryEventBus` — direct callback (single-process, tests)

### Channel Transports

Channel transports are in scope for V1 at the adapter boundary. The engine does not render Slack or Telegram payloads directly, but it does define the decision-gate and reply-routing contract that transports must implement. 
+
+```typescript
+interface ChannelTransport {
+  readonly channelType: string;
+
+  verifySignature?(headers: Record<string, string>, rawBody: string, secret?: string): boolean | Promise<boolean>;
+  parseInbound?(headers: Record<string, string>, rawBody: string, ctx: ChannelTransportContext): Promise<InboundChannelEvent | null>;
+
+  sendMessage(target: ChannelTarget, message: OutboundMessage, ctx: ChannelTransportContext): Promise<ChannelMessageRef>;
+  updateMessage?(target: ChannelTarget, ref: ChannelMessageRef, message: OutboundMessage, ctx: ChannelTransportContext): Promise<void>;
+
+  sendDecisionGate?(target: ChannelTarget, gate: DecisionGate, ctx: ChannelTransportContext): Promise<DecisionGateRef>;
+  updateDecisionGate?(target: ChannelTarget, ref: DecisionGateRef, update: DecisionGateUpdate, ctx: ChannelTransportContext): Promise<void>;
+
+  parseInboundDecision?(payload: unknown, ctx: ChannelTransportContext): Promise<{
+    gateId: string;
+    actionId?: string;
+    value?: string;
+    actorExternalId?: string;
+  } | null>;
+}
+
+interface ChannelTarget {
+  channelType: string;
+  channelId: string;
+  threadId?: string;
+}
+
+interface ChannelTransportContext {
+  userId: string;
+  orgId: string;
+  sessionId: string;
+  threadId?: string;
+  token?: string;
+  botToken?: string;
+  persona?: {
+    name?: string;
+    avatar?: string;
+    metadata?: Record<string, unknown>;
+  };
+  metadata?: Record<string, unknown>;
+}
+
+interface OutboundMessage {
+  text?: string;
+  markdown?: string;
+  attachments?: Array<{
+    type: 'image' | 'file';
+    url: string;
+    mimeType: string;
+    name?: string;
+    caption?: string;
+  }>;
+  replyTo?: ChannelMessageRef;
+  metadata?: Record<string, unknown>;
+}
+
+interface ChannelMessageRef {
+  messageId: string;
+  channelId: string;
+  threadId?: string;
+  [key: string]: unknown;
+}
+
+type DecisionGateUpdate =
+  | { status: 'resolved'; resolution: DecisionResolution }
+  | { status: 'expired' }
+  | { status: 'withdrawn'; reason: DecisionWithdrawReason };
+
+type InboundChannelEvent =
+  | { type: 'message'; target: ChannelTarget; text: string; actor: ChannelActor; messageId?: string; attachments?: PromptAttachment[] }
+  | { 
type: 'decision'; gateId: string; actionId?: string; value?: string; actor: ChannelActor; target?: ChannelTarget; messageId?: string };
+
+interface ChannelActor {
+  id: string;
+  displayName?: string;
+  email?: string;
+}
+```
+
+**Slack is the required reference transport for V1.** The V1 implementation must define:
+
+- how a Slack thread maps to `channelType = 'slack'` and a stable `channelId`
+- how Slack button clicks map back to `gateId` / `actionId`
+- how free-text thread replies resolve pending decision gates when the stored origin matches
+- how previously sent Slack decision gates are updated on resolution, expiry, or withdrawal
+
+Other transports may follow the same contract later, but Slack is the minimum transport that must be fully specified and implemented for V1.
+
+Slack `channelId` is canonicalized as `teamId:channelId:threadTs` for thread replies and `teamId:channelId` for channel-level messages. The transport may store native Slack fields (`ts`, `thread_ts`, `response_url`) inside `DecisionGateRef`, but engine-visible routing always uses the canonical `ChannelTarget`.
+
+### BlobStore
+
+File attachments, images, artifacts. Simple key-value with streaming.
+
+```typescript
+interface BlobStore {
+  put(key: string, data: Uint8Array | ReadableStream, opts?: { contentType?: string }): Promise<void>;
+  get(key: string): Promise<{ data: ReadableStream; contentType?: string } | null>;
+  delete(key: string): Promise<void>;
+}
+```
+
+**Implementations:**
+- `R2BlobStore` — Cloudflare R2
+- `S3BlobStore` — AWS S3 / MinIO
+
+### CredentialStore
+
+Stores OAuth tokens and API keys per user per service. Handles encryption transparently within the implementation: the engine passes an encryption key via adapter config, the store encrypts/decrypts tokens internally. The engine and tools never see encrypted blobs. 
+
+```typescript
+interface CredentialStore {
+  get(owner: CredentialOwner, service: string): Promise<StoredCredential | null>;
+  save(owner: CredentialOwner, service: string, credential: StoredCredential): Promise<void>;
+  delete(owner: CredentialOwner, service: string): Promise<void>;
+  list(owner: CredentialOwner): Promise<{ service: string; scopes?: string[]; connectedAt: string }[]>;
+}
+
+interface CredentialOwner {
+  type: 'user' | 'org' | 'session';
+  id: string;
+}
+
+interface StoredCredential {
+  type: 'oauth2' | 'api_key' | 'bot_token' | 'service_account' | 'app_install';
+  accessToken?: string;
+  refreshToken?: string;
+  apiKey?: string;
+  expiresAt?: number;
+  scopes?: string[];
+  metadata?: Record<string, unknown>;
+}
+```
+
+**Token refresh:** When a credential's `expiresAt` is in the past (or within a configurable buffer), the CredentialProvider wrapper in the engine auto-refreshes using the OAuth provider's token endpoint before returning the token to the tool. This requires OAuthProviderConfig for the service (token URL, client credentials). Transparent to the tool.
+
+**OAuth flow:** OAuth connection flows (user initiates "Connect GitHub" from the UI) live in the API layer. The API handles redirect, callback, and token exchange, then stores the credential via CredentialStore. The engine consumes stored credentials at tool execution time.
+
+**OAuth provider registry:** Plugin packages export their OAuth configuration alongside their tools:
+
+```typescript
+interface OAuthProviderConfig {
+  service: string;
+  authorizeUrl: string;
+  tokenUrl: string;
+  scopes: string[];
+  clientId: string;
+  clientSecret: string;
+  refreshable: boolean;
+}
+```
+
+The API layer collects these at startup to power the OAuth connection UI and callback handling.
+
+Credential lookup order is tool-defined but must be explicit. The default order is session-scoped credential, user credential, org credential. 
If no credential is found and the tool requires one, `CredentialProvider.request()` creates a `DecisionGate` of type `credential_request`. + +## Schema and Migrations + +The engine owns the canonical database schema. Schema definitions live in the engine package as Drizzle TypeScript schemas. Migration files are generated per dialect (SQLite for D1, PostgreSQL for PG) and ship with the engine package. + +``` +packages/engine/ + src/schema/ ← Drizzle schema definitions (source of truth) + migrations/ + sqlite/ ← generated by drizzle-kit for D1 + postgresql/ ← generated by drizzle-kit for PG +``` + +**Schema coverage:** The engine schema defines tables for sessions, threads, message entries, queue state, decision gates, suspended turns, credentials, and OAuth states. This is the same schema the SessionStore and CredentialStore implementations read from and write to. + +**Workflow for adding a field:** +1. Update the Drizzle schema in `packages/engine/src/schema/` +2. Run `drizzle-kit generate` for each dialect — produces migration SQL +3. Migration files ship with the engine package +4. On deploy, each platform applies migrations through its normal mechanism: + - Cloudflare: `wrangler d1 migrations apply` + - Kubernetes: init container or migration job running `drizzle-kit migrate` + +The SessionStore interface has no `migrate()` method. Migrations are a deployment concern, not a runtime interface. The engine is a library; it does not own the deployment lifecycle. + +### Current Schema Coexistence + +During rollout, engine tables live beside current application tables. The Cloudflare adapter may mirror engine data into existing tables used by the current client, analytics, and admin views, but the engine source of truth is always the `engine_*` schema. + +Required mirroring during the transition: + +- `engine_sessions` to current `sessions` for session lists and access control joins. +- `engine_threads` to current `session_threads` for thread lists. 
+- `engine_entries` message entries to current `messages` for existing history readers. +- `engine_decision_gates` to client event/API responses. No legacy decision-prompt table is created or written by the new engine path. + +The old DO-local prompt queue and decision storage are not part of the new runtime. Once the Cloudflare adapter is fully switched over, DO storage is limited to hosting concerns such as hibernation state and WebSocket bookkeeping. + +## Platform Adapters + +A platform adapter wires the engine to a specific deployment target. It does three things: + +1. Instantiates provider implementations (SessionStore, SandboxProvider, EventBus, BlobStore, CredentialStore) +2. Hosts the engine process (DO on CF, long-running process on K8s) +3. Provides the HTTP/WebSocket entrypoint for clients and API routes + +### Shared API Routes (`packages/api/`) + +API route handlers are written once and shared across platforms. They are Hono route factories parameterized by provider implementations: + +```typescript +export function sessionRoutes(store: SessionStore, engine: EngineManager) { + const router = new Hono(); + router.get('/:id', async (c) => { + const session = await store.getSession(c.req.param('id')); + return c.json(session); + }); + router.post('/:id/threads/:threadId/prompt', async (c) => { + const body = await c.req.json(); + await engine.getSession(c.req.param('id')) + .thread(c.req.param('threadId')) + .prompt(body.content); + return c.json({ ok: true }); + }); + return router; +} +``` + +Each adapter imports these factories and injects its providers. The route logic is written once. + +#### Required API Surface + +The shared API package owns route behavior. Adapters own authentication middleware, provider construction, and request context injection. 
+
+| Method | Route | Behavior |
+|---|---|---|
+| `POST` | `/api/sessions` | Create a session and return session metadata plus client stream URL |
+| `GET` | `/api/sessions/:sessionId` | Read session metadata and live status |
+| `DELETE` | `/api/sessions/:sessionId` | Terminate a session and either delete it or mark it archived |
+| `POST` | `/api/sessions/:sessionId/prompt` | Prompt the default thread |
+| `GET` | `/api/sessions/:sessionId/threads` | List threads |
+| `POST` | `/api/sessions/:sessionId/threads` | Create a thread |
+| `GET` | `/api/sessions/:sessionId/threads/:threadId` | Read thread metadata and entries |
+| `POST` | `/api/sessions/:sessionId/threads/:threadId/prompt` | Prompt a specific thread |
+| `POST` | `/api/sessions/:sessionId/threads/:threadId/abort` | Abort the current turn and clear this thread's queue |
+| `POST` | `/api/sessions/:sessionId/threads/:threadId/pause` | Pause this thread |
+| `POST` | `/api/sessions/:sessionId/threads/:threadId/resume` | Resume this thread |
+| `GET` | `/api/sessions/:sessionId/decision-gates` | List pending and recent terminal gates |
+| `POST` | `/api/sessions/:sessionId/decision-gates/:gateId/resolve` | Resolve a pending gate |
+| `POST` | `/api/sessions/:sessionId/decision-gates/:gateId/withdraw` | Withdraw a pending gate |
+| `GET` | `/api/sessions/:sessionId/events` | SSE stream for client events |
+| `GET` | `/api/sessions/:sessionId/ws` | WebSocket stream for client events and optional prompt/control messages |
+| `GET` | `/api/sessions/:sessionId/tunnels` | Return sandbox tunnel URLs |
+| `POST` | `/api/sessions/:sessionId/snapshot` | Snapshot session sandbox and persist snapshot ID |
+
+Prompt routes accept the same `PromptOptions` shape as the engine API. WebSocket prompt/control messages are optional conveniences over the same route semantics; they must not define separate behavior. 
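The decision-gate routes are the pieces web clients and channel transports share, so it is worth pinning the resolve request's shape down concretely. A minimal client-side sketch, assuming the body carries the same `actionId`/`value` pair that `parseInboundDecision` produces; the `resolveGateRequest` helper and `ResolveGateRequest` type are hypothetical illustration names, not part of the shipped API:

```typescript
// Hypothetical client-side helper, not part of the shipped API surface.
// Builds the request descriptor for
// POST /api/sessions/:sessionId/decision-gates/:gateId/resolve.
interface ResolveGateRequest {
  method: "POST";
  path: string;
  body: { actionId: string; value?: string };
}

function resolveGateRequest(
  sessionId: string,
  gateId: string,
  actionId: string,
  value?: string,
): ResolveGateRequest {
  // Encode both IDs: deterministic gate IDs like "gate:s1:t1:q1:confirm"
  // contain colons and must stay URL-safe.
  const path =
    `/api/sessions/${encodeURIComponent(sessionId)}` +
    `/decision-gates/${encodeURIComponent(gateId)}/resolve`;
  return {
    method: "POST",
    path,
    body: value === undefined ? { actionId } : { actionId, value },
  };
}
```

The same descriptor shape would serve the `/withdraw` route; only the path suffix and body differ.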
+ +### Cloudflare Adapter (`packages/adapter-cloudflare/`) + +``` +Cloudflare Worker (Hono) + ├── API routes (shared from packages/api/) + │ └── reads/writes via D1SessionStore + │ + ├── WebSocket upgrade → subscribes to DOEventBus → relays to client + │ + └── Session operations → SessionHostDO + │ + SessionHostDO (thin shell, ~100 lines) + ├── creates Engine instance on first request + ├── injects: D1SessionStore, ModalSandbox, DOEventBus, R2BlobStore + ├── forwards prompt/abort/pause/resume to engine + └── engine runs agent loop, emits events, writes state +``` + +The SessionHostDO is a thin shell. It creates an engine instance with CF provider implementations, forwards incoming requests, and uses DO hibernation so idle sessions don't consume compute. On wake, it restores the engine from SessionStore state. + +### Kubernetes Adapter (`packages/adapter-k8s/`) + +``` +K8s Service (Hono/Node) + ├── API routes (shared from packages/api/) + │ └── reads/writes via PostgresSessionStore + │ + ├── WebSocket upgrade → subscribes to RedisEventBus → relays to client + │ + └── Session operations → SessionPool + │ + SessionPool (process manager) + ├── spawns/reuses engine instances per session + ├── injects: PostgresSessionStore, ModalSandbox, RedisEventBus, S3BlobStore + ├── forwards prompt/abort/pause/resume to engine + └── engine runs in-process +``` + +The SessionPool manages engine instances in-process. Idle instances are evicted after a timeout (equivalent to DO hibernation). Session affinity via K8s ingress routes requests for the same session to the same pod. 
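The eviction side of the SessionPool is plain bookkeeping and can be sketched directly. A minimal sketch, assuming an injectable clock and a caller-supplied eviction callback; the `IdlePool` name and its methods are assumptions for illustration, not the shipped `SessionPool` API, and the real pool would call `store.flush()` inside the callback before tearing an engine down:

```typescript
// Hypothetical sketch: tracks last use per session and evicts idle engines.
class IdlePool<T> {
  private entries = new Map<string, { instance: T; lastUsed: number }>();

  constructor(
    private idleMs: number,
    private now: () => number = Date.now, // injectable clock for tests
  ) {}

  /** Return the engine for `sessionId`, creating it on first use. */
  acquire(sessionId: string, create: () => T): T {
    const hit = this.entries.get(sessionId);
    if (hit) {
      hit.lastUsed = this.now();
      return hit.instance;
    }
    const instance = create();
    this.entries.set(sessionId, { instance, lastUsed: this.now() });
    return instance;
  }

  /** Evict everything idle for at least `idleMs`; the caller flushes state in `onEvict`. */
  evictIdle(onEvict: (sessionId: string, instance: T) => void): void {
    const cutoff = this.now() - this.idleMs;
    for (const [id, entry] of this.entries) {
      if (entry.lastUsed <= cutoff) {
        this.entries.delete(id);
        onEvict(id, entry.instance); // e.g. flush to SessionStore, then teardown
      }
    }
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Because a blocked thread's `SuspendedTurnState` already lives in the store, eviction needs no special handling for pending gates: the next `acquire` after eviction simply triggers the restore path.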
+ +### What Each Adapter Provides + +| Interface | Cloudflare | Kubernetes | +|---|---|---| +| SessionStore | D1 via Drizzle | PostgreSQL via Drizzle | +| SandboxProvider | Modal SDK | Modal SDK / K8s Pod API | +| EventBus | DO singleton | Redis pub/sub | +| BlobStore | R2 | S3 / MinIO | +| CredentialStore | D1 (encrypted) | PostgreSQL (encrypted) | +| Channel transports | Worker-integrated (Slack required for V1) | Service-integrated (Slack required for V1) | +| Engine host | SessionHostDO | SessionPool (in-process) | + +### Adapter Host Contract + +Every adapter must provide: + +- Request authentication and authorization before calling shared API route handlers. +- Provider construction for the current deployment target. +- Engine instance lookup by session ID. +- Session affinity so prompts, decision resolutions, and aborts for one session reach the same active engine instance. +- Event subscription and client delivery over WebSocket and/or SSE. +- Startup restoration of queued, running, and blocked threads from `SessionStore` via `engine.restoreSession({ sessionId, options })`. The adapter is responsible for reconstructing `options` (tools, sandbox handle, model, system prompt, role/skill sources) from its own configuration — the engine itself does not persist creation options. +- Idle eviction/hibernation that calls `store.flush()` and leaves enough persisted state to resume. Specifically: any thread with status `running` or `blocked_on_decision_gate`, plus its active queue item and (for blocked threads) its `SuspendedTurnState`, must be readable on wake. +- Fatal error handling that marks the session `error`, publishes a client `error` event, and prevents silent queue accumulation. + +Cloudflare V1 uses one `SessionHostDO` per session ID. Kubernetes may use a process-local `SessionPool`, but must provide equivalent session affinity and restore behavior. 
+
+## Tool Implementation and Integration Framework
+
+### Plugin Package Structure
+
+Plugin packages live in `packages/plugin-*/`. Each exports tools as `ToolDef[]` and optionally exports OAuth configuration.
+
+```typescript
+// packages/plugin-github/src/tools.ts
+import { Type } from '@sinclair/typebox';
+import type { ToolDef } from '@valet/engine';
+
+export const tools: ToolDef[] = [
+  {
+    name: 'github.create_pr',
+    description: 'Create a pull request on GitHub',
+    parameters: Type.Object({
+      repo: Type.String(),
+      title: Type.String(),
+      body: Type.String(),
+      head: Type.String(),
+      base: Type.String(),
+    }),
+    execute: async (args, ctx) => {
+      const cred = await ctx.credentials.get('github');
+      if (!cred) {
+        // Opens a `credential_request` decision gate; bail out so the turn
+        // can resume once the credential is connected.
+        await ctx.credentials.request('github', 'Need GitHub access to create a PR');
+        return { text: 'GitHub credential requested; retry once connected.' };
+      }
+      const res = await fetch(`https://api.github.com/repos/${args.repo}/pulls`, {
+        method: 'POST',
+        headers: { Authorization: `Bearer ${cred.accessToken}` },
+        body: JSON.stringify(args),
+      });
+      const pr = await res.json();
+      return { text: `Created PR #${pr.number}: ${pr.html_url}` };
+    },
+  },
+];
+
+// packages/plugin-github/src/oauth.ts
+import type { OAuthProviderConfig } from '@valet/engine';
+
+export const oauth: OAuthProviderConfig = {
+  service: 'github',
+  authorizeUrl: 'https://github.com/login/oauth/authorize',
+  tokenUrl: 'https://github.com/login/oauth/access_token',
+  scopes: ['repo', 'read:org'],
+  clientId: process.env.GITHUB_CLIENT_ID!,
+  clientSecret: process.env.GITHUB_CLIENT_SECRET!,
+  refreshable: false,
+};
+```
+
+### Tool Registration
+
+Tools from plugin packages are registered at session creation. 
The adapter collects tools from all enabled plugins and passes them to the engine: + +```typescript +import { tools as githubTools } from '@valet/plugin-github'; +import { tools as slackTools } from '@valet/plugin-slack'; +import { tools as linearTools } from '@valet/plugin-linear'; + +const session = await engine.createSession({ + sandbox: await sandboxProvider.create({ image, workspace }), + tools: [...githubTools, ...slackTools, ...linearTools], + // ... +}); +``` + +The engine merges plugin tools with built-in tools. Name conflicts between plugins are caught at registration time. Per-thread tool overrides are merged at prompt time (thread-level wins on name conflict). + +### Engine-to-Tool Data Flow + +``` +Engine receives tool call from LLM + → looks up ToolDef by name + → constructs ToolContext { userId, orgId, sessionId, threadId, channel metadata, repo context, credentials, sandbox, signal } + → calls toolDef.execute(args, ctx) + → tool uses ctx.credentials.get('service') for API auth + → tool uses ctx.sandbox.exec() / readFile() if it needs sandbox access + → tool may call ctx.requestDecision(...) for gated human input + → tool returns ToolResult { text, attachments? } + → engine handles attachments per type (vision, blob store, inline) + → engine feeds result back to LLM via pi-agent-core +``` + +## Observability and Error Contract + +The engine distinguishes user-visible recoverable errors from fatal session errors. 
+
+```typescript
+interface EngineError {
+  code: string;
+  message: string;
+  recoverable: boolean;
+  sessionId?: string;
+  threadId?: string;
+  queueItemId?: string;
+  gateId?: string;
+  cause?: unknown;
+}
+
+interface RuntimeMetric {
+  type:
+    | 'llm_call'
+    | 'tool_exec'
+    | 'queue_wait'
+    | 'turn_complete'
+    | 'decision_gate_wait'
+    | 'sandbox_exec'
+    | 'model_failover'
+    | 'compaction';
+  sessionId: string;
+  threadId?: string;
+  durationMs?: number;
+  model?: string;
+  toolName?: string;
+  inputTokens?: number;
+  outputTokens?: number;
+  errorCode?: string;
+  properties?: Record<string, unknown>;
+}
+```
+
+Required behavior:
+
+- Recoverable thread errors emit an `error` event and mark the active queue item complete or failed.
+- Fatal session errors update session status to `error`, flush state, and prevent new prompts until restored or restarted.
+- Every model call emits token/cost metadata when available.
+- Every tool call emits duration and success/failure metadata.
+- Decision gates measure wait duration from creation to terminal state.
+- Logs may contain IDs and high-level errors, but must not contain secrets, OAuth tokens, command environment secrets, or full credential payloads.
+
+## Implementation Direction
+
+### Reference Flue, Build In-Repo
+
+Valet V1 will build its own engine in-repo and may borrow ideas or implementation patterns from Flue, but will not depend directly on `@flue/sdk` as the runtime substrate. 
+ +Reasons: +- Full control over the agent loop, compaction, and threading model +- First-class support for multi-threaded sessions rather than single-active-operation sessions +- First-class decision-gated execution rather than Flue's headless-only default +- Channel-aware routing and adapter contracts for web, Slack, Telegram, and orchestrator threads +- Richer tool context and persistence contracts aligned with Valet's integrations and multiplayer model + +Flue remains a useful reference implementation for: + +- session/runtime structure around `pi-ai` and `pi-agent-core` +- sandbox abstraction +- built-in filesystem/shell/task tools +- DAG-style history and compaction patterns +- Cloudflare adapter and persistence patterns + +This is a settled V1 decision, not an open implementation choice. The engine package will be built in-repo and may reference Flue code where useful, but Valet owns the runtime contracts and implementation. diff --git a/packages/engine/README.md b/packages/engine/README.md new file mode 100644 index 00000000..90f8b68d --- /dev/null +++ b/packages/engine/README.md @@ -0,0 +1,109 @@ +# @valet/engine + +Prototype implementation of the portable runtime engine described in +[`docs/specs/2026-05-02-portable-runtime-engine-design.md`](../../docs/specs/2026-05-02-portable-runtime-engine-design.md). + +This is the V1 in-repo engine library. It runs the agent loop, owns +session/thread state, executes tools, and emits typed events. Platform +adapters (Cloudflare, Kubernetes) host this library; this package itself +has zero platform dependencies. + +## What works in this prototype + +- Engine public API: `createSession`, `restoreSession({ sessionId, options })`, + `getSession`, `deleteSession`, `Session.prompt`, `Session.thread()`, + `Session.resolveDecision`, `Session.withdrawDecision`, + `Session.abort/pause/resume`. +- Per-thread state: each thread gets its own `pi-agent-core` `Agent` + instance with its own queue and DAG history. 
+- Per-thread queue modes: `followup` (FIFO), `steer` (abort + start), + `collect` (buffered window). +- Decision gates: tool calls `ctx.requestDecision({...})` to suspend the + turn. The gate is persisted, a `DecisionGateEntry` lands in the DAG, the + engine emits `decision_gate`, and the turn resumes when the user calls + `session.resolveDecision()`. Pending gates withdraw on `steer` or + `abort` and expire after `expiresAt`. +- **Restart-safe re-entrant decision gates.** Gate IDs are deterministic: + `gate:{sessionId}:{threadId}:{queueItemId}:{resumeKey}`. Tools must + supply a stable `resumeKey`. On `restoreSession`, the engine re-arms + pending gates and replays the suspended tool with `ctx.suspendedDecision` + populated; `requestDecision` short-circuits and returns the stored + resolution instead of opening a new gate. Validated by an end-to-end + test that opens a gate, throws away the engine, builds a new one on the + same store, calls `restoreSession`, then `resolveDecision`, and verifies + the agent's continuation message is persisted. +- Multi-thread: threads run concurrently, share the sandbox, and have + isolated histories. Aborting one thread doesn't affect siblings. +- Built-in `thread_read` tool: a thread can read recent messages from a + sibling, parent, or child thread. +- Built-in tools: `read`, `write`, `edit`, `bash`, `thread_read`. +- **Persistent SessionStore.** `SqliteSessionStore` (Drizzle SQLite schema, + migrations, in-process via `better-sqlite3`) implements the same + `SessionStore` interface as `InMemorySessionStore`. Both pass an + identical 10-test contract suite. Schema mirrors the V1 spec's required + tables: `engine_sessions`, `engine_threads`, `engine_entries`, + `engine_queue_state`, `engine_decision_gates`, + `engine_decision_gate_refs`, `engine_suspended_turns`, plus stubbed + `engine_queue_items` for future per-item visibility. 
+- In-memory providers: `InMemorySessionStore`, `InMemoryEventBus`, + `InMemoryBlobStore`, `InMemoryCredentialStore`, `VirtualSandbox` / + `VirtualSandboxProvider` (in-memory FS + a small whitelist of safe shell + commands). These double as test fixtures. + +## What's deferred (post-prototype) + +- **D1 wiring.** `SqliteSessionStore` uses `better-sqlite3`. The Cloudflare + adapter will reuse the same Drizzle queries through `drizzle-orm/d1`. +- **Postgres dialect mirror.** The K8s adapter contract requires a + pg-core schema mirror. Same logical schema, different column helpers; + doable in one task once the K8s adapter is on deck. +- **Per-queue-item rows.** Today the active and pending queue items are + persisted via the JSON-encoded `engine_queue_state.pending` column. + `engine_queue_items` exists as a schema stub; populating it gives the + adapter visibility into individual items but isn't a correctness + requirement. +- **Compaction.** Token-aware context compression is not implemented. + `CompactionEntry` is in the DAG schema; the algorithm itself is a + follow-up. +- **Roles & skills loading.** The types are defined, but role and skill + resolution at prompt time is not wired in. +- **Model failover.** Single-model only for now. +- **Plugin Action Bridge.** The `actionSourceToTools` adapter described + in the spec is not implemented yet — plugins should currently export + `ToolDef[]` directly. +- **Structured results.** Schema-validated output extraction with + `---RESULT_START---` delimiters is not implemented. + +## Spec-vs-reality deltas (notes from the pi-ai/pi-agent-core spike) + +The spec was written before pinning the API surface of `pi-ai` / +`pi-agent-core`. The implementation reconciles: + +1. 
**`ToolDef.execute(args, ctx)` vs `AgentTool.execute(toolCallId, params, signal, onUpdate)`.**
+   We keep the spec-faithful `ToolDef` shape as the public type;
+   internally we wrap each `ToolDef` to a pi `AgentTool` via
+   `tool-bridge.ts` and capture `ToolContext` in a closure.
+2. **No native turn-suspension primitive.** pi-agent-core's
+   `beforeToolCall` can `{ block: true }` (deny path) but doesn't pause.
+   For a "wait for human" gate we await a Promise inside the tool.
+3. **`message_start` vs `message_update`.** pi-agent-core fires
+   `message_start` once per assistant message; `message_update` carries
+   delta events (text, thinking, tool calls). The engine subscribes to
+   both.
+4. **Custom `AgentMessage` types via `convertToLlm`.** The engine could
+   later persist `DecisionGateEntry` etc. as custom AgentMessages
+   alongside the LLM transcript, then filter them out before each LLM
+   call. We don't need this in the prototype because we persist via the
+   SessionStore directly, but the pattern is useful when we want
+   in-context awareness of past gates.
+
+## Tests
+
+```sh
+pnpm --filter @valet/engine test
+```
+
+Covers: happy path (3), decision gates (4), queue modes (4),
+multi-thread + thread_read (3), short-circuit predicate unit tests (6),
+SessionStore contract suite × 2 backends (20), full restart cycle (1) —
+41 tests total.
diff --git a/packages/engine/bin/repl.ts b/packages/engine/bin/repl.ts
new file mode 100644
index 00000000..536e8362
--- /dev/null
+++ b/packages/engine/bin/repl.ts
@@ -0,0 +1,265 @@
+#!/usr/bin/env -S node --import tsx
+/**
+ * End-to-end smoke REPL for @valet/engine. 
+ * + * Wires up: + * - InMemorySessionStore + InMemoryEventBus + * - VirtualSandbox (default) or LocalSandbox (real host filesystem + shell) + * - The engine's built-in tools (read/write/edit/bash/thread_read) + * - A real Anthropic model via pi-ai (defaults to claude-haiku-4-5) + * + * Env: + * ANTHROPIC_API_KEY required + * VALET_MODEL pi-ai anthropic model id (default claude-haiku-4-5) + * VALET_SANDBOX virtual | local (default virtual) + * VALET_WORKSPACE workspace dir for local sandbox (default cwd) + * VALET_SYSTEM_PROMPT override the system prompt + * GITHUB_TOKEN when set, registers @valet/plugin-github actions via + * the actionSourceToTools bridge (read/write GitHub) + * VALET_CONTEXT_WINDOW override the model's local contextWindow (forces + * compaction at a smaller budget for dogfooding) + * VALET_MAX_TOKENS override the model's local maxTokens + * + * Usage: + * + * # in-memory sandbox, single prompt: + * pnpm --filter @valet/engine repl "say hi" + * + * # local sandbox pointed at the current repo, interactive: + * VALET_SANDBOX=local pnpm --filter @valet/engine repl + * + * # local sandbox pointed at an explicit dir: + * VALET_SANDBOX=local VALET_WORKSPACE=/path/to/repo pnpm --filter @valet/engine repl "list the top-level files" + */ +import { createInterface } from "node:readline/promises"; +import { stdin, stdout } from "node:process"; +import { resolve } from "node:path"; +import { getModel } from "@mariozechner/pi-ai"; +import { githubActions } from "@valet/plugin-github/actions"; +import { + actionBridgeTools, + Engine, + InMemoryCredentialStore, + InMemoryEventBus, + InMemorySessionStore, + LocalSandboxProvider, + VirtualSandboxProvider, + type ActionSourceConfig, + type BusEvent, + type SandboxProvider, + type Session, + type ToolDef, +} from "../src/index.js"; + +const MODEL_ID = process.env.VALET_MODEL ?? "claude-haiku-4-5"; +const SANDBOX_KIND = (process.env.VALET_SANDBOX ?? 
"virtual").toLowerCase(); +const WORKSPACE = + process.env.VALET_WORKSPACE ?? + (SANDBOX_KIND === "local" ? process.cwd() : "/"); + +const SYSTEM_PROMPT_VIRTUAL = + "You are a helpful coding assistant running inside an in-memory virtual sandbox. " + + "You have built-in tools: read, write, edit, bash, thread_read. " + + "The sandbox starts empty at /. Be concise."; + +const SYSTEM_PROMPT_LOCAL = + `You are a helpful coding assistant running on a local developer machine. ` + + `Your workspace is ${WORKSPACE}. Relative paths resolve there. ` + + `You have built-in tools: read, write, edit, bash, thread_read. ` + + `Be concise. Confirm with the user before making destructive changes.`; + +const SYSTEM_PROMPT = + process.env.VALET_SYSTEM_PROMPT ?? + (SANDBOX_KIND === "local" ? SYSTEM_PROMPT_LOCAL : SYSTEM_PROMPT_VIRTUAL); + +function fail(message: string, code = 1): never { + process.stderr.write(`error: ${message}\n`); + process.exit(code); +} + +async function loadPluginTools(): Promise { + const sources: ActionSourceConfig[] = []; + if (process.env.GITHUB_TOKEN) { + sources.push({ service: "github", actions: githubActions }); + } + if (sources.length === 0) return []; + return actionBridgeTools({ sources }); +} + +async function buildSession(): Promise<{ session: Session; bus: InMemoryEventBus }> { + if (!process.env.ANTHROPIC_API_KEY) { + fail( + "ANTHROPIC_API_KEY is not set. Export it in your shell before running this REPL.", + ); + } + // pi-ai's `getModel` is typed against MODELS at compile time; we cast the + // env-supplied id at the boundary because it's user input. + const baseModel = getModel("anthropic", MODEL_ID as "claude-haiku-4-5"); + if (!baseModel) { + fail( + `unknown anthropic model "${MODEL_ID}". 
Check VALET_MODEL or pi-ai's MODELS table.`, + ); + } + // VALET_CONTEXT_WINDOW + VALET_MAX_TOKENS let us force compaction at low + // budgets for dogfooding — the engine uses these to compute `usable`, + // while Anthropic's API still accepts the real (much larger) context. + const overrideCtx = process.env.VALET_CONTEXT_WINDOW + ? parseInt(process.env.VALET_CONTEXT_WINDOW, 10) + : undefined; + const overrideMax = process.env.VALET_MAX_TOKENS + ? parseInt(process.env.VALET_MAX_TOKENS, 10) + : undefined; + const model = + overrideCtx || overrideMax + ? { + ...baseModel, + contextWindow: overrideCtx ?? baseModel.contextWindow, + maxTokens: overrideMax ?? baseModel.maxTokens, + } + : baseModel; + + const store = new InMemorySessionStore(); + const bus = new InMemoryEventBus(); + const credentials = new InMemoryCredentialStore(); + const sandboxProvider: SandboxProvider = + SANDBOX_KIND === "local" + ? new LocalSandboxProvider() + : new VirtualSandboxProvider(); + const engine = new Engine({ + providers: { store, bus, credentials, sandboxProvider }, + }); + + const userId = "repl-user"; + const tools: ToolDef[] = []; + + // Plugin sources: when their respective env tokens are set, save the + // credential and add the source to the bridge. The bridge then exposes + // a single (list_tools, call_tool) pair regardless of how many sources + // are wired in. + if (process.env.GITHUB_TOKEN) { + await credentials.save({ type: "user", id: userId }, "github", { + type: "oauth2", + accessToken: process.env.GITHUB_TOKEN, + }); + } + const pluginTools = await loadPluginTools(); + if (pluginTools.length > 0) { + tools.push(...pluginTools); + stdout.write( + `\x1b[90m[plugins] ${pluginTools.length} bridge tools (list_tools + call_tool)\x1b[0m\n`, + ); + } + + const workspace = SANDBOX_KIND === "local" ? 
resolve(WORKSPACE) : WORKSPACE; + const session = await engine.createSession({ + userId, + orgId: "repl-org", + workspace, + sandbox: { workspace }, + model, + systemPrompt: SYSTEM_PROMPT, + tools, + }); + + return { session, bus }; +} + +function subscribePrinter(bus: InMemoryEventBus): void { + bus.subscribe({}, (e: BusEvent) => { + const ev = e.event; + switch (ev.type) { + case "text_delta": + stdout.write(ev.text); + break; + case "tool_start": + stdout.write( + `\n\x1b[90m[tool] ${ev.tool}(${JSON.stringify(ev.args)})\x1b[0m\n`, + ); + break; + case "tool_end": + stdout.write( + `\x1b[90m[tool] ${ev.tool} -> ${ev.isError ? "ERROR" : "ok"}: ${truncate(ev.result, 200)}\x1b[0m\n`, + ); + break; + case "decision_gate": + stdout.write( + `\n\x1b[33m[gate] ${ev.gate.type}: ${ev.gate.title}\x1b[0m\n` + + ` id=${ev.gate.id}\n actions=${ev.gate.actions.map((a) => a.id).join(", ")}\n`, + ); + break; + case "decision_gate_resolved": + stdout.write(`\x1b[33m[gate] resolved=${ev.resolution.actionId}\x1b[0m\n`); + break; + case "compaction_start": + stdout.write(`\n\x1b[35m[compaction] started…\x1b[0m\n`); + break; + case "compaction_end": + stdout.write(`\x1b[35m[compaction] done\x1b[0m\n`); + break; + case "turn_end": + stdout.write(`\n\x1b[90m[turn ended: ${ev.reason}]\x1b[0m\n`); + break; + case "error": + stdout.write(`\n\x1b[31m[error] ${ev.code}: ${ev.error}\x1b[0m\n`); + break; + default: + if (process.env.VALET_DEBUG === "1") { + stdout.write(`\x1b[90m[debug] ${ev.type}\x1b[0m\n`); + } + break; + } + }); +} + +function truncate(s: string, max: number): string { + if (s.length <= max) return s; + return s.slice(0, max) + `…(+${s.length - max} chars)`; +} + +async function waitForIdle(bus: InMemoryEventBus, threadId: string): Promise<void> { + return new Promise((resolve) => { + const unsub = bus.subscribe({}, (e) => { + if ( + e.event.type === "status" && + e.event.threadId === threadId && + e.event.status === "idle" + ) { + unsub(); + resolve(); + } + }); + }); +} + 
+async function runOneShot(prompt: string): Promise<void> { + const { session, bus } = await buildSession(); + subscribePrinter(bus); + const receipt = await session.prompt(prompt); + await waitForIdle(bus, receipt.threadId); +} + +async function runInteractive(): Promise<void> { + const { session, bus } = await buildSession(); + subscribePrinter(bus); + const rl = createInterface({ input: stdin, output: stdout }); + stdout.write( + `\nvalet engine repl — model=${MODEL_ID} sandbox=${SANDBOX_KIND}` + + (SANDBOX_KIND === "local" ? ` workspace=${WORKSPACE}` : "") + + `\ntype a prompt, 'exit' to quit.\n`, + ); + while (true) { + const line = (await rl.question("\n> ")).trim(); + if (line === "") continue; + if (line === "exit" || line === "quit") break; + const receipt = await session.prompt(line); + await waitForIdle(bus, receipt.threadId); + } + rl.close(); +} + +const args = process.argv.slice(2); +if (args.length > 0) { + await runOneShot(args.join(" ")); +} else { + await runInteractive(); +} diff --git a/packages/engine/drizzle.config.ts b/packages/engine/drizzle.config.ts new file mode 100644 index 00000000..ffc696b7 --- /dev/null +++ b/packages/engine/drizzle.config.ts @@ -0,0 +1,7 @@ +import { defineConfig } from "drizzle-kit"; + +export default defineConfig({ + dialect: "sqlite", + schema: "./src/schema/sqlite.ts", + out: "./migrations/sqlite", +}); diff --git a/packages/engine/migrations/sqlite/0000_lonely_lizard.sql b/packages/engine/migrations/sqlite/0000_lonely_lizard.sql new file mode 100644 index 00000000..644fea54 --- /dev/null +++ b/packages/engine/migrations/sqlite/0000_lonely_lizard.sql @@ -0,0 +1,137 @@ +CREATE TABLE `engine_decision_gate_refs` ( + `id` text PRIMARY KEY NOT NULL, + `gate_id` text NOT NULL, + `channel_type` text NOT NULL, + `ref` text NOT NULL, + `created_at` integer NOT NULL, + `updated_at` integer NOT NULL +); +--> statement-breakpoint +CREATE INDEX `engine_decision_gate_refs_gate` ON `engine_decision_gate_refs` (`gate_id`);--> 
statement-breakpoint +CREATE TABLE `engine_decision_gates` ( + `id` text PRIMARY KEY NOT NULL, + `session_id` text NOT NULL, + `thread_id` text NOT NULL, + `type` text NOT NULL, + `status` text NOT NULL, + `title` text NOT NULL, + `body` text, + `actions` text NOT NULL, + `origin` text, + `context` text, + `resolution` text, + `expires_at` integer, + `created_at` integer NOT NULL, + `updated_at` integer NOT NULL +); +--> statement-breakpoint +CREATE INDEX `engine_decision_gates_thread` ON `engine_decision_gates` (`session_id`,`thread_id`,`status`);--> statement-breakpoint +CREATE TABLE `engine_entries` ( + `id` text PRIMARY KEY NOT NULL, + `session_id` text NOT NULL, + `thread_id` text NOT NULL, + `parent_id` text, + `entry_type` text NOT NULL, + `role` text, + `content` text, + `parts` text, + `author` text, + `channel` text, + `model` text, + `summary` text, + `covered_entry_ids` text, + `token_count_before` integer, + `token_count_after` integer, + `file_context` text, + `branch_root_id` text, + `branch_leaf_id` text, + `gate_id` text, + `resolved_at` text, + `resolution` text, + `withdrawn_reason` text, + `metadata` text, + `created_at` integer NOT NULL +); +--> statement-breakpoint +CREATE INDEX `engine_entries_thread` ON `engine_entries` (`session_id`,`thread_id`,`created_at`);--> statement-breakpoint +CREATE INDEX `engine_entries_gate` ON `engine_entries` (`gate_id`);--> statement-breakpoint +CREATE TABLE `engine_queue_items` ( + `id` text PRIMARY KEY NOT NULL, + `session_id` text NOT NULL, + `thread_id` text NOT NULL, + `status` text NOT NULL, + `mode` text NOT NULL, + `content` text NOT NULL, + `author` text, + `channel` text, + `reply_target` text, + `model` text, + `metadata` text, + `created_at` integer NOT NULL, + `updated_at` integer NOT NULL +); +--> statement-breakpoint +CREATE INDEX `engine_queue_items_thread` ON `engine_queue_items` (`session_id`,`thread_id`,`status`);--> statement-breakpoint +CREATE TABLE `engine_queue_state` ( + `thread_id` text 
NOT NULL, + `session_id` text NOT NULL, + `mode` text NOT NULL, + `status` text NOT NULL, + `active_item_id` text, + `pending` text NOT NULL, + `collect_buffer` text, + `blocked_gate_id` text, + `updated_at` integer NOT NULL, + PRIMARY KEY(`session_id`, `thread_id`) +); +--> statement-breakpoint +CREATE TABLE `engine_sessions` ( + `id` text PRIMARY KEY NOT NULL, + `user_id` text NOT NULL, + `org_id` text NOT NULL, + `workspace` text NOT NULL, + `purpose` text NOT NULL, + `status` text NOT NULL, + `sandbox_id` text, + `snapshot_id` text, + `parent_session_id` text, + `metadata` text, + `created_at` integer NOT NULL, + `updated_at` integer NOT NULL +); +--> statement-breakpoint +CREATE INDEX `engine_sessions_user` ON `engine_sessions` (`user_id`);--> statement-breakpoint +CREATE INDEX `engine_sessions_status` ON `engine_sessions` (`status`);--> statement-breakpoint +CREATE TABLE `engine_suspended_turns` ( + `session_id` text NOT NULL, + `thread_id` text NOT NULL, + `queue_item_id` text NOT NULL, + `gate_id` text NOT NULL, + `model` text NOT NULL, + `leaf_entry_id` text, + `tool_call_id` text NOT NULL, + `tool_name` text NOT NULL, + `tool_args` text NOT NULL, + `resume_key` text NOT NULL, + `attempt` integer NOT NULL, + `created_at` integer NOT NULL, + PRIMARY KEY(`session_id`, `thread_id`) +); +--> statement-breakpoint +CREATE INDEX `engine_suspended_turns_gate` ON `engine_suspended_turns` (`gate_id`);--> statement-breakpoint +CREATE TABLE `engine_threads` ( + `id` text PRIMARY KEY NOT NULL, + `session_id` text NOT NULL, + `key` text NOT NULL, + `status` text NOT NULL, + `active_leaf_entry_id` text, + `queue_mode` text NOT NULL, + `model` text, + `summary` text, + `metadata` text, + `created_at` integer NOT NULL, + `updated_at` integer NOT NULL +); +--> statement-breakpoint +CREATE INDEX `engine_threads_session` ON `engine_threads` (`session_id`);--> statement-breakpoint +CREATE INDEX `engine_threads_session_key` ON `engine_threads` (`session_id`,`key`); \ No newline 
at end of file diff --git a/packages/engine/migrations/sqlite/meta/0000_snapshot.json b/packages/engine/migrations/sqlite/meta/0000_snapshot.json new file mode 100644 index 00000000..bbe4de98 --- /dev/null +++ b/packages/engine/migrations/sqlite/meta/0000_snapshot.json @@ -0,0 +1,905 @@ +{ + "version": "6", + "dialect": "sqlite", + "id": "e464cf58-a0fd-494a-9e66-70a9fe4c5fd5", + "prevId": "00000000-0000-0000-0000-000000000000", + "tables": { + "engine_decision_gate_refs": { + "name": "engine_decision_gate_refs", + "columns": { + "id": { + "name": "id", + "type": "text", + "primaryKey": true, + "notNull": true, + "autoincrement": false + }, + "gate_id": { + "name": "gate_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "channel_type": { + "name": "channel_type", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "ref": { + "name": "ref", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "created_at": { + "name": "created_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "updated_at": { + "name": "updated_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + } + }, + "indexes": { + "engine_decision_gate_refs_gate": { + "name": "engine_decision_gate_refs_gate", + "columns": [ + "gate_id" + ], + "isUnique": false + } + }, + "foreignKeys": {}, + "compositePrimaryKeys": {}, + "uniqueConstraints": {}, + "checkConstraints": {} + }, + "engine_decision_gates": { + "name": "engine_decision_gates", + "columns": { + "id": { + "name": "id", + "type": "text", + "primaryKey": true, + "notNull": true, + "autoincrement": false + }, + "session_id": { + "name": "session_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "thread_id": { + "name": "thread_id", + "type": "text", + "primaryKey": false, + "notNull": true, + 
"autoincrement": false + }, + "type": { + "name": "type", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "status": { + "name": "status", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "title": { + "name": "title", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "body": { + "name": "body", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "actions": { + "name": "actions", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "origin": { + "name": "origin", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "context": { + "name": "context", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "resolution": { + "name": "resolution", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "expires_at": { + "name": "expires_at", + "type": "integer", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "created_at": { + "name": "created_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "updated_at": { + "name": "updated_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + } + }, + "indexes": { + "engine_decision_gates_thread": { + "name": "engine_decision_gates_thread", + "columns": [ + "session_id", + "thread_id", + "status" + ], + "isUnique": false + } + }, + "foreignKeys": {}, + "compositePrimaryKeys": {}, + "uniqueConstraints": {}, + "checkConstraints": {} + }, + "engine_entries": { + "name": "engine_entries", + "columns": { + "id": { + "name": "id", + "type": "text", + "primaryKey": true, + "notNull": true, + "autoincrement": false + }, + "session_id": { + "name": "session_id", + "type": "text", + "primaryKey": false, + 
"notNull": true, + "autoincrement": false + }, + "thread_id": { + "name": "thread_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "parent_id": { + "name": "parent_id", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "entry_type": { + "name": "entry_type", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "role": { + "name": "role", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "content": { + "name": "content", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "parts": { + "name": "parts", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "author": { + "name": "author", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "channel": { + "name": "channel", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "model": { + "name": "model", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "summary": { + "name": "summary", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "covered_entry_ids": { + "name": "covered_entry_ids", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "token_count_before": { + "name": "token_count_before", + "type": "integer", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "token_count_after": { + "name": "token_count_after", + "type": "integer", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "file_context": { + "name": "file_context", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "branch_root_id": { + "name": "branch_root_id", + "type": "text", + "primaryKey": false, + 
"notNull": false, + "autoincrement": false + }, + "branch_leaf_id": { + "name": "branch_leaf_id", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "gate_id": { + "name": "gate_id", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "resolved_at": { + "name": "resolved_at", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "resolution": { + "name": "resolution", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "withdrawn_reason": { + "name": "withdrawn_reason", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "metadata": { + "name": "metadata", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "created_at": { + "name": "created_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + } + }, + "indexes": { + "engine_entries_thread": { + "name": "engine_entries_thread", + "columns": [ + "session_id", + "thread_id", + "created_at" + ], + "isUnique": false + }, + "engine_entries_gate": { + "name": "engine_entries_gate", + "columns": [ + "gate_id" + ], + "isUnique": false + } + }, + "foreignKeys": {}, + "compositePrimaryKeys": {}, + "uniqueConstraints": {}, + "checkConstraints": {} + }, + "engine_queue_items": { + "name": "engine_queue_items", + "columns": { + "id": { + "name": "id", + "type": "text", + "primaryKey": true, + "notNull": true, + "autoincrement": false + }, + "session_id": { + "name": "session_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "thread_id": { + "name": "thread_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "status": { + "name": "status", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "mode": { + "name": "mode", 
+ "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "content": { + "name": "content", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "author": { + "name": "author", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "channel": { + "name": "channel", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "reply_target": { + "name": "reply_target", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "model": { + "name": "model", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "metadata": { + "name": "metadata", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "created_at": { + "name": "created_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "updated_at": { + "name": "updated_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + } + }, + "indexes": { + "engine_queue_items_thread": { + "name": "engine_queue_items_thread", + "columns": [ + "session_id", + "thread_id", + "status" + ], + "isUnique": false + } + }, + "foreignKeys": {}, + "compositePrimaryKeys": {}, + "uniqueConstraints": {}, + "checkConstraints": {} + }, + "engine_queue_state": { + "name": "engine_queue_state", + "columns": { + "thread_id": { + "name": "thread_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "session_id": { + "name": "session_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "mode": { + "name": "mode", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "status": { + "name": "status", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + 
"active_item_id": { + "name": "active_item_id", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "pending": { + "name": "pending", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "collect_buffer": { + "name": "collect_buffer", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "blocked_gate_id": { + "name": "blocked_gate_id", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "updated_at": { + "name": "updated_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + } + }, + "indexes": {}, + "foreignKeys": {}, + "compositePrimaryKeys": { + "engine_queue_state_session_id_thread_id_pk": { + "columns": [ + "session_id", + "thread_id" + ], + "name": "engine_queue_state_session_id_thread_id_pk" + } + }, + "uniqueConstraints": {}, + "checkConstraints": {} + }, + "engine_sessions": { + "name": "engine_sessions", + "columns": { + "id": { + "name": "id", + "type": "text", + "primaryKey": true, + "notNull": true, + "autoincrement": false + }, + "user_id": { + "name": "user_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "org_id": { + "name": "org_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "workspace": { + "name": "workspace", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "purpose": { + "name": "purpose", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "status": { + "name": "status", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "sandbox_id": { + "name": "sandbox_id", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "snapshot_id": { + "name": "snapshot_id", + "type": "text", + "primaryKey": 
false, + "notNull": false, + "autoincrement": false + }, + "parent_session_id": { + "name": "parent_session_id", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "metadata": { + "name": "metadata", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "created_at": { + "name": "created_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "updated_at": { + "name": "updated_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + } + }, + "indexes": { + "engine_sessions_user": { + "name": "engine_sessions_user", + "columns": [ + "user_id" + ], + "isUnique": false + }, + "engine_sessions_status": { + "name": "engine_sessions_status", + "columns": [ + "status" + ], + "isUnique": false + } + }, + "foreignKeys": {}, + "compositePrimaryKeys": {}, + "uniqueConstraints": {}, + "checkConstraints": {} + }, + "engine_suspended_turns": { + "name": "engine_suspended_turns", + "columns": { + "session_id": { + "name": "session_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "thread_id": { + "name": "thread_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "queue_item_id": { + "name": "queue_item_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "gate_id": { + "name": "gate_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "model": { + "name": "model", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "leaf_entry_id": { + "name": "leaf_entry_id", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "tool_call_id": { + "name": "tool_call_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "tool_name": { + 
"name": "tool_name", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "tool_args": { + "name": "tool_args", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "resume_key": { + "name": "resume_key", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "attempt": { + "name": "attempt", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "created_at": { + "name": "created_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + } + }, + "indexes": { + "engine_suspended_turns_gate": { + "name": "engine_suspended_turns_gate", + "columns": [ + "gate_id" + ], + "isUnique": false + } + }, + "foreignKeys": {}, + "compositePrimaryKeys": { + "engine_suspended_turns_session_id_thread_id_pk": { + "columns": [ + "session_id", + "thread_id" + ], + "name": "engine_suspended_turns_session_id_thread_id_pk" + } + }, + "uniqueConstraints": {}, + "checkConstraints": {} + }, + "engine_threads": { + "name": "engine_threads", + "columns": { + "id": { + "name": "id", + "type": "text", + "primaryKey": true, + "notNull": true, + "autoincrement": false + }, + "session_id": { + "name": "session_id", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "key": { + "name": "key", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "status": { + "name": "status", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "active_leaf_entry_id": { + "name": "active_leaf_entry_id", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "queue_mode": { + "name": "queue_mode", + "type": "text", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "model": { + "name": "model", + "type": "text", + "primaryKey": false, + "notNull": 
false, + "autoincrement": false + }, + "summary": { + "name": "summary", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "metadata": { + "name": "metadata", + "type": "text", + "primaryKey": false, + "notNull": false, + "autoincrement": false + }, + "created_at": { + "name": "created_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + }, + "updated_at": { + "name": "updated_at", + "type": "integer", + "primaryKey": false, + "notNull": true, + "autoincrement": false + } + }, + "indexes": { + "engine_threads_session": { + "name": "engine_threads_session", + "columns": [ + "session_id" + ], + "isUnique": false + }, + "engine_threads_session_key": { + "name": "engine_threads_session_key", + "columns": [ + "session_id", + "key" + ], + "isUnique": false + } + }, + "foreignKeys": {}, + "compositePrimaryKeys": {}, + "uniqueConstraints": {}, + "checkConstraints": {} + } + }, + "views": {}, + "enums": {}, + "_meta": { + "schemas": {}, + "tables": {}, + "columns": {} + }, + "internal": { + "indexes": {} + } +} \ No newline at end of file diff --git a/packages/engine/migrations/sqlite/meta/_journal.json b/packages/engine/migrations/sqlite/meta/_journal.json new file mode 100644 index 00000000..5a4d72af --- /dev/null +++ b/packages/engine/migrations/sqlite/meta/_journal.json @@ -0,0 +1,13 @@ +{ + "version": "7", + "dialect": "sqlite", + "entries": [ + { + "idx": 0, + "version": "6", + "when": 1778041793822, + "tag": "0000_lonely_lizard", + "breakpoints": true + } + ] +} \ No newline at end of file diff --git a/packages/engine/package.json b/packages/engine/package.json new file mode 100644 index 00000000..a1f00482 --- /dev/null +++ b/packages/engine/package.json @@ -0,0 +1,39 @@ +{ + "name": "@valet/engine", + "version": "0.0.1", + "private": true, + "type": "module", + "main": "./dist/index.js", + "types": "./dist/index.d.ts", + "exports": { + ".": { + "types": "./dist/index.d.ts", + 
"import": "./dist/index.js" + } + }, + "scripts": { + "build": "tsc", + "typecheck": "tsc --noEmit", + "test": "vitest run", + "db:generate": "drizzle-kit generate", + "repl": "tsx bin/repl.ts" + }, + "dependencies": { + "@mariozechner/pi-agent-core": "0.73.0", + "@mariozechner/pi-ai": "0.73.0", + "drizzle-orm": "^0.45.1", + "typebox": "^1.1.24", + "zod": "^3.22.4", + "zod-to-json-schema": "^3.24.6" + }, + "devDependencies": { + "@types/better-sqlite3": "^7.6.8", + "@types/node": "^20.0.0", + "@valet/plugin-github": "workspace:*", + "better-sqlite3": "^11.0.0", + "drizzle-kit": "^0.31.9", + "tsx": "^4.0.0", + "typescript": "^5.3.3", + "vitest": "^4.0.18" + } +} diff --git a/packages/engine/src/action-bridge.ts b/packages/engine/src/action-bridge.ts new file mode 100644 index 00000000..5fec367b --- /dev/null +++ b/packages/engine/src/action-bridge.ts @@ -0,0 +1,410 @@ +import { Type } from "typebox"; +import type { TSchema } from "typebox"; +import type { z } from "zod"; +import { zodToJsonSchema } from "zod-to-json-schema"; +import type { RiskLevel, ToolAttachment, ToolContext, ToolDef, ToolResult } from "./types.js"; + +/** + * Action bridge: V1 migration adapter for existing valet plugin packages. + * + * Design (per spec §"Plugin Action Bridge"): we register exactly two + * engine-visible tools — `list_tools` and `call_tool` — and let the agent + * discover plugin actions on demand. We do NOT register one LLM-visible + * tool per action because (a) action ids contain dots and Anthropic rejects + * them in tool names, (b) dozens of plugins × dozens of actions blows the + * tool-catalog budget, and (c) most sessions only need a handful of actions + * and shouldn't pay the prompt cost of all of them. 
+ */ + +// ── Plugin shapes (structurally compatible with @valet/sdk) ─────── + +export interface BridgeActionDefinition<TParams extends z.ZodTypeAny = z.ZodTypeAny> { + id: string; + name: string; + description: string; + riskLevel: RiskLevel; + params: TParams; + /** Raw JSON Schema — when present, bypasses Zod conversion. */ + inputSchema?: Record<string, unknown>; +} + +export interface BridgeActionListContext { + credentials?: Record<string, unknown>; +} + +export interface BridgeActionContext { + credentials: Record<string, unknown>; + userId: string; + orgId?: string; + callerIdentity?: { name: string; avatar?: string }; + attribution?: { name: string; email: string }; + guardConfig?: Record<string, unknown>; + // analytics is intentionally omitted — the engine emits its own observability. + analytics?: unknown; +} + +export interface BridgeActionResult<T = unknown> { + success: boolean; + data?: T; + error?: string; + images?: Array<{ data: string; mimeType: string; description: string }>; +} + +export interface BridgeActionSource { + listActions( + ctx?: BridgeActionListContext, + ): BridgeActionDefinition[] | Promise<BridgeActionDefinition[]>; + execute( + actionId: string, + params: unknown, + ctx: BridgeActionContext, + ): Promise<BridgeActionResult>; +} + +export type ApprovalMode = "allow" | "require_approval" | "deny"; + +export interface ActionSourceConfig { + /** Service id (e.g. "github"). Used as the credential service name. */ + service: string; + /** The plugin's ActionSource. */ + actions: BridgeActionSource; + /** Override credential service (defaults to `service`). */ + credentialService?: string; + /** + * Default approval policy. Unset = derived from riskLevel: + * low/medium → allow; high/critical → require_approval. + */ + defaultApprovalMode?: ApprovalMode; +} + +export interface ActionBridgeOptions { + sources: ActionSourceConfig[]; +} + +// ── Public API ──────────────────────────────────────────────────── + +/** + * Build the [list_tools, call_tool] pair backed by an in-memory catalog + * assembled from every ActionSource in `opts.sources`. 
The catalog is + * resolved at construction time; if plugins can register dynamically later, + * we'll need a refresh hook. + */ +export async function actionBridgeTools( + opts: ActionBridgeOptions, +): Promise<ToolDef[]> { + const catalog = await buildCatalog(opts.sources); + return [makeListTool(catalog), makeCallTool(catalog)]; +} + +// ── Catalog ─────────────────────────────────────────────────────── + +interface CatalogEntry { + service: string; + config: ActionSourceConfig; + def: BridgeActionDefinition; + parameters: Record<string, unknown>; +} + +interface Catalog { + entries: CatalogEntry[]; + byId: Map<string, CatalogEntry>; +} + +async function buildCatalog(sources: ActionSourceConfig[]): Promise<Catalog> { + const entries: CatalogEntry[] = []; + const byId = new Map<string, CatalogEntry>(); + for (const config of sources) { + const defs = await config.actions.listActions(); + for (const def of defs) { + const entry: CatalogEntry = { + service: config.service, + config, + def, + parameters: resolveParameters(def), + }; + entries.push(entry); + // Action ids are commonly already qualified (e.g. "github.create_issue"). + // If the plugin emits a bare id, qualify it. + const fqid = def.id.includes(".") ? def.id : `${config.service}.${def.id}`; + byId.set(fqid, entry); + // Register both forms so tool_id="bare_id" works too (when unambiguous). 
+ if (def.id !== fqid && !byId.has(def.id)) byId.set(def.id, entry); + } + } + return { entries, byId }; +} + +function resolveParameters(def: BridgeActionDefinition): Record<string, unknown> { + if (def.inputSchema) return def.inputSchema; + const json = zodToJsonSchema(def.params, { target: "jsonSchema7" }); + if (typeof json === "object" && json !== null) { + const obj = json as Record<string, unknown>; + delete obj.$schema; + delete obj.$ref; + delete obj.definitions; + return obj; + } + return { type: "object", properties: {}, required: [] }; +} + +// ── list_tools ─────────────────────────────────────────────────── + +const LIST_LIMIT_DEFAULT = 50; +const LIST_LIMIT_MAX = 200; + +function makeListTool(catalog: Catalog): ToolDef { + return { + name: "list_tools", + description: + "List available plugin tools. Filter by service or search by name/description. Returns tool_ids plus their parameter schemas; use call_tool to invoke one.", + parameters: Type.Object({ + service: Type.Optional( + Type.String({ + description: + "Filter by service name (e.g. 'github', 'gmail'). Omit to list across all services.", + }), + ), + query: Type.Optional( + Type.String({ + description: "Case-insensitive substring match against name, id, and description.", + }), + ), + limit: Type.Optional( + Type.Integer({ + minimum: 1, + maximum: LIST_LIMIT_MAX, + description: `Cap results (default ${LIST_LIMIT_DEFAULT}, max ${LIST_LIMIT_MAX}).`, + }), + ), + }), + execute: async (args, ctx): Promise<ToolResult> => { + const a = args as { service?: string; query?: string; limit?: number }; + const limit = clamp(a.limit ?? 
LIST_LIMIT_DEFAULT, 1, LIST_LIMIT_MAX); + const q = a.query?.toLowerCase(); + + let entries = catalog.entries; + if (a.service) entries = entries.filter((e) => e.service === a.service); + if (q) { + entries = entries.filter((e) => { + const def = e.def; + return ( + def.id.toLowerCase().includes(q) || + def.name.toLowerCase().includes(q) || + def.description.toLowerCase().includes(q) + ); + }); + } + + // Per-service auth warnings: probe each represented service's + // credentials and report missing ones so the LLM can ask the user to + // reauthorize. + const services = new Set(entries.map((e) => e.service)); + const warnings: Array<{ service: string; reason: string }> = []; + for (const service of services) { + const credService = + catalog.entries.find((e) => e.service === service)?.config.credentialService ?? service; + const cred = await ctx.credentials.get(credService); + if (!cred) warnings.push({ service, reason: "no credential connected" }); + } + + const tools = entries.slice(0, limit).map((e) => ({ + service: e.service, + tool_id: qualifiedId(e), + name: e.def.name, + description: e.def.description, + riskLevel: e.def.riskLevel, + params: e.parameters, + })); + + const total = entries.length; + const text = JSON.stringify( + { + tools, + total, + truncated: total > limit ? total - limit : undefined, + warnings: warnings.length > 0 ? warnings : undefined, + }, + null, + 2, + ); + return { text }; + }, + }; +} + +// ── call_tool ──────────────────────────────────────────────────── + +function makeCallTool(catalog: Catalog): ToolDef { + return { + name: "call_tool", + description: + "Invoke a plugin action by tool_id (discovered via list_tools). Approval gates may suspend execution for high/critical risk actions.", + parameters: Type.Object({ + tool_id: Type.String({ + description: "Fully-qualified action id from list_tools (e.g. 'github.create_issue').", + }), + // The LLM passes params as an object; we accept any JSON shape. 
+ params: Type.Optional(
+ Type.Record(Type.String(), Type.Any(), {
+ description:
+ "Action parameters, matching the schema reported by list_tools for this tool_id.",
+ }),
+ ),
+ summary: Type.String({
+ description:
+ "One-line human-readable summary of what this call does. Shown in approval gates and audit logs.",
+ }),
+ }),
+ execute: async (args, ctx): Promise<ToolResult> => {
+ const a = args as { tool_id: string; params?: Record<string, unknown>; summary: string };
+ const entry = catalog.byId.get(a.tool_id);
+ if (!entry) {
+ return {
+ text: `unknown tool_id: "${a.tool_id}". Use list_tools to find available actions.`,
+ };
+ }
+
+ const approvalMode = approvalModeFor(entry);
+ if (approvalMode === "deny") {
+ return { text: `denied: ${a.tool_id} is blocked by org policy` };
+ }
+ if (approvalMode === "require_approval") {
+ const resolution = await ctx.requestDecision({
+ type: "approval",
+ title: `Approve ${entry.def.name}?`,
+ body: `${a.summary}\n\ntool_id=${a.tool_id}\nargs=${stableJson(a.params ?? {})}`,
+ resumeKey: `${qualifiedId(entry)}:${stableJson(a.params ?? {})}`,
+ context: {
+ riskLevel: entry.def.riskLevel,
+ service: entry.service,
+ tool_id: a.tool_id,
+ args: a.params,
+ },
+ });
+ if (resolution.actionId !== "approve") {
+ return { text: `denied: user did not approve ${a.tool_id}` };
+ }
+ }
+
+ const credentialService = entry.config.credentialService ?? entry.service;
+ const credentials = await resolveCredentials(ctx, credentialService);
+ const actionCtx: BridgeActionContext = {
+ credentials,
+ userId: ctx.userId,
+ orgId: ctx.orgId,
+ callerIdentity: ctx.actor
+ ? { name: ctx.actor.name ?? ctx.actor.id }
+ : undefined,
+ attribution: ctx.actor?.email
+ ? { name: ctx.actor.name ?? ctx.actor.id, email: ctx.actor.email }
+ : undefined,
+ };
+
+ let result: BridgeActionResult;
+ try {
+ result = await entry.config.actions.execute(entry.def.id, a.params ?? {}, actionCtx);
+ } catch (err) {
+ return {
+ text: `error: ${err instanceof Error ?
err.message : String(err)}`,
+ };
+ }
+
+ return actionResultToToolResult(result, a.tool_id);
+ },
+ };
+}
+
+// ── helpers ────────────────────────────────────────────────────
+
+function approvalModeFor(entry: CatalogEntry): ApprovalMode {
+ if (entry.config.defaultApprovalMode) return entry.config.defaultApprovalMode;
+ switch (entry.def.riskLevel) {
+ case "low":
+ case "medium":
+ return "allow";
+ case "high":
+ case "critical":
+ return "require_approval";
+ }
+}
+
+function qualifiedId(entry: CatalogEntry): string {
+ return entry.def.id.includes(".") ? entry.def.id : `${entry.service}.${entry.def.id}`;
+}
+
+async function resolveCredentials(
+ ctx: ToolContext,
+ service: string,
+): Promise<Record<string, string>> {
+ const cred = await ctx.credentials.get(service);
+ if (!cred) return {};
+ // Plugins read various keys: access_token, token, api_key. Map our typed
+ // Credential into a flat string map matching the legacy IntegrationCredentials shape.
+ const creds: Record<string, string> = {};
+ if (cred.accessToken) {
+ creds.access_token = cred.accessToken;
+ creds.token = cred.accessToken;
+ }
+ if (cred.refreshToken) creds.refresh_token = cred.refreshToken;
+ if (cred.metadata) {
+ for (const [k, v] of Object.entries(cred.metadata)) {
+ if (typeof v === "string") creds[k] = v;
+ }
+ }
+ return creds;
+}
+
+function actionResultToToolResult(
+ result: BridgeActionResult,
+ toolId: string,
+): ToolResult {
+ const attachments: ToolAttachment[] = [];
+ if (result.images) {
+ for (const img of result.images) {
+ attachments.push({
+ type: "image",
+ data: base64ToBytes(img.data),
+ mimeType: img.mimeType,
+ name: img.description,
+ });
+ }
+ }
+
+ if (!result.success) {
+ return {
+ text: `${toolId} failed: ${result.error ?? "unknown error"}`,
+ attachments: attachments.length > 0 ? attachments : undefined,
+ };
+ }
+ if (result.data === undefined) {
+ return {
+ text: `${toolId} ok`,
+ attachments: attachments.length > 0 ?
attachments : undefined, + }; + } + return { + text: stableJson(result.data), + attachments: attachments.length > 0 ? attachments : undefined, + }; +} + +function clamp(n: number, min: number, max: number): number { + return Math.max(min, Math.min(max, n)); +} + +function stableJson(value: unknown): string { + try { + return JSON.stringify(value, null, 2); + } catch { + return String(value); + } +} + +function base64ToBytes(b64: string): Uint8Array { + const clean = b64.startsWith("data:") ? b64.slice(b64.indexOf(",") + 1) : b64; + const binary = (globalThis as { atob: (s: string) => string }).atob(clean); + const out = new Uint8Array(binary.length); + for (let i = 0; i < binary.length; i++) out[i] = binary.charCodeAt(i); + return out; +} + diff --git a/packages/engine/src/builtin-tools/index.ts b/packages/engine/src/builtin-tools/index.ts new file mode 100644 index 00000000..b901c424 --- /dev/null +++ b/packages/engine/src/builtin-tools/index.ts @@ -0,0 +1,108 @@ +import { Type } from "typebox"; +import type { TSchema } from "typebox"; +import type { ToolDef, MessageQuery } from "../types.js"; + +/** + * Helper that preserves the schema's static type through the ToolDef so + * `args` in `execute` is typed precisely instead of `unknown`. 
+ */
+export function defineTool<S extends TSchema>(def: ToolDef<S>): ToolDef<S> {
+ return def;
+}
+
+export const readTool = defineTool({
+ name: "read",
+ description: "Read the contents of a file from the sandbox.",
+ parameters: Type.Object({ path: Type.String() }),
+ execute: async (args, ctx) => {
+ const text = await ctx.sandbox.readFile(args.path);
+ return { text };
+ },
+});
+
+export const writeTool = defineTool({
+ name: "write",
+ description: "Write contents to a file in the sandbox (creates or overwrites).",
+ parameters: Type.Object({ path: Type.String(), content: Type.String() }),
+ execute: async (args, ctx) => {
+ await ctx.sandbox.writeFile(args.path, args.content);
+ return { text: `wrote ${args.path}` };
+ },
+});
+
+export const editTool = defineTool({
+ name: "edit",
+ description: "Replace exact text occurrences in a file.",
+ parameters: Type.Object({
+ path: Type.String(),
+ oldString: Type.String(),
+ newString: Type.String(),
+ }),
+ execute: async (args, ctx) => {
+ const before = await ctx.sandbox.readFile(args.path);
+ if (!before.includes(args.oldString)) {
+ return { text: `no match for old_string in ${args.path}` };
+ }
+ const after = before.split(args.oldString).join(args.newString);
+ await ctx.sandbox.writeFile(args.path, after);
+ return { text: `edited ${args.path}` };
+ },
+});
+
+export const bashTool = defineTool({
+ name: "bash",
+ description: "Execute a shell command in the sandbox.",
+ parameters: Type.Object({ command: Type.String() }),
+ execute: async (args, ctx) => {
+ const result = await ctx.sandbox.exec(args.command, { signal: ctx.signal });
+ const exitNote = result.exitCode === 0 ? "" : `\n[exit ${result.exitCode}]`;
+ return { text: `${result.stdout}${result.stderr}${exitNote}` };
+ },
+});
+
+export const threadReadTool = defineTool({
+ name: "thread_read",
+ description:
+ "Read recent messages from another thread in this session. Useful for cross-thread context (e.g.
an orchestrator pulling notes from a worker thread, or a thread checking what a sibling has done).", + parameters: Type.Object({ + key: Type.String({ description: "Thread key to read from (e.g. 'web:default', 'task:research')." }), + limit: Type.Optional(Type.Integer({ minimum: 1, maximum: 200 })), + includeCompacted: Type.Optional(Type.Boolean()), + }), + execute: async (args, ctx) => { + const opts: MessageQuery = { + limit: args.limit ?? 30, + includeCompacted: args.includeCompacted ?? true, + }; + const entries = await ctx.threadRead(args.key, opts); + if (entries.length === 0) return { text: `(thread "${args.key}" has no messages)` }; + const lines: string[] = [`# thread:${args.key}`]; + for (const e of entries) { + if (e.type === "message") { + const author = e.author?.name ? ` (${e.author.name})` : ""; + lines.push(`\n## ${e.role}${author} @ ${new Date(e.createdAt).toISOString()}`); + lines.push(e.content); + } else if (e.type === "compaction") { + lines.push(`\n## [compaction summary]`); + lines.push(e.summary); + } else if (e.type === "decision_gate") { + lines.push( + `\n## [decision gate: ${e.gate.type} — ${e.gate.status}] ${e.gate.title}`, + ); + if (e.gate.body) lines.push(e.gate.body); + } else if (e.type === "branch_summary") { + lines.push(`\n## [branch summary]`); + lines.push(e.summary); + } + } + return { text: lines.join("\n") }; + }, +}); + +export const builtinTools: ToolDef[] = [ + readTool, + writeTool, + editTool, + bashTool, + threadReadTool, +]; diff --git a/packages/engine/src/compaction.ts b/packages/engine/src/compaction.ts new file mode 100644 index 00000000..6153dfb7 --- /dev/null +++ b/packages/engine/src/compaction.ts @@ -0,0 +1,551 @@ +import { completeSimple } from "@mariozechner/pi-ai"; +import type { Message, Model } from "@mariozechner/pi-ai"; +import type { + CompactionConfig, + MessageEntry, + SessionEntry, +} from "./types.js"; + +/** + * Compaction primitives — all pure functions. 
Orchestration that calls an + * LLM and persists results lives in the orchestrator (see compactThread in + * thread.ts). Keeping these pure makes them trivially unit-testable + * against synthetic transcripts. + */ + +// ── Constants and defaults ───────────────────────────────────────── + +const DEFAULTS = { + reserveCap: 20_000, + tailTurns: 2, + minPreserveRecentTokens: 2_000, + maxPreserveRecentTokens: 8_000, + pruneProtectTokens: 40_000, + pruneMinimumTokens: 20_000, + toolOutputMaxChars: 2_000, +} as const; + +const DEFAULT_PROTECTED_TOOLS = new Set(["skill", "thread_read"]); + +// ── Token estimation ─────────────────────────────────────────────── + +/** + * Crude byte-based token estimate. We estimate ~4 chars per token, which + * matches the heuristic OpenCode and pi-ai both use for budgeting decisions. + * Provider-reported token counts (from pi-ai usage) are used where available; + * this estimator is for offline budgeting (cut-point selection, prune budget). + */ +export function estimateTokens(text: string): number { + return Math.ceil(text.length / 4); +} + +export function estimateEntryTokens(entry: SessionEntry): number { + if (entry.type === "message") { + let total = estimateTokens(entry.content); + for (const part of entry.parts ?? []) { + if (part.type === "text") total += estimateTokens(part.text); + else if (part.type === "thinking") total += estimateTokens(part.text); + else if (part.type === "tool_call") { + if (part.args) total += estimateTokens(JSON.stringify(part.args)); + if (part.result !== undefined && !part.elided) { + total += estimateTokens(typeof part.result === "string" ? 
part.result : JSON.stringify(part.result)); + } + if (part.error) total += estimateTokens(part.error); + } + } + return total; + } + if (entry.type === "compaction") return estimateTokens(entry.summary); + if (entry.type === "branch_summary") return estimateTokens(entry.summary); + return 0; // decision_gate adds negligible context tokens +} + +export function estimateTotalTokens(entries: readonly SessionEntry[]): number { + let total = 0; + for (const e of entries) total += estimateEntryTokens(e); + return total; +} + +// ── Usable budget ────────────────────────────────────────────────── + +export function usableTokens(model: Model, cfg?: CompactionConfig): number { + const context = model.contextWindow ?? 0; + if (context === 0) return 0; + const reserve = + cfg?.reserveTokens ?? Math.min(DEFAULTS.reserveCap, model.maxTokens ?? DEFAULTS.reserveCap); + return Math.max(0, context - reserve); +} + +export function tailBudget(usable: number, cfg?: CompactionConfig): number { + const min = cfg?.minPreserveRecentTokens ?? DEFAULTS.minPreserveRecentTokens; + const max = cfg?.maxPreserveRecentTokens ?? DEFAULTS.maxPreserveRecentTokens; + const target = Math.floor(usable * 0.25); + return clamp(target, min, max); +} + +function clamp(n: number, min: number, max: number): number { + return Math.max(min, Math.min(max, n)); +} + +// ── Turn segmentation ────────────────────────────────────────────── + +export interface Turn { + /** Index into the entries array of the user message that starts this turn. */ + start: number; + /** Index of the next user-message turn boundary, or entries.length if last. */ + end: number; + /** Entry id of the user message at `start`. */ + id: string; +} + +/** + * Segment a list of entries into turns. A turn = [user message, ...everything until next user message). + * Decision gates and compaction entries that fall mid-turn stay in their owning turn. 
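+ * Illustrative sketch: five entries [user, assistant, assistant, user, assistant]
+ * segment into two turns covering index ranges [0, 3) and [3, 5).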
+ * Existing CompactionEntry markers are NOT turn boundaries — they sit at the head as a + * single virtual prefix. The first turn starts at the first user message after any + * leading non-message entries (which usually means index 0). + */ +export function turns(entries: readonly SessionEntry[]): Turn[] { + const result: Turn[] = []; + for (let i = 0; i < entries.length; i++) { + const e = entries[i]; + if (e.type !== "message" || e.role !== "user") continue; + result.push({ start: i, end: entries.length, id: e.id }); + } + for (let i = 0; i < result.length - 1; i++) { + result[i].end = result[i + 1].start; + } + return result; +} + +// ── Cut-point selection ──────────────────────────────────────────── + +export interface CutPoint { + /** Entries before this index go into the head (to be compacted). */ + cutIndex: number; + /** Entry id where the tail starts, or undefined if the tail is empty. */ + tailStartId: string | undefined; + /** True if we couldn't fit a full tail-turn budget; we kept what we could. */ + fallbackToFloor: boolean; +} + +export interface SelectCutPointOptions { + entries: readonly SessionEntry[]; + model: Model; + cfg?: CompactionConfig; + /** Override token estimator for tests. Defaults to estimateEntryTokens. */ + tokenize?: (entry: SessionEntry) => number; +} + +/** + * Pick a cut point so the tail (kept verbatim) fits within the tail budget + * derived from the model's usable context. Mirrors OpenCode's select(): + * + * - Compute tail budget from `usable * 0.25` clamped to [min, max]. + * - Take the last `tailTurns` turns and walk them oldest → newest from the end, + * accumulating size. Keep adding whole turns until the next one would + * overflow. If the very next (older) turn alone is too large to fit, split + * it: scan inside the turn for an entry whose suffix slice fits the + * remaining budget, and cut there. + * - If no tail can be preserved (e.g. 
the very last turn alone exceeds the + * budget and can't be split), keep the last turn anyway with + * fallbackToFloor=true so the orchestrator can decide to abort or proceed. + */ +export function selectCutPoint(opts: SelectCutPointOptions): CutPoint { + const { entries, model, cfg } = opts; + const tokenize = opts.tokenize ?? estimateEntryTokens; + const tailTurnsLimit = cfg?.tailTurns ?? DEFAULTS.tailTurns; + if (entries.length === 0 || tailTurnsLimit <= 0) { + return { cutIndex: entries.length, tailStartId: undefined, fallbackToFloor: false }; + } + + const usable = usableTokens(model, cfg); + const budget = tailBudget(usable, cfg); + const allTurns = turns(entries); + if (allTurns.length === 0) { + return { cutIndex: entries.length, tailStartId: undefined, fallbackToFloor: false }; + } + + // Take the last tailTurnsLimit turns as candidates for the tail. + const candidates = allTurns.slice(-tailTurnsLimit); + + // Walk newest → oldest, accumulating whole turns until we can't fit one. + let used = 0; + let keepStart = -1; + let keepStartId: string | undefined; + for (let i = candidates.length - 1; i >= 0; i--) { + const turn = candidates[i]; + const size = sumRange(entries, turn.start, turn.end, tokenize); + if (used + size <= budget) { + used += size; + keepStart = turn.start; + keepStartId = turn.id; + continue; + } + // This older turn can't fit whole — try to split it. + const remaining = budget - used; + const split = splitTurnForBudget({ + entries, + turn, + budget: remaining, + tokenize, + }); + if (split !== undefined) { + keepStart = split.cutIndex; + keepStartId = split.startId; + } + break; + } + + if (keepStart < 0) { + // Couldn't fit even the last turn within the tail budget. Floor: keep + // the most recent turn anyway so we always make progress. 
+ const last = candidates[candidates.length - 1];
+ return {
+ cutIndex: last.start,
+ tailStartId: last.id,
+ fallbackToFloor: true,
+ };
+ }
+
+ return { cutIndex: keepStart, tailStartId: keepStartId, fallbackToFloor: false };
+}
+
+interface TurnSplit {
+ cutIndex: number;
+ startId: string;
+}
+
+function splitTurnForBudget(args: {
+ entries: readonly SessionEntry[];
+ turn: Turn;
+ budget: number;
+ tokenize: (entry: SessionEntry) => number;
+}): TurnSplit | undefined {
+ if (args.budget <= 0) return undefined;
+ if (args.turn.end - args.turn.start <= 1) return undefined;
+ // Try later and later split points until the suffix fits.
+ for (let start = args.turn.start + 1; start < args.turn.end; start++) {
+ const size = sumRange(args.entries, start, args.turn.end, args.tokenize);
+ if (size <= args.budget) {
+ const id = args.entries[start]?.id;
+ if (!id) return undefined;
+ return { cutIndex: start, startId: id };
+ }
+ }
+ return undefined;
+}
+
+function sumRange(
+ entries: readonly SessionEntry[],
+ start: number,
+ end: number,
+ tokenize: (entry: SessionEntry) => number,
+): number {
+ let total = 0;
+ for (let i = start; i < end; i++) total += tokenize(entries[i]);
+ return total;
+}
+
+// ── Pruning (cheap, no LLM) ──────────────────────────────────────
+
+export interface PruneOptions {
+ entries: readonly SessionEntry[];
+ cfg?: CompactionConfig;
+ /** Tool names exempt from pruning. Merged with cfg.protectedTools and ToolDef.protectedFromPruning. */
+ protectedTools?: Set<string>;
+}
+
+export interface PruneResult {
+ /** entryId → list of tool_call callIds to mark elided (only filled if savedTokens >= pruneMinimumTokens). */
+ toElide: Map<string, string[]>;
+ savedTokens: number;
+ /** True if we'll commit (savedTokens >= pruneMinimumTokens). */
+ willCommit: boolean;
+}
+
+/**
+ * Walk entries newest → oldest. Track cumulative tool-output token estimate.
+ * Once the cumulative count exceeds `pruneProtectTokens`, mark every older
+ * tool-call result as elidable.
Skip protected tools and tool calls that
+ * already have `elided: true`.
+ */
+export function planPrune(opts: PruneOptions): PruneResult {
+ const cfg = opts.cfg;
+ const protectTokens = cfg?.pruneProtectTokens ?? DEFAULTS.pruneProtectTokens;
+ const minimumTokens = cfg?.pruneMinimumTokens ?? DEFAULTS.pruneMinimumTokens;
+ const protectedTools = mergeProtectedTools(opts.protectedTools, cfg?.protectedTools);
+
+ const toElide = new Map<string, string[]>();
+ let cumulative = 0;
+ let savedTokens = 0;
+
+ for (let i = opts.entries.length - 1; i >= 0; i--) {
+ const entry = opts.entries[i];
+ if (entry.type !== "message") continue;
+ if (!entry.parts) continue;
+ for (const part of entry.parts) {
+ if (part.type !== "tool_call") continue;
+ if (part.status !== "completed") continue;
+ if (part.elided) continue;
+ if (protectedTools.has(part.toolName)) continue;
+ const resultText =
+ part.result === undefined
+ ? ""
+ : typeof part.result === "string"
+ ? part.result
+ : JSON.stringify(part.result);
+ const size = estimateTokens(resultText);
+ cumulative += size;
+ if (cumulative <= protectTokens) continue;
+ // Past the protection window — mark this tool result for elision.
+ const list = toElide.get(entry.id) ?? [];
+ list.push(part.callId);
+ toElide.set(entry.id, list);
+ savedTokens += size;
+ }
+ }
+
+ return {
+ toElide,
+ savedTokens,
+ willCommit: savedTokens >= minimumTokens,
+ };
+}
+
+function mergeProtectedTools(
+ base: Set<string> | undefined,
+ fromCfg: string[] | undefined,
+): Set<string> {
+ const out = new Set(DEFAULT_PROTECTED_TOOLS);
+ if (base) for (const t of base) out.add(t);
+ if (fromCfg) for (const t of fromCfg) out.add(t);
+ return out;
+}
+
+/**
+ * Apply a PruneResult to the entries by mutating the matching tool_call parts.
+ * The caller is responsible for persisting the mutation back to the SessionStore.
+ */
+export function applyPrune(entries: SessionEntry[], plan: PruneResult): void {
+ if (!plan.willCommit) return;
+ for (const entry of entries) {
+ if (entry.type !== "message") continue;
+ const elideIds = plan.toElide.get(entry.id);
+ if (!elideIds || elideIds.length === 0) continue;
+ const idSet = new Set(elideIds);
+ for (const part of entry.parts ?? []) {
+ if (part.type !== "tool_call") continue;
+ if (!idSet.has(part.callId)) continue;
+ part.elided = true;
+ part.result = { elided: true, reason: "pruned" };
+ }
+ }
+}
+
+// ── File context extraction ──────────────────────────────────────
+
+const READ_TOOLS = new Set(["read", "grep", "glob"]);
+const WRITE_TOOLS = new Set(["write", "edit"]);
+
+/**
+ * Walk the head entries' tool calls and pull out file paths, classifying
+ * each as `read` (tool was a reader) or `modified` (tool was a writer).
+ */
+export function extractFileContext(
+ entries: readonly SessionEntry[],
+): { read: string[]; modified: string[] } {
+ const read = new Set<string>();
+ const modified = new Set<string>();
+ for (const entry of entries) {
+ if (entry.type !== "message") continue;
+ for (const part of entry.parts ?? []) {
+ if (part.type !== "tool_call") continue;
+ const path = extractPath(part.args);
+ if (!path) continue;
+ if (READ_TOOLS.has(part.toolName)) read.add(path);
+ else if (WRITE_TOOLS.has(part.toolName)) modified.add(path);
+ }
+ }
+ return { read: [...read], modified: [...modified] };
+}
+
+function extractPath(args: unknown): string | undefined {
+ if (!args || typeof args !== "object") return undefined;
+ const obj = args as Record<string, unknown>;
+ const candidate = obj.path ?? obj.file ?? obj.filename ?? obj.target;
+ return typeof candidate === "string" ? candidate : undefined;
+}
+
+// ── Summarizer ───────────────────────────────────────────────────
+
+/**
+ * The required structured-markdown template. The engine relies on this
+ * shape downstream (e.g.
for displaying a session-resume note) — keep + * sections in this exact order and casing. OpenCode pioneered this layout; + * we copy it verbatim because it works. + */ +const SUMMARY_TEMPLATE = `Output exactly the Markdown structure shown inside