Document portable runtime V1 spec #46
Open
yourbuddyconner wants to merge 26 commits into main from
Conversation
Adds packages/engine — a portable agent runtime library that implements the V1 design from docs/specs/2026-05-02-portable-runtime-engine-design.md. Built on @mariozechner/pi-ai + @mariozechner/pi-agent-core. The engine itself has zero platform dependencies; platform adapters (Cloudflare, K8s) host it.

What's implemented:
- Engine public API (createSession, prompt, resolveDecision, abort, pause/resume) + Session/Thread classes.
- Per-thread queue with three modes: followup (FIFO), steer (abort + start), collect (buffered window).
- Decision gates with full lifecycle: pending -> resolved/withdrawn/expired. Steer withdraws pending gates with reason=steer; abort with reason=abort. DecisionGateEntry persists in the DAG.
- Multi-thread sessions: threads run concurrently with isolated histories; aborting one doesn't affect siblings.
- Built-in thread_read tool for cross-thread visibility.
- In-memory providers + VirtualSandbox so the full engine runs in vitest with no containers. 14 tests (happy path, decision gates, queue modes, multi-thread + thread_read), all green in <2s.

Spec-vs-reality deltas reconciled by the implementation are documented in packages/engine/README.md (tool signature wrapping, no native suspension primitive, message_start vs message_update).

Deferred for follow-up: restart-safe re-entrant gates (the SuspendedTurnState record is written but Engine.restoreSession is a stub), compaction, role/skill loading, model failover, ActionSource bridge, structured-result extraction.
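The three queue modes can be sketched as follows. This is a minimal illustration of the semantics described above, not the engine's actual implementation; the `ThreadQueue` class and its method names are hypothetical.

```typescript
// Hypothetical sketch of the per-thread queue's three modes:
// followup (FIFO), steer (abort + start fresh), collect (buffered window).
type QueueMode = "followup" | "steer" | "collect";

interface QueueItem { id: string; prompt: string; }

class ThreadQueue {
  private items: QueueItem[] = [];
  private collectBuffer: QueueItem[] = [];

  constructor(
    private mode: QueueMode,
    private abortActiveTurn: () => void, // steer aborts the in-flight turn
  ) {}

  enqueue(item: QueueItem): void {
    switch (this.mode) {
      case "followup":
        // FIFO: run after the current turn finishes.
        this.items.push(item);
        break;
      case "steer":
        // Abort the active turn, drop queued work, start over with this item.
        this.abortActiveTurn();
        this.items = [item];
        break;
      case "collect":
        // Buffer within a window; flushed later as one batch.
        this.collectBuffer.push(item);
        break;
    }
  }

  /** Drain buffered collect-mode items into the main queue. */
  flushCollected(): void {
    this.items.push(...this.collectBuffer);
    this.collectBuffer = [];
  }

  next(): QueueItem | undefined {
    return this.items.shift();
  }
}
```

Steer mode is what drives the gate-withdrawal behavior above: aborting the active turn is the point at which pending gates are withdrawn with reason=steer.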
Spec updates (docs/specs/2026-05-02-portable-runtime-engine-design.md):
- Engine.restoreSession now takes { sessionId, options } so the host
re-supplies tools/sandbox/model — the engine doesn't persist creation
options across restarts.
- Decision-gate IDs are explicitly derived as
gate:{sessionId}:{threadId}:{queueItemId}:{resumeKey}, and resumeKey
is required (not optional) on DecisionGateRequest.
- Restart-safe contract section now explains what "re-entrant up to the
decision point" means in practice (work before requestDecision runs
twice; work after runs once on replay), how the engine populates
ctx.suspendedDecision on restoration, and what events the engine must
emit during replay.
- New "LLM-faithful entry persistence" rule: assistant tool-call blocks
must persist in MessageEntry.parts so the rehydrated transcript can
be sent to LLM providers without producing a malformed
[user, assistant(text-only), toolResult] sequence.
- ToolContext.requestDecision typed as DecisionGateRequest (not
DecisionGate); ctx.suspendedDecision documented as engine-only.
- SuspendedTurnState bullets updated to reference the deterministic
formula and the resumeKey explicitly.
- Adapter Host Contract calls out engine.restoreSession({ sessionId,
options }) and the queue-item / suspended-turn fields that must
survive hibernation.
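The deterministic gate-ID formula from the spec is simple enough to sketch directly. The helper name and the `DecisionGateRequest` shape shown here are illustrative; only the `gate:{sessionId}:{threadId}:{queueItemId}:{resumeKey}` format comes from the spec.

```typescript
// Sketch of the spec's deterministic decision-gate ID derivation.
// resumeKey is required (not optional) on DecisionGateRequest, per the
// updated spec; other request fields are elided.
interface DecisionGateRequest {
  resumeKey: string;
  // ...prompt, options, riskLevel, etc.
}

function deriveGateId(
  sessionId: string,
  threadId: string,
  queueItemId: string,
  req: DecisionGateRequest,
): string {
  return `gate:${sessionId}:${threadId}:${queueItemId}:${req.resumeKey}`;
}
```

Determinism is the whole point: the same (session, thread, queue item, resumeKey) tuple yields the same gate ID before and after a restart, which is what lets replay find the already-resolved gate instead of opening a new one.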
Plan (docs/plans/2026-05-05-persistent-store-restart-safe-gates.md):
- 16 tasks across 4 phases (schema, store + contract tests, restart-safe
primitives, restoreSession + replay), all aligned with the updated
spec.
- Task 11 reworked from a flaky integration test that raced the agent
loop into a deterministic unit test of a pure shouldShortCircuit
predicate.
- Task 12's rehydrateTranscript explicitly reconstructs assistant
ToolCall blocks from MessageEntry.parts.
Task 15 surfaced two real bugs in the prototype:
- Session.rehydrate's resumeBlockedThreadIfReady call was fire-and-forget, racing with resolveDecision callers; awaiting it ensures the gate is re-armed before any caller can resolve it.
- During replay, Thread.replayBlocked needs to mirror the original queueItemId so the deterministic gate ID matches and the short-circuit fires; without this, the tool tries to open a new gate.
- Gate-status persistence (pending -> resolved) lived in the requestDecision continuation; the short-circuit path bypassed it. Moved to Thread.resolveDecision so both live and replay paths persist the resolved status.
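A pure short-circuit predicate of the kind Task 11 tests might look like this. The signature and the `StoredGate` shape are assumptions for illustration; the actual predicate in the plan may differ.

```typescript
// Hypothetical sketch of a pure shouldShortCircuit predicate: during
// replay, a requestDecision whose deterministic gate ID matches an
// already-resolved gate returns the stored resolution instead of
// opening a new gate. Being pure, it's trivially unit-testable without
// racing the agent loop.
type GateStatus = "pending" | "resolved" | "withdrawn" | "expired";

interface StoredGate {
  id: string;
  status: GateStatus;
  resolution?: unknown;
}

function shouldShortCircuit(
  gateId: string,
  storedGates: ReadonlyMap<string, StoredGate>,
): { shortCircuit: boolean; resolution?: unknown } {
  const gate = storedGates.get(gateId);
  if (gate && gate.status === "resolved") {
    return { shortCircuit: true, resolution: gate.resolution };
  }
  // Pending, withdrawn, expired, or unknown: open (or re-open) a live gate.
  return { shortCircuit: false };
}
```

This also makes the second bug above concrete: if replay uses a fresh queueItemId, the derived gate ID won't match any stored gate and the predicate never fires.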
bin/repl.ts wires the engine to a real Anthropic model via pi-ai's
getModel('anthropic', ...), with InMemorySessionStore +
InMemoryEventBus + VirtualSandbox. Supports single-shot
(`pnpm repl 'say hi'`) and interactive (`pnpm repl`) modes. Streams
text deltas, tool calls, decision gates, and turn boundaries to stdout.
Defaults to claude-haiku-4-5; override the model with VALET_MODEL and
the system prompt with VALET_SYSTEM_PROMPT. Reads ANTHROPIC_API_KEY
from env via pi-ai's provider auto-resolution.
LocalSandbox wraps node:fs/promises and node:child_process.spawn to implement the Sandbox interface against the real host filesystem and shell. Relative paths resolve against the configured workspace; absolute paths are honored as-is (no escape prevention — this is a dev/testing sandbox; security goes into the Docker provider).

ExecOpts honored:
- cwd (relative paths resolved against workspace, default = workspace)
- env (merged over process.env)
- timeout (SIGKILL on expiry, timedOut: true on result)
- signal (AbortSignal cancellation)
- stdin (piped to the child)
- maxOutputBytes (truncated: true on result)

19 tests cover FS round-trips, exec lifecycle (timeout, abort, truncation, stdin, env, cwd), and provider behavior. Total suite: 60 tests, all green.

The REPL gains a VALET_SANDBOX=local|virtual switch (default virtual) and VALET_WORKSPACE for the local workspace path. Smoke-tested against a tmp scratch dir AND the valet repo itself — the engine read its own README and listed packages/ via a real shell.
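The timeout and maxOutputBytes behaviors can be sketched around node:child_process.spawn. This is an illustrative reduction, not LocalSandbox's actual code; the `exec` signature and `ExecResult` shape are assumptions, and cwd/env/signal/stdin handling is elided.

```typescript
// Sketch of honoring timeout (SIGKILL on expiry, timedOut: true) and
// maxOutputBytes (truncated: true) on top of spawn. Hypothetical shape.
import { spawn } from "node:child_process";

interface ExecResult {
  stdout: string;
  timedOut: boolean;
  truncated: boolean;
  code: number | null;
}

function exec(
  cmd: string,
  args: string[],
  opts: { cwd?: string; timeoutMs?: number; maxOutputBytes?: number } = {},
): Promise<ExecResult> {
  return new Promise((resolve) => {
    const child = spawn(cmd, args, { cwd: opts.cwd });
    let out = Buffer.alloc(0);
    let timedOut = false;
    let truncated = false;

    // Timeout: SIGKILL the child, flag the result.
    const timer = opts.timeoutMs
      ? setTimeout(() => { timedOut = true; child.kill("SIGKILL"); }, opts.timeoutMs)
      : undefined;

    child.stdout?.on("data", (chunk: Buffer) => {
      out = Buffer.concat([out, chunk]);
      const max = opts.maxOutputBytes ?? Infinity;
      if (out.length > max) {
        // Keep only the first max bytes and stop the child; we'd
        // discard anything further anyway.
        out = out.subarray(0, max);
        truncated = true;
        child.kill("SIGKILL");
      }
    });

    child.on("close", (code) => {
      if (timer) clearTimeout(timer);
      resolve({ stdout: out.toString("utf8"), timedOut, truncated, code });
    });
  });
}
```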
The bridge does NOT register one engine-visible tool per plugin action.
That approach would (a) collide with Anthropic's tool-name regex
(`^[a-zA-Z0-9_-]{1,128}$` — action ids like 'github.create_issue' are
rejected), (b) blow past LLM tool-catalog budgets when many plugins are
active, and (c) force every session to pay the prompt cost of every
action even when only a few are relevant.
Instead, actionBridgeTools({ sources }) returns exactly two ToolDefs:
- list_tools({ service?, query?, limit? }): searchable catalog with
per-action params + risk levels + per-service auth warnings.
- call_tool({ tool_id, params, summary }): dispatches by action id
(kept untouched, dots and all). Approval gates honor riskLevel via
ctx.requestDecision; user denial short-circuits without invoking
the action.
This is the same pattern OpenCode uses in the existing valet runtime,
so plugin ActionSource shapes port across unchanged.
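The regex collision and the dispatch trick can be shown concretely. The `Action` shape and `callToolDispatch` helper below are illustrative, not the bridge's actual internals; only the tool-name regex is taken from above.

```typescript
// Why dotted action ids can't be provider-visible tool names, and how
// call_tool sidesteps that: the dotted id travels as a plain string
// parameter (tool_id), never as a tool name.
const TOOL_NAME_RE = /^[a-zA-Z0-9_-]{1,128}$/;

interface Action {
  id: string;
  run: (params: unknown) => unknown;
}

function isValidToolName(name: string): boolean {
  return TOOL_NAME_RE.test(name);
}

// Hypothetical dispatch core of call_tool: look up by action id,
// kept untouched, dots and all.
function callToolDispatch(
  actions: Map<string, Action>,
  toolId: string,
  params: unknown,
): unknown {
  const action = actions.get(toolId);
  if (!action) throw new Error(`unknown tool_id: ${toolId}`);
  return action.run(params);
}
```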
Spec updated ("Plugin Action Bridge" section): documents the why
(provider regex, catalog budget, prompt cost), the new ActionBridgeOptions
shape, and the list_tools/call_tool semantics.
Engine.thread now also surfaces assistant errorMessage as an 'error'
event and translates stopReason 'error' into a 'turn_end: error' rather
than masking it as 'end_turn' — found while debugging the dogfood pass.
Dogfood: REPL with GITHUB_TOKEN=$(gh auth token) successfully searches
the catalog with list_tools, calls github.get_repository via call_tool,
and reports description/default branch/star count from the live API
response. 9 bridge unit tests + 60 existing tests all green (69 total).
Replaces the hand-wavy compaction section with a concrete two-technique design informed by OpenCode's compaction module.
- Two triggers: proactive (token threshold post-turn) and reactive (overflow error retry via pi-ai's isContextOverflow).
- Tail preservation: keep the last N turns clamped to a token budget, with a mid-turn split when a single turn exceeds the budget.
- Pruning: walk newest-first, mark stale tool outputs as elided after pruneProtectTokens of recent output. No LLM call. Skips protected tools. Only commits if it'd save >= pruneMinimumTokens.
- Compaction: summarize the head into a structured markdown template (Goal/Constraints/Progress/Key Decisions/Next Steps/Critical Context/Relevant Files). Iterative — subsequent compactions update the previous summary rather than writing a fresh one.
- LLM-context assembly: convertToLlm drops covered entries and injects the summary as a user message; elided tool outputs are replaced with a placeholder. Same path used during restoreSession.
- Auto-continue after proactive compaction; reactive compactions retry the originating turn instead.
- A concrete configuration table with defaults.
Implements the spec'd compaction design (informed by OpenCode):
- src/compaction.ts: pure primitives — usableTokens, tailBudget, turns,
selectCutPoint (with mid-turn split), planPrune/applyPrune,
extractFileContext (read vs modified), summarize (one-shot
completeSimple with the OpenCode-style structured-markdown template),
iterative anchoring via previousSummary.
- src/types.ts: CompactionConfig (enabled, reserveTokens, tailTurns,
min/maxPreserveRecentTokens, pruneProtectTokens, pruneMinimumTokens,
toolOutputMaxChars, summarizerModel, protectedTools), wired into
CreateSessionOptions. ToolDef.protectedFromPruning. MessagePart
tool_call.elided flag.
- src/thread.ts:
- lastAssistantUsage capture in turn_end handler.
- Thread.compactThread orchestrator: prune (cheap, no LLM), select
cut point, summarize head into a CompactionEntry, persist, rewrite
agent.state.messages.
- Proactive trigger in runItem: post-turn check shouldCompactProactive
(lastAssistantUsage.total >= usable).
- Reactive trigger in runAgent: catch isContextOverflow on assistant
error, compact, retry the same prompt once.
- rehydrateTranscript now delegates to entriesToAgentMessages, an
exported pure function that drops covered entries and injects the
summary as a <previous-context> user message.
- 21 pure compaction tests + 2 integration tests against the faux
provider. Total suite: 92 tests, all green.
Known limitation: applyPrune mutates entries in memory but the current
SessionStore APIs (appendEntries-only) don't expose an in-place row
update, so pruning persists only to the live agent transcript for now.
Proper persistence requires adding an updateEntry method to
SessionStore — left as a follow-up since it doesn't block compaction
correctness, just observability of pruned state across restarts.
Required so pruning during compaction can persist tool-result elision back to the DAG. Also clarifies that pruning's persistence is atomic per entry: updateEntry rewrites the entire MessageEntry row with the same id, including the mutated tool_call parts. Throws NotFoundError if no matching entry exists in (sessionId, threadId).
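An in-memory version of that contract is small enough to sketch. The class and field names here are illustrative; only the updateEntry semantics (whole-row rewrite by id, NotFoundError on a miss within the session/thread scope) come from the text above.

```typescript
// Hypothetical in-memory sketch of SessionStore.updateEntry's contract:
// rewrite the entire entry row with the same id, throw NotFoundError if
// (sessionId, threadId, id) doesn't match an existing entry.
class NotFoundError extends Error {}

interface Entry {
  id: string;
  sessionId: string;
  threadId: string;
  parts: unknown[]; // includes mutated tool_call parts after pruning
}

class InMemoryStore {
  private entries: Entry[] = [];

  appendEntries(...es: Entry[]): void {
    this.entries.push(...es);
  }

  updateEntry(next: Entry): void {
    const i = this.entries.findIndex(
      (e) =>
        e.id === next.id &&
        e.sessionId === next.sessionId &&
        e.threadId === next.threadId,
    );
    if (i === -1) throw new NotFoundError(`no entry ${next.id}`);
    this.entries[i] = next; // atomic per entry: whole row rewritten
  }

  get(id: string): Entry | undefined {
    return this.entries.find((e) => e.id === id);
  }
}
```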
Two follow-ups to the compaction landing:
1. SessionStore.updateEntry: rewrite an entry in place by id. Throws
NotFoundError (new errors.ts module) when (sessionId, threadId, id)
doesn't match. Implemented on InMemorySessionStore (Array.findIndex
+ replace) and SqliteSessionStore (UPDATE with changes=0 check).
Two contract tests added; runs against both backends.
Thread.compactThread's prune branch now calls store.updateEntry for
each elided entry instead of dropping the persistence on the floor.
Verified by an integration test that pre-populates the DAG with
bash-output-heavy turns, triggers compaction, and confirms a-1's
tool_call.elided is set in the DAG after the compaction completes.
2. Auto-continue: after a proactive compaction (cfg.autoContinue !==
false), inject a synthetic user message —
'Continue if you have next steps, or stop and ask for clarification...'
— tagged with metadata.compaction_continue=true so client UIs can
hide it. Pushed onto the thread's queue, picked up by the next
tickQueue cycle. Reactive (overflow) compactions never auto-continue.
Added skipNextProactiveCheck cooldown so the auto-continue turn
itself doesn't immediately re-trigger compaction (the summary +
system prompt can still exceed usable on a small-context model;
without the cooldown we'd loop).
QueueItem.metadata now flows through to MessageEntry.metadata so the
compaction_continue tag survives into the DAG and across restarts.
Two integration tests: on-path (auto-continue runs, response
recorded), off-path (autoContinue: false suppresses).
Spec updated: SessionStore.updateEntry signature; pruning persistence
clarified; CompactionConfig.autoContinue added to the config table.
99 tests, all green.
Two new env vars:
- VALET_CONTEXT_WINDOW: override the model's local contextWindow
- VALET_MAX_TOKENS: override the model's local maxTokens

Anthropic's API still accepts the model's real (much larger) context window; the override only affects the engine's 'usable' calculation, which is what triggers proactive compaction. Useful for dogfooding the compaction loop with a real LLM at small budgets, so we don't have to generate 100k tokens of context to see it fire.

Plus per-event printers for compaction_start / compaction_end so the REPL output makes it obvious when compaction kicks in.

Verified end-to-end against Claude Haiku 4.5 with VALET_CONTEXT_WINDOW=8000 and VALET_MAX_TOKENS=1000:
- Agent reads 5 files (~60KB of tool output)
- Proactive compaction fires after the first turn
- The auto-continue turn runs and references 'the previous context' from the injected summary, correctly understanding the task was complete.
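How the override feeds the trigger can be sketched as follows. The exact usableTokens formula is an assumption (context window minus reserve minus output budget); the real function lives in src/compaction.ts and may differ.

```typescript
// Sketch of the 'usable' calculation the overrides feed. Assumed
// formula: the conversation budget is what's left of the (possibly
// overridden) context window after reserving headroom and the model's
// output budget.
function usableTokens(
  contextWindow: number, // VALET_CONTEXT_WINDOW overrides this locally
  reserveTokens: number, // CompactionConfig.reserveTokens
  maxTokens: number,     // VALET_MAX_TOKENS overrides this locally
): number {
  return contextWindow - reserveTokens - maxTokens;
}

// Post-turn proactive check, as described for shouldCompactProactive:
// fire once the last turn's total usage reaches the usable budget.
function shouldCompactProactive(lastTurnTotalTokens: number, usable: number): boolean {
  return lastTurnTotalTokens >= usable;
}

// The overrides only shrink the engine's local view of the model; the
// provider API still sees the real context window.
const contextWindow = Number(process.env.VALET_CONTEXT_WINDOW ?? 200_000);
```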