Document portable runtime V1 spec #46
Open
yourbuddyconner wants to merge 26 commits into main from
Conversation
Adds packages/engine — a portable agent runtime library that implements the V1 design from docs/specs/2026-05-02-portable-runtime-engine-design.md. Built on @mariozechner/pi-ai + @mariozechner/pi-agent-core. The engine itself has zero platform dependencies; platform adapters (Cloudflare, K8s) host it.

What's implemented:
- Engine public API (createSession, prompt, resolveDecision, abort, pause/resume) + Session/Thread classes.
- Per-thread queue with three modes: followup (FIFO), steer (abort + start), collect (buffered window).
- Decision gates with full lifecycle: pending -> resolved/withdrawn/expired. Steer withdraws pending gates with reason=steer; abort with reason=abort. DecisionGateEntry persists in the DAG.
- Multi-thread sessions: threads run concurrently with isolated histories; aborting one doesn't affect siblings.
- Built-in thread_read tool for cross-thread visibility.
- In-memory providers + VirtualSandbox so the full engine runs in vitest with no containers. 14 tests (happy path, decision gates, queue modes, multi-thread + thread_read), all green in <2s.

Spec-vs-reality deltas reconciled by the implementation are documented in packages/engine/README.md (tool signature wrapping, no native suspension primitive, message_start vs message_update).

Deferred for follow-up: restart-safe re-entrant gates (the SuspendedTurnState record is written but Engine.restoreSession is a stub), compaction, role/skill loading, model failover, ActionSource bridge, structured-result extraction.
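The three queue modes can be sketched as follows. This is a minimal illustration of the semantics described above, not the engine's actual implementation; the `ThreadQueue` class and its method names are hypothetical.

```typescript
// Hypothetical sketch of the per-thread queue's three modes:
// followup (FIFO), steer (abort + start fresh), collect (buffered window).
type QueueMode = "followup" | "steer" | "collect";

interface QueueItem { id: string; prompt: string; }

class ThreadQueue {
  private items: QueueItem[] = [];
  private collectBuffer: QueueItem[] = [];

  constructor(
    private mode: QueueMode,
    private abortActiveTurn: () => void, // steer aborts the in-flight turn
  ) {}

  enqueue(item: QueueItem): void {
    switch (this.mode) {
      case "followup":
        // FIFO: run after the current turn finishes.
        this.items.push(item);
        break;
      case "steer":
        // Abort the active turn, drop queued work, start over with this item.
        this.abortActiveTurn();
        this.items = [item];
        break;
      case "collect":
        // Buffer within a window; flushed later as one batch.
        this.collectBuffer.push(item);
        break;
    }
  }

  /** Drain buffered collect-mode items into the main queue. */
  flushCollected(): void {
    this.items.push(...this.collectBuffer);
    this.collectBuffer = [];
  }

  next(): QueueItem | undefined {
    return this.items.shift();
  }
}
```

Steer mode is what drives the gate-withdrawal behavior above: aborting the active turn is the point at which pending gates are withdrawn with reason=steer.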
Spec updates (docs/specs/2026-05-02-portable-runtime-engine-design.md):
- Engine.restoreSession now takes { sessionId, options } so the host
re-supplies tools/sandbox/model — the engine doesn't persist creation
options across restarts.
- Decision-gate IDs are explicitly derived as
gate:{sessionId}:{threadId}:{queueItemId}:{resumeKey}, and resumeKey
is required (not optional) on DecisionGateRequest.
- Restart-safe contract section now explains what "re-entrant up to the
decision point" means in practice (work before requestDecision runs
twice; work after runs once on replay), how the engine populates
ctx.suspendedDecision on restoration, and what events the engine must
emit during replay.
- New "LLM-faithful entry persistence" rule: assistant tool-call blocks
must persist in MessageEntry.parts so the rehydrated transcript can
be sent to LLM providers without producing a malformed
[user, assistant(text-only), toolResult] sequence.
- ToolContext.requestDecision typed as DecisionGateRequest (not
DecisionGate); ctx.suspendedDecision documented as engine-only.
- SuspendedTurnState bullets updated to reference the deterministic
formula and the resumeKey explicitly.
- Adapter Host Contract calls out engine.restoreSession({ sessionId,
options }) and the queue-item / suspended-turn fields that must
survive hibernation.
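The deterministic gate-ID formula from the spec is simple enough to sketch directly. The helper name and the `DecisionGateRequest` shape shown here are illustrative; only the `gate:{sessionId}:{threadId}:{queueItemId}:{resumeKey}` format comes from the spec.

```typescript
// Sketch of the spec's deterministic decision-gate ID derivation.
// resumeKey is required (not optional) on DecisionGateRequest, per the
// updated spec; other request fields are elided.
interface DecisionGateRequest {
  resumeKey: string;
  // ...prompt, options, riskLevel, etc.
}

function deriveGateId(
  sessionId: string,
  threadId: string,
  queueItemId: string,
  req: DecisionGateRequest,
): string {
  return `gate:${sessionId}:${threadId}:${queueItemId}:${req.resumeKey}`;
}
```

Determinism is the whole point: the same (session, thread, queue item, resumeKey) tuple yields the same gate ID before and after a restart, which is what lets replay find the already-resolved gate instead of opening a new one.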
Plan (docs/plans/2026-05-05-persistent-store-restart-safe-gates.md):
- 16 tasks across 4 phases (schema, store + contract tests, restart-safe
primitives, restoreSession + replay), all aligned with the updated
spec.
- Task 11 reworked from a flaky integration test that raced the agent
loop into a deterministic unit test of a pure shouldShortCircuit
predicate.
- Task 12's rehydrateTranscript explicitly reconstructs assistant
ToolCall blocks from MessageEntry.parts.
Task 15 surfaced two real bugs in the prototype:
- Session.rehydrate's resumeBlockedThreadIfReady call was fire-and-forget, racing with resolveDecision callers; awaiting it ensures the gate is re-armed before any caller can resolve it.
- During replay, Thread.replayBlocked needs to mirror the original queueItemId so the deterministic gate ID matches and the short-circuit fires; without this, the tool tries to open a new gate.
- Gate-status persistence (pending -> resolved) lived in the requestDecision continuation; the short-circuit path bypassed it. Moved to Thread.resolveDecision so both live and replay paths persist the resolved status.
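A pure short-circuit predicate of the kind Task 11 tests might look like this. The signature and the `StoredGate` shape are assumptions for illustration; the actual predicate in the plan may differ.

```typescript
// Hypothetical sketch of a pure shouldShortCircuit predicate: during
// replay, a requestDecision whose deterministic gate ID matches an
// already-resolved gate returns the stored resolution instead of
// opening a new gate. Being pure, it's trivially unit-testable without
// racing the agent loop.
type GateStatus = "pending" | "resolved" | "withdrawn" | "expired";

interface StoredGate {
  id: string;
  status: GateStatus;
  resolution?: unknown;
}

function shouldShortCircuit(
  gateId: string,
  storedGates: ReadonlyMap<string, StoredGate>,
): { shortCircuit: boolean; resolution?: unknown } {
  const gate = storedGates.get(gateId);
  if (gate && gate.status === "resolved") {
    return { shortCircuit: true, resolution: gate.resolution };
  }
  // Pending, withdrawn, expired, or unknown: open (or re-open) a live gate.
  return { shortCircuit: false };
}
```

This also makes the second bug above concrete: if replay uses a fresh queueItemId, the derived gate ID won't match any stored gate and the predicate never fires.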
bin/repl.ts wires the engine to a real Anthropic model via pi-ai's
getModel('anthropic', ...), with InMemorySessionStore +
InMemoryEventBus + VirtualSandbox. Supports single-shot
(`pnpm repl 'say hi'`) and interactive (`pnpm repl`) modes. Streams
text deltas, tool calls, decision gates, and turn boundaries to stdout.
Defaults to claude-haiku-4-5; override the model with VALET_MODEL and
the system prompt with VALET_SYSTEM_PROMPT. Reads ANTHROPIC_API_KEY
from env via pi-ai's provider auto-resolution.
LocalSandbox wraps node:fs/promises and node:child_process.spawn to implement the Sandbox interface against the real host filesystem and shell. Relative paths resolve against the configured workspace; absolute paths are honored as-is (no escape prevention — this is a dev/testing sandbox; security goes into the Docker provider).

ExecOpts honored:
- cwd (relative paths resolved against workspace, default = workspace)
- env (merged over process.env)
- timeout (SIGKILL on expiry, timedOut: true on result)
- signal (AbortSignal cancellation)
- stdin (piped to the child)
- maxOutputBytes (truncated: true on result)

19 tests cover FS round-trips, exec lifecycle (timeout, abort, truncation, stdin, env, cwd), and provider behavior. Total suite: 60 tests, all green.

The REPL gains a VALET_SANDBOX=local|virtual switch (default virtual) and VALET_WORKSPACE for the local workspace path. Smoke-tested against a tmp scratch dir AND the valet repo itself — the engine read its own README and listed packages/ via a real shell.
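The timeout and maxOutputBytes behaviors can be sketched around node:child_process.spawn. This is an illustrative reduction, not LocalSandbox's actual code; the `exec` signature and `ExecResult` shape are assumptions, and cwd/env/signal/stdin handling is elided.

```typescript
// Sketch of honoring timeout (SIGKILL on expiry, timedOut: true) and
// maxOutputBytes (truncated: true) on top of spawn. Hypothetical shape.
import { spawn } from "node:child_process";

interface ExecResult {
  stdout: string;
  timedOut: boolean;
  truncated: boolean;
  code: number | null;
}

function exec(
  cmd: string,
  args: string[],
  opts: { cwd?: string; timeoutMs?: number; maxOutputBytes?: number } = {},
): Promise<ExecResult> {
  return new Promise((resolve) => {
    const child = spawn(cmd, args, { cwd: opts.cwd });
    let out = Buffer.alloc(0);
    let timedOut = false;
    let truncated = false;

    // Timeout: SIGKILL the child, flag the result.
    const timer = opts.timeoutMs
      ? setTimeout(() => { timedOut = true; child.kill("SIGKILL"); }, opts.timeoutMs)
      : undefined;

    child.stdout?.on("data", (chunk: Buffer) => {
      out = Buffer.concat([out, chunk]);
      const max = opts.maxOutputBytes ?? Infinity;
      if (out.length > max) {
        // Keep only the first max bytes and stop the child; we'd
        // discard anything further anyway.
        out = out.subarray(0, max);
        truncated = true;
        child.kill("SIGKILL");
      }
    });

    child.on("close", (code) => {
      if (timer) clearTimeout(timer);
      resolve({ stdout: out.toString("utf8"), timedOut, truncated, code });
    });
  });
}
```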
The bridge does NOT register one engine-visible tool per plugin action.
That approach would (a) collide with Anthropic's tool-name regex
(`^[a-zA-Z0-9_-]{1,128}$` — action ids like 'github.create_issue' are
rejected), (b) blow past LLM tool-catalog budgets when many plugins are
active, and (c) force every session to pay the prompt cost of every
action even when only a few are relevant.
Instead, actionBridgeTools({ sources }) returns exactly two ToolDefs:
- list_tools({ service?, query?, limit? }): searchable catalog with
per-action params + risk levels + per-service auth warnings.
- call_tool({ tool_id, params, summary }): dispatches by action id
(kept untouched, dots and all). Approval gates honor riskLevel via
ctx.requestDecision; user denial short-circuits without invoking
the action.
This is the same pattern OpenCode uses in the existing valet runtime,
so plugin ActionSource shapes port across unchanged.
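The regex collision and the dispatch trick can be shown concretely. The `Action` shape and `callToolDispatch` helper below are illustrative, not the bridge's actual internals; only the tool-name regex is taken from above.

```typescript
// Why dotted action ids can't be provider-visible tool names, and how
// call_tool sidesteps that: the dotted id travels as a plain string
// parameter (tool_id), never as a tool name.
const TOOL_NAME_RE = /^[a-zA-Z0-9_-]{1,128}$/;

interface Action {
  id: string;
  run: (params: unknown) => unknown;
}

function isValidToolName(name: string): boolean {
  return TOOL_NAME_RE.test(name);
}

// Hypothetical dispatch core of call_tool: look up by action id,
// kept untouched, dots and all.
function callToolDispatch(
  actions: Map<string, Action>,
  toolId: string,
  params: unknown,
): unknown {
  const action = actions.get(toolId);
  if (!action) throw new Error(`unknown tool_id: ${toolId}`);
  return action.run(params);
}
```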
Spec updated ("Plugin Action Bridge" section): documents the why
(provider regex, catalog budget, prompt cost), the new ActionBridgeOptions
shape, and the list_tools/call_tool semantics.
Engine.thread now also surfaces assistant errorMessage as an 'error'
event and translates stopReason 'error' into a 'turn_end: error' rather
than masking it as 'end_turn' — found while debugging the dogfood pass.
Dogfood: REPL with GITHUB_TOKEN=$(gh auth token) successfully searches
the catalog with list_tools, calls github.get_repository via call_tool,
and reports description/default branch/star count from the live API
response. 9 bridge unit tests + 60 existing tests all green (69 total).
Replaces the hand-wavy compaction section with a concrete two-technique design informed by OpenCode's compaction module.
- Two triggers: proactive (token threshold post-turn) and reactive (overflow error retry via pi-ai's isContextOverflow).
- Tail preservation: keep the last N turns clamped to a token budget, with a mid-turn split when a single turn exceeds the budget.
- Pruning: walk newest-first, mark stale tool outputs as elided after pruneProtectTokens of recent output. No LLM call. Skips protected tools. Only commits if it'd save >= pruneMinimumTokens.
- Compaction: summarize the head into a structured markdown template (Goal/Constraints/Progress/Key Decisions/Next Steps/Critical Context/Relevant Files). Iterative — subsequent compactions update the previous summary rather than writing a fresh one.
- LLM-context assembly: convertToLlm drops covered entries and injects the summary as a user message; elided tool outputs are replaced with a placeholder. Same path used during restoreSession.
- Auto-continue after proactive compaction; reactive compactions retry the originating turn instead.
- A concrete configuration table with defaults.
Implements the spec'd compaction design (informed by OpenCode):
- src/compaction.ts: pure primitives — usableTokens, tailBudget, turns,
selectCutPoint (with mid-turn split), planPrune/applyPrune,
extractFileContext (read vs modified), summarize (one-shot
completeSimple with the OpenCode-style structured-markdown template),
iterative anchoring via previousSummary.
- src/types.ts: CompactionConfig (enabled, reserveTokens, tailTurns,
min/maxPreserveRecentTokens, pruneProtectTokens, pruneMinimumTokens,
toolOutputMaxChars, summarizerModel, protectedTools), wired into
CreateSessionOptions. ToolDef.protectedFromPruning. MessagePart
tool_call.elided flag.
- src/thread.ts:
- lastAssistantUsage capture in turn_end handler.
- Thread.compactThread orchestrator: prune (cheap, no LLM), select
cut point, summarize head into a CompactionEntry, persist, rewrite
agent.state.messages.
- Proactive trigger in runItem: post-turn check shouldCompactProactive
(lastAssistantUsage.total >= usable).
- Reactive trigger in runAgent: catch isContextOverflow on assistant
error, compact, retry the same prompt once.
- rehydrateTranscript now delegates to entriesToAgentMessages, an
exported pure function that drops covered entries and injects the
summary as a <previous-context> user message.
- 21 pure compaction tests + 2 integration tests against the faux
provider. Total suite: 92 tests, all green.
Known limitation: applyPrune mutates entries in memory but the current
SessionStore APIs (appendEntries-only) don't expose an in-place row
update, so pruning persists only to the live agent transcript for now.
Proper persistence requires adding an updateEntry method to
SessionStore — left as a follow-up since it doesn't block compaction
correctness, just observability of pruned state across restarts.
Required so pruning during compaction can persist tool-result elision back to the DAG. Also clarifies that pruning's persistence is atomic per entry: updateEntry rewrites the entire MessageEntry row with the same id, including the mutated tool_call parts. Throws NotFoundError if no matching entry exists in (sessionId, threadId).
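An in-memory version of that contract is small enough to sketch. The class and field names here are illustrative; only the updateEntry semantics (whole-row rewrite by id, NotFoundError on a miss within the session/thread scope) come from the text above.

```typescript
// Hypothetical in-memory sketch of SessionStore.updateEntry's contract:
// rewrite the entire entry row with the same id, throw NotFoundError if
// (sessionId, threadId, id) doesn't match an existing entry.
class NotFoundError extends Error {}

interface Entry {
  id: string;
  sessionId: string;
  threadId: string;
  parts: unknown[]; // includes mutated tool_call parts after pruning
}

class InMemoryStore {
  private entries: Entry[] = [];

  appendEntries(...es: Entry[]): void {
    this.entries.push(...es);
  }

  updateEntry(next: Entry): void {
    const i = this.entries.findIndex(
      (e) =>
        e.id === next.id &&
        e.sessionId === next.sessionId &&
        e.threadId === next.threadId,
    );
    if (i === -1) throw new NotFoundError(`no entry ${next.id}`);
    this.entries[i] = next; // atomic per entry: whole row rewritten
  }

  get(id: string): Entry | undefined {
    return this.entries.find((e) => e.id === id);
  }
}
```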
Two follow-ups to the compaction landing:
1. SessionStore.updateEntry: rewrite an entry in place by id. Throws
NotFoundError (new errors.ts module) when (sessionId, threadId, id)
doesn't match. Implemented on InMemorySessionStore (Array.findIndex
+ replace) and SqliteSessionStore (UPDATE with changes=0 check).
Two contract tests added; runs against both backends.
Thread.compactThread's prune branch now calls store.updateEntry for
each elided entry instead of dropping the persistence on the floor.
Verified by an integration test that pre-populates the DAG with
bash-output-heavy turns, triggers compaction, and confirms a-1's
tool_call.elided is set in the DAG after the compaction completes.
2. Auto-continue: after a proactive compaction (cfg.autoContinue !==
false), inject a synthetic user message —
'Continue if you have next steps, or stop and ask for clarification...'
— tagged with metadata.compaction_continue=true so client UIs can
hide it. Pushed onto the thread's queue, picked up by the next
tickQueue cycle. Reactive (overflow) compactions never auto-continue.
Added skipNextProactiveCheck cooldown so the auto-continue turn
itself doesn't immediately re-trigger compaction (the summary +
system prompt can still exceed usable on a small-context model;
without the cooldown we'd loop).
QueueItem.metadata now flows through to MessageEntry.metadata so the
compaction_continue tag survives into the DAG and across restarts.
Two integration tests: on-path (auto-continue runs, response
recorded), off-path (autoContinue: false suppresses).
Spec updated: SessionStore.updateEntry signature; pruning persistence
clarified; CompactionConfig.autoContinue added to the config table.
99 tests, all green.
Two new env vars:
- VALET_CONTEXT_WINDOW: override the model's local contextWindow
- VALET_MAX_TOKENS: override the model's local maxTokens

Anthropic's API still accepts the model's real (much larger) context window; the override only affects the engine's 'usable' calculation, which is what triggers proactive compaction. Useful for dogfooding the compaction loop with a real LLM at small budgets, so we don't have to generate 100k tokens of context to see it fire.

Plus per-event printers for compaction_start / compaction_end so the REPL output makes it obvious when compaction kicks in.

Verified end-to-end against Claude Haiku 4.5 with VALET_CONTEXT_WINDOW=8000 and VALET_MAX_TOKENS=1000:
- Agent reads 5 files (~60KB of tool output)
- Proactive compaction fires after the first turn
- The auto-continue turn runs and references 'the previous context' from the injected summary, correctly understanding the task was complete.
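How the override feeds the trigger can be sketched as follows. The exact usableTokens formula is an assumption (context window minus reserve minus output budget); the real function lives in src/compaction.ts and may differ.

```typescript
// Sketch of the 'usable' calculation the overrides feed. Assumed
// formula: the conversation budget is what's left of the (possibly
// overridden) context window after reserving headroom and the model's
// output budget.
function usableTokens(
  contextWindow: number, // VALET_CONTEXT_WINDOW overrides this locally
  reserveTokens: number, // CompactionConfig.reserveTokens
  maxTokens: number,     // VALET_MAX_TOKENS overrides this locally
): number {
  return contextWindow - reserveTokens - maxTokens;
}

// Post-turn proactive check, as described for shouldCompactProactive:
// fire once the last turn's total usage reaches the usable budget.
function shouldCompactProactive(lastTurnTotalTokens: number, usable: number): boolean {
  return lastTurnTotalTokens >= usable;
}

// The overrides only shrink the engine's local view of the model; the
// provider API still sees the real context window.
const contextWindow = Number(process.env.VALET_CONTEXT_WINDOW ?? 200_000);
```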