Document portable runtime V1 spec#46

Open
yourbuddyconner wants to merge 26 commits into main from portable-runtime-v1-spec

Conversation

@yourbuddyconner
Owner

Summary

  • Adds the portable runtime engine V1 specification.
  • Documents the required V1 contracts across engine API, data model, decision gates, providers, sandbox RPC, channel transports, API routes, client events, schema, adapter hosting, and observability.
  • Standardizes the spec on DecisionGate terminology for approvals, questions, and credential requests.

Verification

  • rg -n "InteractivePrompt|interactive_prompt|interactive_prompts|interactive prompt" docs/specs/2026-05-02-portable-runtime-engine-design.md
  • rg -n "TODO|TBD|FIXME" docs/specs/2026-05-02-portable-runtime-engine-design.md

Adds packages/engine — a portable agent runtime library that implements
the V1 design from docs/specs/2026-05-02-portable-runtime-engine-design.md.

Built on @mariozechner/pi-ai + @mariozechner/pi-agent-core. The engine
itself has zero platform dependencies; platform adapters (Cloudflare,
K8s) host it.

What's implemented:
- Engine public API (createSession, prompt, resolveDecision, abort,
  pause/resume) + Session/Thread classes.
- Per-thread queue with three modes: followup (FIFO), steer (abort +
  start), collect (buffered window).
- Decision gates with full lifecycle: pending -> resolved/withdrawn/
  expired. Steer withdraws pending gates with reason=steer; abort with
  reason=abort. DecisionGateEntry persists in the DAG.
- Multi-thread sessions: threads run concurrently with isolated
  histories; aborting one doesn't affect siblings.
- Built-in thread_read tool for cross-thread visibility.
- In-memory providers + VirtualSandbox so the full engine runs in
  vitest with no containers.
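The three queue modes can be sketched as a small dispatcher. The types and method names below are illustrative only, not the engine's actual API:

```typescript
// Hypothetical sketch of the per-thread queue modes described above.
type QueueMode = "followup" | "steer" | "collect";

interface QueueItem {
  id: string;
  text: string;
  mode: QueueMode;
}

class ThreadQueue {
  private items: QueueItem[] = [];
  private collectBuffer: QueueItem[] = [];
  aborted = false;

  enqueue(item: QueueItem): void {
    switch (item.mode) {
      case "followup":
        // FIFO: run after the current turn finishes.
        this.items.push(item);
        break;
      case "steer":
        // Abort the in-flight turn, drop queued work, start fresh.
        this.aborted = true;
        this.items = [item];
        break;
      case "collect":
        // Buffer within a window; flushed later as one batch.
        this.collectBuffer.push(item);
        break;
    }
  }

  flushCollected(): QueueItem[] {
    const batch = this.collectBuffer;
    this.collectBuffer = [];
    this.items.push(...batch);
    return batch;
  }
}
```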

14 tests (happy path, decision gates, queue modes, multi-thread +
thread_read), all green in <2s.

Spec-vs-reality deltas reconciled by the implementation are documented
in packages/engine/README.md (tool signature wrapping, no native
suspension primitive, message_start vs message_update).

Deferred for follow-up: restart-safe re-entrant gates (the
SuspendedTurnState record is written but Engine.restoreSession is a
stub), compaction, role/skill loading, model failover, ActionSource
bridge, structured-result extraction.

Spec updates (docs/specs/2026-05-02-portable-runtime-engine-design.md):
- Engine.restoreSession now takes { sessionId, options } so the host
  re-supplies tools/sandbox/model — the engine doesn't persist creation
  options across restarts.
- Decision-gate IDs are explicitly derived as
  gate:{sessionId}:{threadId}:{queueItemId}:{resumeKey}, and resumeKey
  is required (not optional) on DecisionGateRequest.
- Restart-safe contract section now explains what "re-entrant up to the
  decision point" means in practice (work before requestDecision runs
  twice; work after runs once on replay), how the engine populates
  ctx.suspendedDecision on restoration, and what events the engine must
  emit during replay.
- New "LLM-faithful entry persistence" rule: assistant tool-call blocks
  must persist in MessageEntry.parts so the rehydrated transcript can
  be sent to LLM providers without producing a malformed
  [user, assistant(text-only), toolResult] sequence.
- ToolContext.requestDecision typed as DecisionGateRequest (not
  DecisionGate); ctx.suspendedDecision documented as engine-only.
- SuspendedTurnState bullets updated to reference the deterministic
  formula and the resumeKey explicitly.
- Adapter Host Contract calls out engine.restoreSession({ sessionId,
  options }) and the queue-item / suspended-turn fields that must
  survive hibernation.
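
The deterministic gate-ID formula above can be expressed directly. The `GateKey` helper type is hypothetical, but the format string follows the spec:

```typescript
// Deterministic decision-gate ID, per the spec's formula. Determinism is
// the point: on replay after a restart, the same inputs produce the same
// ID, so the short-circuit can match the previously persisted gate.
interface GateKey {
  sessionId: string;
  threadId: string;
  queueItemId: string;
  resumeKey: string; // required (not optional) per the updated spec
}

function gateId(k: GateKey): string {
  return `gate:${k.sessionId}:${k.threadId}:${k.queueItemId}:${k.resumeKey}`;
}
```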

Plan (docs/plans/2026-05-05-persistent-store-restart-safe-gates.md):
- 16 tasks across 4 phases (schema, store + contract tests, restart-safe
  primitives, restoreSession + replay), all aligned with the updated
  spec.
- Task 11 reworked from a flaky integration test that raced the agent
  loop into a deterministic unit test of a pure shouldShortCircuit
  predicate.
- Task 12's rehydrateTranscript explicitly reconstructs assistant
  ToolCall blocks from MessageEntry.parts.

Task 15 surfaced two real bugs in the prototype:
- Session.rehydrate's resumeBlockedThreadIfReady call was fire-and-forget,
  racing with resolveDecision callers; awaiting it ensures the gate is
  re-armed before any caller can resolve it.
- During replay, Thread.replayBlocked needs to mirror the original
  queueItemId so the deterministic gate ID matches and the
  short-circuit fires; without this, the tool tries to open a new gate.
- Gate-status persistence (pending -> resolved) lived in the
  requestDecision continuation; the short-circuit path bypassed it.
  Moved to Thread.resolveDecision so both live and replay paths persist
  the resolved status.

bin/repl.ts wires the engine to a real Anthropic model via pi-ai's
getModel('anthropic', ...), with InMemorySessionStore +
InMemoryEventBus + VirtualSandbox. Supports single-shot
(`pnpm repl 'say hi'`) and interactive (`pnpm repl`) modes. Streams
text deltas, tool calls, decision gates, and turn boundaries to stdout.

Defaults to claude-haiku-4-5; override with VALET_MODEL or
VALET_SYSTEM_PROMPT. Reads ANTHROPIC_API_KEY from env via pi-ai's
provider auto-resolution.

LocalSandbox wraps node:fs/promises and node:child_process.spawn to
implement the Sandbox interface against the real host filesystem and
shell. Relative paths resolve against the configured workspace; absolute
paths are honored as-is (no escape prevention — this is a dev/testing
sandbox, security goes into the Docker provider).

ExecOpts honored:
- cwd (relative paths resolved against workspace, default = workspace)
- env (merged over process.env)
- timeout (SIGKILL on expiry, timedOut: true on result)
- signal (AbortSignal cancellation)
- stdin (piped to the child)
- maxOutputBytes (truncated: true on result)
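
A minimal sketch of the cwd and env resolution described above, assuming a `resolveCwd`/`mergeEnv` split that the real LocalSandbox may not have:

```typescript
import * as path from "node:path";

// Illustrative subset of ExecOpts; the real interface also carries
// timeout, signal, stdin, and maxOutputBytes.
interface ExecOpts {
  cwd?: string;
  env?: Record<string, string>;
}

// Relative paths resolve against the workspace; absolute paths are
// honored as-is (no escape prevention -- dev/testing sandbox only).
function resolveCwd(workspace: string, opts: ExecOpts): string {
  if (!opts.cwd) return workspace;
  return path.isAbsolute(opts.cwd) ? opts.cwd : path.resolve(workspace, opts.cwd);
}

// opts.env is merged over the base environment (process.env in practice).
function mergeEnv(base: Record<string, string>, opts: ExecOpts): Record<string, string> {
  return { ...base, ...opts.env };
}
```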

19 tests cover FS round-trips, exec lifecycle (timeout, abort, truncation,
stdin, env, cwd), and provider behavior. Total suite: 60 tests, all green.

REPL gains a VALET_SANDBOX=local|virtual switch (default virtual) and
VALET_WORKSPACE for the local workspace path. Smoke-tested against a tmp
scratch dir AND the valet repo itself — engine read its own README and
listed packages/ via real shell.

The bridge does NOT register one engine-visible tool per plugin action.
That approach would (a) collide with Anthropic's tool-name regex
(`^[a-zA-Z0-9_-]{1,128}$` — action ids like 'github.create_issue' are
rejected), (b) blow past LLM tool-catalog budgets when many plugins are
active, and (c) force every session to pay the prompt cost of every
action even when only a few are relevant.

Instead, actionBridgeTools({ sources }) returns exactly two ToolDefs:

- list_tools({ service?, query?, limit? }): searchable catalog with
  per-action params + risk levels + per-service auth warnings.
- call_tool({ tool_id, params, summary }): dispatches by action id
  (kept untouched, dots and all). Approval gates honor riskLevel via
  ctx.requestDecision; user denial short-circuits without invoking
  the action.

This is the same pattern OpenCode uses in the existing valet runtime, so
plugin ActionSource shapes port across with no changes.
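
The tool-name constraint is easy to demonstrate; the pattern below is the regex quoted above:

```typescript
// Anthropic's tool-name pattern. Dotted action ids fail it, which is one
// reason the bridge keeps action ids as call_tool parameters instead of
// registering each plugin action as its own engine-visible tool.
const TOOL_NAME = /^[a-zA-Z0-9_-]{1,128}$/;

TOOL_NAME.test("list_tools");          // true: valid bridge tool name
TOOL_NAME.test("github.create_issue"); // false: '.' is not in the class
```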

Spec updated ("Plugin Action Bridge" section): documents the why
(provider regex, catalog budget, prompt cost), the new ActionBridgeOptions
shape, and the list_tools/call_tool semantics.

Engine.thread now also surfaces assistant errorMessage as an 'error'
event and translates stopReason 'error' into a 'turn_end: error' rather
than masking it as 'end_turn' — found while debugging the dogfood pass.

Dogfood: REPL with GITHUB_TOKEN=$(gh auth token) successfully searches
the catalog with list_tools, calls github.get_repository via call_tool,
and reports description/default branch/star count from the live API
response. 9 bridge unit tests + 60 existing tests all green (69 total).

Replaces the hand-wavy compaction section with a concrete two-technique
design informed by OpenCode's compaction module.

- Two triggers: proactive (token threshold post-turn) and reactive
  (overflow error retry via pi-ai's isContextOverflow).
- Tail preservation: keep last N turns clamped to a token budget, with
  mid-turn split when a single turn exceeds the budget.
- Pruning: walk newest-first, mark stale tool outputs as elided after
  pruneProtectTokens of recent output. No LLM call. Skips protected
  tools. Only commits if it'd save >= pruneMinimumTokens.
- Compaction: summarize head into a structured markdown template
  (Goal/Constraints/Progress/Key Decisions/Next Steps/Critical Context/
  Relevant Files). Iterative — subsequent compactions update the
  previous summary rather than write a fresh one.
- LLM-context assembly: convertToLlm drops covered entries and injects
  the summary as a user message; elided tool outputs are replaced with
  a placeholder. Same path used during restoreSession.
- Auto-continue after proactive compaction; reactive compactions
  retry the originating turn instead.
- Concrete configuration table with defaults.
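
The proactive trigger can be sketched as a pure predicate. The `lastUsageTotal >= usable` check mirrors the text; the `usableTokens` formula is an assumption about how the budget is derived from the context window, output reservation, and configured headroom:

```typescript
// Illustrative subset of CompactionConfig; the real config table is richer
// (tailTurns, pruneProtectTokens, pruneMinimumTokens, etc.).
interface CompactionConfig {
  enabled: boolean;
  reserveTokens: number; // headroom kept free below the context window
}

// Assumed budget derivation: context window minus the output reservation
// minus configured headroom.
function usableTokens(contextWindow: number, maxTokens: number, cfg: CompactionConfig): number {
  return contextWindow - maxTokens - cfg.reserveTokens;
}

// Proactive trigger: post-turn check against the last assistant usage.
function shouldCompactProactive(lastUsageTotal: number, usable: number, cfg: CompactionConfig): boolean {
  return cfg.enabled && lastUsageTotal >= usable;
}
```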

Implements the spec'd compaction design (informed by OpenCode):

- src/compaction.ts: pure primitives — usableTokens, tailBudget, turns,
  selectCutPoint (with mid-turn split), planPrune/applyPrune,
  extractFileContext (read vs modified), summarize (one-shot
  completeSimple with the OpenCode-style structured-markdown template),
  iterative anchoring via previousSummary.

- src/types.ts: CompactionConfig (enabled, reserveTokens, tailTurns,
  min/maxPreserveRecentTokens, pruneProtectTokens, pruneMinimumTokens,
  toolOutputMaxChars, summarizerModel, protectedTools), wired into
  CreateSessionOptions. ToolDef.protectedFromPruning. MessagePart
  tool_call.elided flag.

- src/thread.ts:
  - lastAssistantUsage capture in turn_end handler.
  - Thread.compactThread orchestrator: prune (cheap, no LLM), select
    cut point, summarize head into a CompactionEntry, persist, rewrite
    agent.state.messages.
  - Proactive trigger in runItem: post-turn check shouldCompactProactive
    (lastAssistantUsage.total >= usable).
  - Reactive trigger in runAgent: catch isContextOverflow on assistant
    error, compact, retry the same prompt once.
  - rehydrateTranscript now delegates to entriesToAgentMessages, an
    exported pure function that drops covered entries and injects the
    summary as a <previous-context> user message.

- 21 pure compaction tests + 2 integration tests against the faux
  provider. Total suite: 92 tests, all green.

Known limitation: applyPrune mutates entries in memory but the current
SessionStore APIs (appendEntries-only) don't expose an in-place row
update, so pruning persists only to the live agent transcript for now.
Proper persistence requires adding an updateEntry method to
SessionStore — left as a follow-up since it doesn't block compaction
correctness, just observability of pruned state across restarts.

Required so pruning during compaction can persist tool-result elision
back to the DAG. Also clarifies that pruning's persistence is atomic
per entry: updateEntry rewrites the entire MessageEntry row with the
same id, including the mutated tool_call parts. Throws NotFoundError
if no matching entry exists in (sessionId, threadId).
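
The in-memory `updateEntry` semantics (find by id within the session/thread, replace the whole row, throw on miss) can be sketched as follows; the entry shape is simplified:

```typescript
class NotFoundError extends Error {
  constructor(msg: string) {
    super(msg);
    this.name = "NotFoundError";
  }
}

interface Entry {
  id: string;
  sessionId: string;
  threadId: string;
  parts: unknown[];
}

class InMemoryStore {
  private entries: Entry[] = [];

  appendEntries(...es: Entry[]): void {
    this.entries.push(...es);
  }

  // Atomic per entry: the whole row is rewritten under the same id.
  // Throws NotFoundError when (sessionId, threadId, id) has no match.
  updateEntry(e: Entry): void {
    const i = this.entries.findIndex(
      (x) => x.id === e.id && x.sessionId === e.sessionId && x.threadId === e.threadId,
    );
    if (i === -1) throw new NotFoundError(`entry ${e.id} not found`);
    this.entries[i] = e;
  }

  get(id: string): Entry | undefined {
    return this.entries.find((x) => x.id === id);
  }
}
```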

Two follow-ups to the compaction landing:

1. SessionStore.updateEntry: rewrite an entry in place by id. Throws
   NotFoundError (new errors.ts module) when (sessionId, threadId, id)
   doesn't match. Implemented on InMemorySessionStore (Array.findIndex
   + replace) and SqliteSessionStore (UPDATE with changes=0 check).
   Two contract tests added; runs against both backends.

   Thread.compactThread's prune branch now calls store.updateEntry for
   each elided entry instead of dropping the persistence on the floor.
   Verified by an integration test that pre-populates the DAG with
   bash-output-heavy turns, triggers compaction, and confirms a-1's
   tool_call.elided is set in the DAG after the compaction completes.

2. Auto-continue: after a proactive compaction (cfg.autoContinue !==
   false), inject a synthetic user message —
     'Continue if you have next steps, or stop and ask for clarification...'
   — tagged with metadata.compaction_continue=true so client UIs can
   hide it. Pushed onto the thread's queue, picked up by the next
   tickQueue cycle. Reactive (overflow) compactions never auto-continue.

   Added skipNextProactiveCheck cooldown so the auto-continue turn
   itself doesn't immediately re-trigger compaction (the summary +
   system prompt can still exceed usable on a small-context model;
   without the cooldown we'd loop).

   QueueItem.metadata now flows through to MessageEntry.metadata so the
   compaction_continue tag survives into the DAG and across restarts.

   Two integration tests: on-path (auto-continue runs, response
   recorded), off-path (autoContinue: false suppresses).

Spec updated: SessionStore.updateEntry signature; pruning persistence
clarified; CompactionConfig.autoContinue added to the config table.
99 tests, all green.

Two new env vars:
- VALET_CONTEXT_WINDOW: override the model's local contextWindow
- VALET_MAX_TOKENS: override the model's local maxTokens

Anthropic's API still accepts the model's real (much larger) context
window; the override only affects the engine's 'usable' calculation,
which is what triggers proactive compaction. Useful for dogfooding the
compaction loop with a real LLM at small budgets so we don't have to
generate 100k tokens of context to see it fire.

Plus per-event printers for compaction_start / compaction_end so the
REPL output makes it obvious when compaction kicks in.

Verified end-to-end against Claude Haiku 4.5 with VALET_CONTEXT_WINDOW=8000
and VALET_MAX_TOKENS=1000:
- Agent reads 5 files (~60KB tool output)
- Proactive compaction fires after the first turn
- Auto-continue turn runs and references 'the previous context' from
  the injected summary, correctly understanding the task was complete.