feat(harness): defer worktree tools behind tool_info activation #1057
Withhold enter-worktree/exit-worktree from the default tool surface. The model loads their full description and provider-transformed schema on demand via a new tool_info tool, which activates them on the next model step (registry.tools() re-assembles the array every step). Activation is derived from durable conversation history (completed tool_info calls), so it survives retry/fork/compaction without a parallel side state. tool_info's card list and the activation gate reuse the same permission/user.tools filtering as the real tool surface. A repair hint points the model at tool_info when it calls a still-deferred tool directly.
v1 is a mechanism spike, not a byte reduction: the two worktree tools are 0-invocation in samples (a sampling artifact, not lack of use) and net savings are ~0 once tool_info's own footprint is counted. Byte wins come when large tools are folded in later. See #1054 correction notes + alignment comment.
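The per-step gating described in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the code in registry.ts: `ToolDef`, `DEFERRED_IDS`, and `visibleTools` are hypothetical names, and `allowed` stands in for the permission / user.tools gate.

```typescript
// Hypothetical shapes; the real registry types are not shown in this PR text.
type ToolDef = { id: string; description: string }

// Assumption: the two deferred ids named in the commit message.
const DEFERRED_IDS: ReadonlySet<string> = new Set(["enter-worktree", "exit-worktree"])

// Assemble the tool array for one model step.
// - `allowed` stands in for the permission / user.tools gate.
// - `activated` is the set derived from completed tool_info calls.
function visibleTools(
  all: readonly ToolDef[],
  allowed: (id: string) => boolean,
  activated: ReadonlySet<string>,
): ToolDef[] {
  return all.filter((tool) => {
    // The permission gate applies to every tool, deferred or not.
    if (!allowed(tool.id)) return false
    // Deferred tools are exposed only once they have been activated.
    if (DEFERRED_IDS.has(tool.id)) return activated.has(tool.id)
    return true
  })
}
```

Because the function is re-run for every step, a tool activated during one step simply shows up in the array built for the next one; no extra state has to be carried between steps.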
- Register tool_info as read-only in run-observability sanitize, so ordinary turns get a complete boundary snapshot and a harmless tool_info call is not recorded as an unsafe side effect (which could downgrade retry/incident-safety decisions).
- tool_info now refuses a deferred tool hidden this turn by permission / user.tools (deferredAvailable threaded through ctx.extra), instead of telling the model a disabled tool is activated and making it burn turns calling an absent tool.
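A minimal sketch of the refusal behaviour, assuming a plain result type instead of the Effect-based tool wrapper the repository uses. `toolInfo`, `ToolInfoResult`, and the message wording are hypothetical; only the `deferredAvailable` predicate is named in the commit message.

```typescript
// Hypothetical result type standing in for the real tool execution result.
type ToolInfoResult =
  | { ok: true; output: string }
  | { ok: false; error: string }

// `deferred` maps a deferred tool id to its full description.
// `deferredAvailable` answers: is this tool permitted in the current turn?
function toolInfo(
  name: string,
  deferred: ReadonlyMap<string, string>,
  deferredAvailable: (id: string) => boolean,
): ToolInfoResult {
  const description = deferred.get(name)
  if (description === undefined) {
    return { ok: false, error: `Unknown deferred tool: ${name}` }
  }
  // Refuse instead of claiming activation for a tool the registry will not expose.
  if (!deferredAvailable(name)) {
    return { ok: false, error: `${name} is not available in this turn` }
  }
  return { ok: true, output: `${name} is now available.\n${description}` }
}
```

Checking availability before reporting success keeps the model from spending turns calling a tool that will never appear in its tool list.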
Code Review
This pull request introduces a deferred tool loading mechanism to optimize the model's context window. Low-frequency tools, specifically enter-worktree and exit-worktree, are withheld from the initial tool surface and instead advertised as short cards within a new tool_info tool. The model can call tool_info to dynamically load a tool's full schema and description, activating it for subsequent steps. The feedback focuses on improving the robustness of this mechanism: first, by normalizing tool names to lowercase in deriveActivatedTools and error hints to handle casing mismatches gracefully; and second, by propagating operational errors in tool_info as typed failures using Effect.fail rather than throwing raw errors that trigger critical application defects via Effect.orDie.
After tool_info(X) completes, small/medium models don't reflexively re-scan the next turn's tool array; they act on their prior impression of available tools and ignore the freshly exposed schema. mimo-v2.5 read the "now available" output, then in its reasoning said "from my tool list ... no enter-worktree" and fell back to `bash git worktree add`. The mechanism itself works (next-step registry really exposes X). The fix is to signal "X is callable" through a channel models actually attend to: a synthetic <system-reminder> in the user message of the step that follows activation, anchored on the existing SessionDiagnostics injection point so the reminder and the new tool land in the same model call. Anti-fallback hints are per-tool (enter-worktree's bash equivalent is explicitly ruled out, learned from the real session log). Also fixed: tool_info output line is sharper, and the worktree tool docs now use real kebab-case tool ids instead of CamelCase examples that would invite name hallucination.
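The reminder described above could be built along these lines. This is a sketch under assumptions: the reminder wording and the `ANTI_FALLBACK` table are invented for illustration, and only the `<system-reminder>` wrapper and the enter-worktree bash fallback come from the commit message.

```typescript
// Assumption: per-tool anti-fallback hints keyed by canonical tool id.
const ANTI_FALLBACK: ReadonlyMap<string, string> = new Map([
  ["enter-worktree", "Call it directly; do not fall back to `bash git worktree add`."],
])

// Build the synthetic reminder for the step that follows activation.
// Returns undefined when nothing was newly activated, so no part is injected.
function buildActivationReminder(ids: readonly string[]): string | undefined {
  if (ids.length === 0) return undefined
  const lines = ids.map((id) => {
    const hint = ANTI_FALLBACK.get(id)
    return hint ? `- ${id} is now callable. ${hint}` : `- ${id} is now callable.`
  })
  return ["<system-reminder>", ...lines, "</system-reminder>"].join("\n")
}
```

Returning `undefined` for an empty list keeps ordinary turns free of empty reminder blocks.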
P1 compaction survival: tool_info parts are in PRUNE_PROTECTED_TOOLS, so
deriveActivatedTools keeps seeing activations after older tool outputs are
pruned. The "derive from message history" design is honored by guaranteeing
the history segment is durable.
P1 plugin schema parity: tool_info runs the same plugin.trigger
("tool.definition") pipeline registry.tools() uses on every other tool, so the
description and parameters the model reads match what the next-step tool list
hands it. Injected as a closure to keep ToolInfoTool Plugin-free in its R.
P2 Tool.define wrapping: tool_info is now a first-class Tool.define + Tool.init,
gaining truncation, validation, span tracing, and uniform error formatting
like every other builtin. Yielded at the outer make scope.
P2 canonical hint: extracted buildDeferredHint so the repair-time message
always names the kebab-case canonical id; a CamelCase echo like "Enter-Worktree"
no longer becomes an invalid tool_info(name=...) suggestion.
P3 single source of truth: id, card, description, and parameters for each
deferred tool now live in one DEFERRED array. Adding a new deferred tool is
a one-entry change instead of editing three places.
Tests: PRUNE_PROTECTED_TOOLS membership and buildDeferredHint canonicalisation
(CamelCase echo + non-deferred no-op). The previous in-process execute tests
are removed because ToolInfoTool now requires the registry layer to instantiate;
execute-path behavior is covered by registry.test.ts and end-to-end usage.
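To illustrate the single-source-of-truth and canonical-hint points, here is a hedged sketch. The `DeferredTool` shape, the card text, the hint wording, and the lowercase-based `canonicalise` helper are all assumptions; the real DEFERRED entries also carry description and parameters, which are omitted here.

```typescript
// Hypothetical entry shape; card strings are placeholders, not the real cards.
type DeferredTool = { id: string; card: string }

const DEFERRED: readonly DeferredTool[] = [
  { id: "enter-worktree", card: "placeholder card for enter-worktree" },
  { id: "exit-worktree", card: "placeholder card for exit-worktree" },
]

// Assumption: canonical ids are lowercase kebab-case, so trimming and
// lowercasing is enough to map an echo like "Enter-Worktree" back to its id.
function canonicalise(name: string): string {
  return name.trim().toLowerCase()
}

// Returns a repair hint naming the canonical id, or undefined for any
// name that is not a deferred tool (the non-deferred no-op case).
function buildDeferredHint(name: string): string | undefined {
  const id = canonicalise(name)
  const entry = DEFERRED.find((tool) => tool.id === id)
  if (!entry) return undefined
  return `Tool "${id}" is deferred. Call tool_info with name="${id}" to load it first.`
}
```

With one array driving both lookup and hint text, adding a deferred tool stays a one-entry change.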
- P2-1: repair hint now gates on deferred availability; a disabled or permission-denied tool no longer routes the model to a tool_info call that would just fail again.
- P2-2: add a request-level parity test asserting tool_info's loaded schema equals the post-activation tool schema (and stays untruncated).
- P3-1: canonicalise the name inside tool_info so a CamelCase echo like "Enter-Worktree" resolves to enter-worktree instead of erroring.
- P3-2: opt tool_info output out of truncation so a large deferred tool's schema is never clipped mid-load.
- deriveActivatedTools canonicalises the recorded raw input name, so a CamelCase echo like "Enter-Worktree" activates the tool on the next step instead of leaving the model pointed at a tool the registry never exposes (P2).
- registry schema-parity test passes the session model through ctx.extra and compares against ProviderTransform.schema(model, ...), exercising the provider-transform branch the prior assertion silently skipped (P3).
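A rough sketch of deriving activation from recorded parts with name canonicalisation. The part shapes are simplified assumptions rather than the MessageV2 types, and the function here takes a flat list of parts for brevity.

```typescript
// Simplified, hypothetical part shapes.
type ToolPart = {
  type: "tool"
  tool: string
  status: "pending" | "completed" | "error"
  input: { name?: string }
}
type TextPart = { type: "text"; text: string }
type Part = ToolPart | TextPart

const DEFERRED_IDS: ReadonlySet<string> = new Set(["enter-worktree", "exit-worktree"])

// A deferred tool counts as activated once a completed tool_info call names it.
function deriveActivatedTools(parts: readonly Part[]): Set<string> {
  const activated = new Set<string>()
  for (const part of parts) {
    if (part.type !== "tool" || part.tool !== "tool_info") continue
    if (part.status !== "completed") continue
    // Canonicalise the raw recorded input so "Enter-Worktree" still activates.
    const id = (part.input.name ?? "").trim().toLowerCase()
    if (DEFERRED_IDS.has(id)) activated.add(id)
  }
  return activated
}
```

Failed or pending tool_info calls are ignored, so only a call the model actually received output from unlocks the tool.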
… compacted view
deriveActivatedTools was fed the compaction-filtered message list (msgs), so a tool_info(name=X) activation older than the retained tail was truncated away: the deferred tool dropped out of the next turn's tool list and silently re-locked mid-session. Materialise the full durable stream once per loop and feed it to activation derivation via a dedicated activationMessages param; the model-facing view stays the filtered msgs. Reuses the stream the loop already read (filterCompactedEffect + the lastUser/lastFinished scan), so it is one fewer DB pass, not an extra one. Regression test asserts activation survives compaction truncation: the filtered view loses it, the full history keeps it.
…al Error
The Tool.define wrapper defectifies a generic Effect.fail(Error) (already covered), but the model is on the Effect.runPromise reject path, which squashes that Die back to the original Error via causeSquash, with no FiberFailure wrapper. Assert the wrapped execution rejects with the original Error message, not a FiberFailure, so tool_info's unknown/unavailable paths (plain Effect.fail) stay clean operational errors rather than noisy alerts.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/opencode/src/session/prompt.ts`:
- Around line 2119-2134: deriveNewlyActivated currently inspects only the latest
assistant message (msgs) so reminders get lost if a compaction summary becomes
the latest assistant; update the logic in the block that injects activation
reminders (the loop that uses deriveNewlyActivated, lastUser,
PartID.ascending(), and buildActivationReminder) to derive newly activated tool
ids from durable history/allMessages (skipping assistant summary messages
produced by compaction) or alternatively propagate the set of just-activated
tool ids from resolveTools() across the compaction turn, then use that
durable/propagated set to create the synthetic user parts for lastUser so
reminders are always injected regardless of compaction.
📒 Files selected for processing (12)
packages/opencode/src/session/compaction.ts
packages/opencode/src/session/llm.ts
packages/opencode/src/session/prompt.ts
packages/opencode/src/session/run-observability/sanitize.ts
packages/opencode/src/tool/enter-worktree.txt
packages/opencode/src/tool/exit-worktree.txt
packages/opencode/src/tool/registry.ts
packages/opencode/src/tool/tool-info.ts
packages/opencode/test/session/messages-pagination.test.ts
packages/opencode/test/tool/registry.test.ts
packages/opencode/test/tool/tool-define.test.ts
packages/opencode/test/tool/tool-info.test.ts
…ry, not full hydration
The previous fix materialised Array.from(stream(sessionID)) every loop to derive activation from the complete history. On long compacted sessions that regressed DB/parts hydration: filterCompacted and the lastUser/lastFinished scan both stop early at the compaction tail, but Array.from forced a full message+parts hydration of the entire history just to find a few tool_info parts. Replace it with MessageV2.toolInfoParts(sessionID): a single PartTable query filtered by json_extract($.tool)='tool_info' (same pattern as session.ts), built via the existing part() constructor. deriveActivatedToolsFromParts derives activation from those parts; the run loop goes back to the lazy early-breaking filterCompactedEffect + stream() it used before. Activation still spans the full durable history (reverted parts are physically deleted before the loop; forks use a fresh session id), so it stays correct across compaction without paying for full hydration. Regression test now asserts toolInfoParts spans compaction while the filtered model-facing view drops the activation.
…mary assistant
The one-shot activation system-reminder was derived via deriveNewlyActivated(msgs) over the compaction-filtered view. If an auto-compaction lands between a tool_info activation and the next step and summarises the activating turn into the head (older than the retained tail), filterCompacted drops that turn and the newest assistant becomes the compaction summary (no tool_info part), so the reminder was silently lost. Activation itself was unaffected (it derives from durable tool_info parts); only the reminder was lost. Add MessageV2.lastNonSummaryAssistant(sessionID): a durable single-row query (json_extract excludes summary assistants) for the newest real assistant turn. deriveNewlyActivated now takes that single message. Summaries are excluded at the source so a compaction cannot suppress the reminder; one-shot semantics hold because any real non-tool_info turn becomes the newest non-summary assistant and clears the set. Regression test: a compaction summarising the activating turn into the head still yields the activation from lastNonSummaryAssistant, while the filtered view's newest assistant is the summary.
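The one-shot semantics can be sketched in memory as follows. This is an assumption-laden illustration: the real lastNonSummaryAssistant is described as a single-row database query taking a session id, whereas this version scans an array, and `AssistantTurn` is a hypothetical shape.

```typescript
// Hypothetical, simplified view of an assistant turn in durable history.
type AssistantTurn = {
  summary: boolean // true for a compaction summary message
  toolInfoNames: string[] // canonical ids from completed tool_info parts in this turn
}

// Newest assistant turn that is not a compaction summary.
function lastNonSummaryAssistant(turns: readonly AssistantTurn[]): AssistantTurn | undefined {
  for (let i = turns.length - 1; i >= 0; i--) {
    const turn = turns[i]
    if (turn && !turn.summary) return turn
  }
  return undefined
}

// One-shot: only activations from that single newest real turn are reported.
function deriveNewlyActivated(turn: AssistantTurn | undefined): string[] {
  return turn ? [...turn.toolInfoNames] : []
}
```

A summary appended after the activating turn is skipped, so the reminder still fires; once any later real turn without a tool_info call exists, the result is empty and the reminder is not repeated.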
Summary
Implements the v1 tool-exposure mechanism from #1054: a tool_info tool that withholds low-frequency tools' full description and parameter schema from the default tool surface, advertises them as one-line cards, and lets the model load and activate one on demand. enter-worktree and exit-worktree are the two deferred tools for v1. tool_info(name) returns the tool's full description plus the same provider-transformed JSON schema the real call will use, then marks it activated. The registry hides deferred ids unless activated, filters them through the same permission / user.tools gate as any tool, and injects the live card list into tool_info's description each turn. The llm.ts repair path nudges a model that calls a not-yet-activated deferred tool toward tool_info.
Activation is derived from durable conversation history, not a parallel side-state: a completed tool_info(name=X) part means X is callable for the rest of that history, which mirrors provider-native tool search and stays consistent across retry / fork / compaction. The registry re-assembles the tool array each model step, so an activated tool becomes callable on the next step.
Why
#1054 asks for a deferred/activate exposure path so heavy, low-frequency tools can stop paying full description + schema cost on every turn. This PR proves the mechanism end to end on the two worktree tools, which are 0-invocation in historical samples so a discovery miss harms no real workflow. It is a mechanism spike, not a byte-reduction win: these two schemas are tiny (exit-worktree empty, enter-worktree ~548B), so withholding them roughly nets to zero once tool_info is added; the value is the validated loop that later, heavier tools can opt into. Provider-neutral by design: no Anthropic defer_loading, no OpenAI tool_search, no plugin-SDK extension.
Related Issue
Closes the v1 scope in #1054 (see the correction notes and alignment comment on that issue).
Human Review Status
Pending
Review Focus
The activation model: deriveActivatedTools reads completed tool_info parts from message history rather than tracking a parallel side-state; confirm that is the right durability boundary. And the permission parity: deferred tools pass through the same Permission / user.tools gate as any tool, and tool_info refuses a tool hidden this turn (via the deferredAvailable predicate threaded through ctx.extra) so the model is never told a disabled tool is active.
Risk Notes
Low blast radius: only enter-worktree / exit-worktree are deferred, both 0-invocation in historical samples, and their implementations are unchanged. Only the exposure layer changes, so platform behavior is identical on macOS and Windows. tool_info is classified read-only in run-observability so it does not distort retry / incident-safety classification. Permission surface: deferred-tool visibility reuses the existing Permission / user.tools gate, no new permission path. Skipped conditional checklist items: visible UI / copy (no UI surface touched); macOS/Windows platform consideration (worktree tool implementations unchanged, only their exposure).
How To Verify
Screenshots or Recordings
N/A — no visible UI change.
Checklist
- bug, enhancement, task, documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.
- app, ui, platform, harness, ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.
- P0, P1, P2, P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.
- Pending, Approved by @<reviewer>, or Not required: <reason> (default is Pending; "not required" is restricted to bot-authored low-risk PRs).
- dev, and my PR title and commit messages use Conventional Commits in English.