feat(harness): defer worktree tools behind tool_info activation by Astro-Han · Pull Request #1057 · Astro-Han/pawwork

Astro-Han · 2026-06-02T12:15:15Z

Summary

Implements the v1 tool-exposure mechanism from #1054: a tool_info tool that withholds low-frequency tools' full description and parameter schema from the default tool surface, advertises them as one-line cards, and lets the model load and activate one on demand. enter-worktree and exit-worktree are the two deferred tools for v1. tool_info(name) returns the tool's full description plus the same provider-transformed JSON schema the real call will use, then marks it activated. The registry hides deferred ids unless activated, filters them through the same permission / user.tools gate as any tool, and injects the live card list into tool_info's description each turn. The llm.ts repair path nudges a model that calls a not-yet-activated deferred tool toward tool_info.

Activation is derived from durable conversation history, not a parallel side-state: a completed tool_info(name=X) part means X is callable for the rest of that history, which mirrors provider-native tool search and stays consistent across retry / fork / compaction. The registry re-assembles the tool array each model step, so an activated tool becomes callable on the next step.
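To make the history-derived activation concrete, here is a minimal sketch of how a function like deriveActivatedTools could work. The `Message` and `ToolPart` shapes and the `DEFERRED_IDS` set are simplified assumptions for illustration, not the PR's real types; only the tool ids and the "completed tool_info part means activated" rule come from the description above.

```typescript
// Hypothetical, simplified shapes; the real message/part types are richer.
type ToolPart = {
  tool: string
  state: { status: "pending" | "running" | "completed" | "error"; input: { name?: string } }
}
type Message = { role: "user" | "assistant"; parts: ToolPart[] }

// The two deferred ids named in this PR.
const DEFERRED_IDS = new Set(["enter-worktree", "exit-worktree"])

// A deferred tool counts as activated once any completed tool_info(name=X)
// part exists anywhere in the history that is passed in.
function deriveActivatedTools(history: Message[]): Set<string> {
  const activated = new Set<string>()
  for (const message of history) {
    for (const part of message.parts) {
      if (part.tool !== "tool_info" || part.state.status !== "completed") continue
      // Canonicalise the raw input so a CamelCase echo still matches.
      const name = part.state.input.name?.trim().toLowerCase()
      if (name && DEFERRED_IDS.has(name)) activated.add(name)
    }
  }
  return activated
}
```

Because the result is a pure function of the history, a retry or fork that replays the same parts yields the same activated set without any extra bookkeeping.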

Why

#1054 asks for a deferred/activate exposure path so heavy, low-frequency tools can stop paying full description + schema cost on every turn. This PR proves the mechanism end to end on the two worktree tools, which are 0-invocation in historical samples so a discovery miss harms no real workflow. It is a mechanism spike, not a byte-reduction win: these two schemas are tiny (exit-worktree empty, enter-worktree ~548B), so withholding them roughly nets to zero once tool_info is added — the value is the validated loop that later, heavier tools can opt into. Provider-neutral by design: no Anthropic defer_loading, no OpenAI tool_search, no plugin-SDK extension.

Related Issue

Closes the v1 scope in #1054 (see the correction notes and alignment comment on that issue).

Human Review Status

Pending

Review Focus

The activation model: deriveActivatedTools reads completed tool_info parts from message history rather than tracking a parallel side-state — confirm that is the right durability boundary. And the permission parity: deferred tools pass through the same Permission / user.tools gate as any tool, and tool_info refuses a tool hidden this turn (via the deferredAvailable predicate threaded through ctx.extra) so the model is never told a disabled tool is active.
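The permission parity described above can be sketched as two small filters over the deferred list. This is an illustrative sketch only: `exposedDeferredIds`, `buildCardList`, and the card wording are hypothetical names and text, and `deferredAvailable` is modelled as a plain predicate standing in for the Permission / user.tools gate.

```typescript
// Hypothetical card text; only the two ids come from the PR.
type DeferredTool = { id: string; card: string }
const DEFERRED: DeferredTool[] = [
  { id: "enter-worktree", card: "enter-worktree: switch the session into a git worktree" },
  { id: "exit-worktree", card: "exit-worktree: leave the current worktree" },
]

// Stand-in for the Permission / user.tools gate.
type Available = (id: string) => boolean

// Deferred tools join the real tool array only when permitted AND activated.
function exposedDeferredIds(activated: ReadonlySet<string>, deferredAvailable: Available): string[] {
  return DEFERRED.filter((t) => deferredAvailable(t.id) && activated.has(t.id)).map((t) => t.id)
}

// Cards advertise tools that are permitted but not yet activated.
function buildCardList(activated: ReadonlySet<string>, deferredAvailable: Available): string {
  const cards = DEFERRED.filter((t) => deferredAvailable(t.id) && !activated.has(t.id)).map(
    (t) => `- ${t.card}`,
  )
  return cards.length > 0 ? cards.join("\n") : "no deferred tools"
}
```

Under these assumptions a permission-disabled tool appears in neither the tool array nor the card list, so the model is never pointed at something it cannot call.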

Risk Notes

Low blast radius: only enter-worktree / exit-worktree are deferred, both 0-invocation in historical samples, and their implementations are unchanged — only the exposure layer changes, so platform behavior is identical on macOS and Windows. tool_info is classified read-only in run-observability so it does not distort retry / incident-safety classification. Permission surface: deferred-tool visibility reuses the existing Permission / user.tools gate, no new permission path. Skipped conditional checklist items: visible UI / copy (no UI surface touched); macOS/Windows platform consideration (worktree tool implementations unchanged, only their exposure).

How To Verify

typecheck (tsgo --noEmit): clean
tool-info.test.ts: 7 passed (incl. tool_info rejecting an unavailable tool, and loading + activating an available one)
registry.test.ts: deferred tools hidden by default with both cards shown; surfaced after activation with the card dropped; hidden again with "no deferred tools" when permission-disabled — passed
run-observability suite: 68 passed
Manual: discoverability walked in the Electron app on a real task

Screenshots or Recordings

N/A — no visible UI change.


Summary by CodeRabbit

New Features
- Introduced a tool discovery mechanism to dynamically activate optional tools.
- Tool availability now respects permission-based deferred rules.
- Tool activation state persists across session history.
Documentation
- Updated tool naming conventions for consistency.
Tests
- Added comprehensive coverage for deferred tool activation and discovery behavior.

Withhold enter-worktree/exit-worktree from the default tool surface. The model loads their full description + provider-transformed schema on demand via a new tool_info tool, which activates them on the next model step (registry.tools() re-assembles the array every step). Activation is derived from durable conversation history (completed tool_info calls), so it survives retry/fork/compaction without a parallel side state. tool_info's card list and the activation gate reuse the same permission/user.tools filtering as the real tool surface. A repair hint points the model at tool_info when it calls a still-deferred tool directly. v1 is a mechanism spike, not byte reduction: the two worktree tools are 0-invocation in samples (a sampling artifact, not lack of use) and net savings are ~0 once tool_info's own footprint is counted. Byte wins come when large tools are folded in later. See #1054 correction notes + alignment comment.

- Register tool_info as read-only in run-observability sanitize, so ordinary turns get a complete boundary snapshot and a harmless tool_info call is not recorded as an unsafe side effect (which could downgrade retry/incident-safety decisions).
- tool_info now refuses a deferred tool hidden this turn by permission / user.tools (deferredAvailable threaded through ctx.extra), instead of telling the model a disabled tool is activated and making it burn turns calling an absent tool.

coderabbitai · 2026-06-02T12:15:23Z


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 3ecf3f95-014f-4caa-99e2-86b63b2771c7

📥 Commits

Reviewing files that changed from the base of the PR and between f2f273d and 12f04f6.

📒 Files selected for processing (5)

packages/opencode/src/session/message-v2.ts
packages/opencode/src/session/prompt.ts
packages/opencode/src/tool/tool-info.ts
packages/opencode/test/session/messages-pagination.test.ts
packages/opencode/test/tool/tool-info.test.ts

📝 Walkthrough

This PR implements a deferred tool loading system that allows optional tools (enter-worktree, exit-worktree) to be loaded on-demand via a new tool_info tool. The system preserves tool activation state through session compaction, gates tool availability based on user permissions, and guides the LLM toward proper activation workflows via error hints.

Changes

Deferred Tool Loading System

| Layer / File(s) | Summary |
| --- | --- |
| Deferred tool registry and helpers: `packages/opencode/src/tool/tool-info.ts`, `packages/opencode/src/tool/enter-worktree.txt`, `packages/opencode/src/tool/exit-worktree.txt` | Deferred-tool registry (`enter-worktree`, `exit-worktree`) with metadata and lookup structures; activation derivation from message history; helpers for hints, reminders, and card lists; documentation standardized to lowercase kebab-case naming. |
| ToolInfoTool implementation: `packages/opencode/src/tool/tool-info.ts` | Executable tool that canonicalizes deferred tool names, validates existence and per-context availability, activates tools via plugin definition, generates untruncated provider-specific schemas, and returns formatted output with metadata marking activation and preventing truncation. |
| Activation state preservation: `packages/opencode/src/session/compaction.ts`, `packages/opencode/src/session/run-observability/sanitize.ts` | `PRUNE_PROTECTED_TOOLS` expanded to include `tool_info` so activation parts survive pruning; `TOOL_INFO` marked as read-only to ensure correct observability tracking. |
| Tool registry deferred support: `packages/opencode/src/tool/registry.ts` | Registry extended with optional `activatedTools` and `deferredAvailable` inputs; `ToolInfoTool` wired into built-in tools; deferred tools conditionally filtered based on activation and availability; `tool_info` description dynamically built from available deferred tool cards. |
| Session activation tracking and reminders: `packages/opencode/src/session/prompt.ts` | Main loop materializes full durable history once per iteration to preserve activation across compaction; derives activated and newly activated tools; computes deferred availability from merged permissions; appends activation reminders to user messages for newly activated tools; passes activation context through tool resolution. |
| LLM deferred tool error guidance: `packages/opencode/src/session/llm.ts` | Computes deferred availability from merged permissions; appends deferred-tool hint to repair error messages when calling deferred tools directly, guiding the model toward proper activation workflow. |
| Comprehensive test coverage: `packages/opencode/test/tool/tool-info.test.ts`, `packages/opencode/test/tool/registry.test.ts`, `packages/opencode/test/session/messages-pagination.test.ts`, `packages/opencode/test/tool/tool-define.test.ts` | Unit tests for activation derivation, canonicalization, and helper builders; integration tests for registry deferred visibility and untruncated schema output; session compaction tests validating activation survival; Tool.define error handling regression test. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Astro-Han/pawwork#1089: Directly related to deferred-tool mechanism enhancements in tool-info and registry wiring.
Astro-Han/pawwork#1054: This PR implements the proposed deferred/direct tool exposure registry with activation-based visibility control.

Possibly related PRs

Astro-Han/pawwork#140: Both PRs modify session compaction/pruning behavior; this PR updates PRUNE_PROTECTED_TOOLS to preserve tool_info activation data while the related PR refactors the underlying compaction/tail-retention logic.

Poem

🐰 A rabbit hops through deferred dreams,
Worktrees enter-exit via tool_info's beams,
Activation survives the compaction's trim,
While hints guide the model on a whim.
No tool left behind, permission gates aligned, ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: deferring worktree tools behind a tool_info activation mechanism, which is the primary feature introduced. |
| Description check | ✅ Passed | The description comprehensively covers all required sections: clear summary, rationale, issue linkage, human review status, review focus, risk notes with conditional items explained, and detailed verification results replacing the template example. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


github-actions

Suggested priority: P2 (includes non-doc, non-test paths outside the low-risk bucket).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

gemini-code-assist

Code Review

This pull request introduces a deferred tool loading mechanism to optimize the model's context window. Low-frequency tools, specifically enter-worktree and exit-worktree, are withheld from the initial tool surface and instead advertised as short cards within a new tool_info tool. The model can call tool_info to dynamically load a tool's full schema and description, activating it for subsequent steps. The feedback focuses on improving the robustness of this mechanism: first, by normalizing tool names to lowercase in deriveActivatedTools and error hints to handle casing mismatches gracefully; and second, by propagating operational errors in tool_info as typed failures using Effect.fail rather than throwing raw errors that trigger critical application defects via Effect.orDie.


After tool_info(X) completes, small/medium models don't reflexively re-scan the next turn's tool array; they act on their prior impression of available tools and ignore the freshly exposed schema. mimo-v2.5 read the "now available" output, then in its reasoning said "from my tool list ... no enter-worktree" and fell back to `bash git worktree add`. The mechanism itself works (next-step registry really exposes X). The fix is to signal "X is callable" through a channel models actually attend to: a synthetic <system-reminder> in the user message of the step that follows activation, anchored on the existing SessionDiagnostics injection point so the reminder and the new tool land in the same model call. Anti-fallback hints are per-tool (enter-worktree's bash equivalent is explicitly ruled out, learned from the real session log). Also fixed: tool_info output line is sharper, and the worktree tool docs now use real kebab-case tool ids instead of CamelCase examples that would invite name hallucination.
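A rough sketch of what building that one-shot reminder might look like follows. The reminder wording and the `ANTI_FALLBACK` table are assumptions made for illustration; the commit message only states that the reminder is a synthetic `<system-reminder>` with per-tool anti-fallback hints, and that the bash equivalent of enter-worktree is ruled out.

```typescript
// Hypothetical per-tool hints; the exact wording in the PR is not shown here.
const ANTI_FALLBACK: Record<string, string> = {
  "enter-worktree": "Do not substitute `git worktree add` through bash.",
}

// Build the synthetic reminder text for tools activated in the previous step.
// Returns undefined when nothing was newly activated, so no part is injected.
function buildActivationReminder(ids: readonly string[]): string | undefined {
  if (ids.length === 0) return undefined
  const lines = ids.map((id) => {
    const hint = ANTI_FALLBACK[id]
    return hint ? `- ${id} is now callable. ${hint}` : `- ${id} is now callable.`
  })
  return ["<system-reminder>", ...lines, "</system-reminder>"].join("\n")
}
```

The caller would append the returned text as a synthetic part on the user message of the step that follows activation, so the reminder and the newly exposed schema arrive in the same model call.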

- P1 compaction survival: tool_info parts are in PRUNE_PROTECTED_TOOLS, so deriveActivatedTools keeps seeing activations after older tool outputs are pruned. The "derive from message history" design is honored by guaranteeing the history segment is durable.
- P1 plugin schema parity: tool_info runs the same plugin.trigger ("tool.definition") pipeline registry.tools() uses on every other tool, so the description and parameters the model reads match what the next-step tool list hands it. Injected as a closure to keep ToolInfoTool Plugin-free in its R.
- P2 Tool.define wrapping: tool_info is now a first-class Tool.define + Tool.init, gaining truncation, validation, span tracing, and uniform error formatting like every other builtin. Yielded at the outer make scope.
- P2 canonical hint: extracted buildDeferredHint so the repair-time message always names the kebab-case canonical id; a CamelCase echo like "Enter-Worktree" no longer becomes an invalid tool_info(name=...) suggestion.
- P3 single source of truth: id, card, description, and parameters for each deferred tool now live in one DEFERRED array. Adding a new deferred tool is a one-entry change instead of editing three places.
- Tests: PRUNE_PROTECTED_TOOLS membership and buildDeferredHint canonicalisation (CamelCase echo + non-deferred no-op). The previous in-process execute tests are removed because ToolInfoTool now requires the registry layer to instantiate; execute-path behavior is covered by registry.test.ts and end-to-end usage.

- P2-1: repair hint now gates on deferred availability; a disabled or permission-denied tool no longer routes the model to a tool_info call that would just fail again.
- P2-2: add a request-level parity test asserting tool_info's loaded schema equals the post-activation tool schema (and stays untruncated).
- P3-1: canonicalise the name inside tool_info so a CamelCase echo like "Enter-Worktree" resolves to enter-worktree instead of erroring.
- P3-2: opt tool_info output out of truncation so a large deferred tool's schema is never clipped mid-load.

- deriveActivatedTools canonicalises the recorded raw input name, so a CamelCase echo like "Enter-Worktree" activates the tool on the next step instead of leaving the model pointed at a tool the registry never exposes (P2).
- registry schema-parity test passes the session model through ctx.extra and compares against ProviderTransform.schema(model, ...), exercising the provider-transform branch the prior assertion silently skipped (P3).
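The canonicalisation and availability gating in the items above can be sketched as follows. This is a hedged illustration: `canonicalize` is a hypothetical helper name, the hint wording is invented, and the real buildDeferredHint signature may differ; only the behaviors (kebab-case canonical id, no-op for non-deferred names, no hint for unavailable tools) come from the commit messages.

```typescript
// The two deferred ids named in this PR.
const DEFERRED_TOOL_IDS = ["enter-worktree", "exit-worktree"] as const

// Hypothetical helper: map a raw model echo to the canonical kebab-case id.
function canonicalize(raw: string): string | undefined {
  const id = raw.trim().toLowerCase()
  return DEFERRED_TOOL_IDS.find((deferred) => deferred === id)
}

// Returns a repair hint naming the canonical id, or "" when the name is not a
// deferred tool or the tool is hidden this turn by permission / user.tools.
function buildDeferredHint(rawName: string, deferredAvailable: (id: string) => boolean): string {
  const id = canonicalize(rawName)
  if (!id || !deferredAvailable(id)) return ""
  return `Tool "${id}" is deferred. Call tool_info(name="${id}") to load and activate it first.`
}
```

Returning an empty string for unavailable tools keeps the repair message from steering the model toward a tool_info call that would only fail again.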

… compacted view

deriveActivatedTools was fed the compaction-filtered message list (msgs), so a tool_info(name=X) activation older than the retained tail was truncated away: the deferred tool dropped out of the next turn's tool list and silently re-locked mid-session. Materialise the full durable stream once per loop and feed it to activation derivation via a dedicated activationMessages param; the model-facing view stays the filtered msgs. Reuses the stream the loop already read (filterCompactedEffect + the lastUser/lastFinished scan), so it is one fewer DB pass, not an extra one. Regression test asserts activation survives compaction truncation: the filtered view loses it, the full history keeps it.

…al Error

The Tool.define wrapper defectifies a generic Effect.fail(Error) (already covered), but the model is on the Effect.runPromise reject path, which squashes that Die back to the original Error via causeSquash, with no FiberFailure wrapper. Assert the wrapped execution rejects with the original Error message, not a FiberFailure, so tool_info's unknown/unavailable paths (plain Effect.fail) stay clean operational errors rather than noisy alerts.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/session/prompt.ts`:
- Around line 2119-2134: deriveNewlyActivated currently inspects only the latest
assistant message (msgs) so reminders get lost if a compaction summary becomes
the latest assistant; update the logic in the block that injects activation
reminders (the loop that uses deriveNewlyActivated, lastUser,
PartID.ascending(), and buildActivationReminder) to derive newly activated tool
ids from durable history/allMessages (skipping assistant summary messages
produced by compaction) or alternatively propagate the set of just-activated
tool ids from resolveTools() across the compaction turn, then use that
durable/propagated set to create the synthetic user parts for lastUser so
reminders are always injected regardless of compaction.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ad3f3805-9098-422c-b7ac-8642bee1e07c

📥 Commits

Reviewing files that changed from the base of the PR and between e6ff0cd and f2f273d.

📒 Files selected for processing (12)

packages/opencode/src/session/compaction.ts
packages/opencode/src/session/llm.ts
packages/opencode/src/session/prompt.ts
packages/opencode/src/session/run-observability/sanitize.ts
packages/opencode/src/tool/enter-worktree.txt
packages/opencode/src/tool/exit-worktree.txt
packages/opencode/src/tool/registry.ts
packages/opencode/src/tool/tool-info.ts
packages/opencode/test/session/messages-pagination.test.ts
packages/opencode/test/tool/registry.test.ts
packages/opencode/test/tool/tool-define.test.ts
packages/opencode/test/tool/tool-info.test.ts

…ry, not full hydration

The previous fix materialised Array.from(stream(sessionID)) every loop to derive activation from the complete history. On long compacted sessions that regressed DB/parts hydration: filterCompacted and the lastUser/lastFinished scan both stop early at the compaction tail, but Array.from forced a full message+parts hydration of the entire history just to find a few tool_info parts. Replace it with MessageV2.toolInfoParts(sessionID): a single PartTable query filtered by json_extract($.tool)='tool_info' (same pattern as session.ts), built via the existing part() constructor. deriveActivatedToolsFromParts derives activation from those parts; the run loop goes back to the lazy early-breaking filterCompactedEffect + stream() it used before. Activation still spans the full durable history (reverted parts are physically deleted before the loop; forks use a fresh session id), so it stays correct across compaction without paying for full hydration. Regression test now asserts toolInfoParts spans compaction while the filtered model-facing view drops the activation.

…mary assistant

The one-shot activation system-reminder was derived via deriveNewlyActivated(msgs) over the compaction-filtered view. If an auto-compaction lands between a tool_info activation and the next step and summarises the activating turn into the head (older than the retained tail), filterCompacted drops that turn and the newest assistant becomes the compaction summary (no tool_info part), so the reminder was silently lost. Activation itself was unaffected (it derives from durable tool_info parts); only the reminder. Add MessageV2.lastNonSummaryAssistant(sessionID): a durable single-row query (json_extract excludes summary assistants) for the newest real assistant turn. deriveNewlyActivated now takes that single message. Summaries are excluded at the source so a compaction cannot suppress the reminder; one-shot semantics hold because any real non-tool_info turn becomes the newest non-summary assistant and clears the set. Regression test: a compaction summarising the activating turn into the head still yields the activation from lastNonSummaryAssistant, while the filtered view newest assistant is the summary.
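The one-shot semantics described in this commit can be illustrated with an in-memory stand-in for the durable query. The `AssistantTurn` shape and the array-based lookup below are assumptions for illustration; the PR describes a single-row database query instead, and the real deriveNewlyActivated operates on actual message parts.

```typescript
// Hypothetical flattened view of an assistant turn.
type AssistantTurn = {
  id: string
  summary: boolean // true for compaction summary messages
  completedToolInfoNames: string[] // names from completed tool_info parts
}

// In-memory stand-in for the durable lookup: newest assistant that is not a summary.
function lastNonSummaryAssistant(turns: readonly AssistantTurn[]): AssistantTurn | undefined {
  for (let i = turns.length - 1; i >= 0; i--) {
    const candidate = turns[i]
    if (candidate && !candidate.summary) return candidate
  }
  return undefined
}

// Newly activated tools are those loaded by that single newest real turn.
function deriveNewlyActivated(turn: AssistantTurn | undefined): string[] {
  return turn ? [...new Set(turn.completedToolInfoNames)] : []
}
```

With this shape, a compaction summary appended after the activating turn is skipped, while any later real turn without a tool_info call becomes the newest non-summary assistant and clears the reminder set.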

Astro-Han added 2 commits June 2, 2026 19:55

Astro-Han added the enhancement New feature or request label Jun 2, 2026

github-actions Bot added harness Model harness, prompts, tool descriptions, and session mechanics P2 Medium priority labels Jun 2, 2026

github-actions Bot reviewed Jun 2, 2026

Astro-Han added P1 High priority and removed P2 Medium priority labels Jun 2, 2026

gemini-code-assist Bot reviewed Jun 2, 2026

Comment thread packages/opencode/src/tool/tool-info.ts

Comment thread packages/opencode/src/session/llm.ts Outdated

Comment thread packages/opencode/src/tool/tool-info.ts Outdated

Astro-Han added 5 commits June 2, 2026 23:02

Merge remote-tracking branch 'origin/dev' into claude/tool-exposure-1054

1acec06

Astro-Han mentioned this pull request Jun 3, 2026

[Feature] Tool-exposure v1 instrumentation: deferred card/activation/cache metrics (follow-up to #1054) #1089

Open

Astro-Han added 2 commits June 3, 2026 09:29

coderabbitai Bot reviewed Jun 3, 2026

Comment thread packages/opencode/src/session/prompt.ts

Astro-Han added 2 commits June 3, 2026 10:02

Astro-Han merged commit 74af1a3 into dev Jun 3, 2026
33 checks passed

Astro-Han deleted the claude/tool-exposure-1054 branch June 3, 2026 04:40

Astro-Han merged 11 commits into dev from claude/tool-exposure-1054
