feat(harness): defer worktree tools behind tool_info activation#1057

Merged
Astro-Han merged 11 commits into dev from claude/tool-exposure-1054 on Jun 3, 2026

Conversation

@Astro-Han
Owner

@Astro-Han Astro-Han commented Jun 2, 2026

Summary

Implements the v1 tool-exposure mechanism from #1054: a tool_info tool that withholds low-frequency tools' full description and parameter schema from the default tool surface, advertises them as one-line cards, and lets the model load and activate one on demand. enter-worktree and exit-worktree are the two deferred tools for v1. tool_info(name) returns the tool's full description plus the same provider-transformed JSON schema the real call will use, then marks it activated. The registry hides deferred ids unless activated, filters them through the same permission / user.tools gate as any tool, and injects the live card list into tool_info's description each turn. The llm.ts repair path nudges a model that calls a not-yet-activated deferred tool toward tool_info.

Activation is derived from durable conversation history, not a parallel side-state: a completed tool_info(name=X) part means X is callable for the rest of that history, which mirrors provider-native tool search and stays consistent across retry / fork / compaction. The registry re-assembles the tool array each model step, so an activated tool becomes callable on the next step.
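
The history-derived activation described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `ToolPart` and `HistoryMessage` shapes and the status values are simplified assumptions, and only the function name `deriveActivatedTools` and the two tool ids come from this PR.

```typescript
// Simplified, hypothetical shapes; the real message and part types are richer.
type ToolPart = {
  type: "tool"
  tool: string
  status: "pending" | "completed" | "error"
  input: { name?: string }
}
type HistoryMessage = { role: "user" | "assistant"; parts: ToolPart[] }

const DEFERRED_IDS = new Set(["enter-worktree", "exit-worktree"])

// A deferred tool is treated as activated once a completed tool_info(name=X)
// part exists anywhere in the durable history.
function deriveActivatedTools(history: HistoryMessage[]): Set<string> {
  const activated = new Set<string>()
  for (const message of history) {
    for (const part of message.parts) {
      if (part.tool !== "tool_info" || part.status !== "completed") continue
      // Normalise casing so an echo like "Enter-Worktree" still counts.
      const id = part.input.name?.trim().toLowerCase()
      if (id !== undefined && DEFERRED_IDS.has(id)) activated.add(id)
    }
  }
  return activated
}

const sampleHistory: HistoryMessage[] = [
  { role: "user", parts: [] },
  {
    role: "assistant",
    parts: [
      { type: "tool", tool: "tool_info", status: "completed", input: { name: "Enter-Worktree" } },
      { type: "tool", tool: "tool_info", status: "error", input: { name: "exit-worktree" } },
    ],
  },
]

const activatedNow = deriveActivatedTools(sampleHistory)
```

Because the set is recomputed from history on every step, a retry or fork that replays the same history should arrive at the same activation set without any extra bookkeeping.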

Why

#1054 asks for a deferred/activate exposure path so heavy, low-frequency tools can stop paying full description + schema cost on every turn. This PR proves the mechanism end to end on the two worktree tools, which are 0-invocation in historical samples so a discovery miss harms no real workflow. It is a mechanism spike, not a byte-reduction win: these two schemas are tiny (exit-worktree empty, enter-worktree ~548B), so withholding them roughly nets to zero once tool_info is added — the value is the validated loop that later, heavier tools can opt into. Provider-neutral by design: no Anthropic defer_loading, no OpenAI tool_search, no plugin-SDK extension.

Related Issue

Closes the v1 scope in #1054 (see the correction notes and alignment comment on that issue).

Human Review Status

Pending

Review Focus

Two areas. First, the activation model: deriveActivatedTools reads completed tool_info parts from message history rather than tracking a parallel side-state; please confirm that is the right durability boundary. Second, permission parity: deferred tools pass through the same Permission / user.tools gate as any tool, and tool_info refuses a tool hidden this turn (via the deferredAvailable predicate threaded through ctx.extra), so the model is never told a disabled tool is active.
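
For reviewers, the permission-parity claim can be pictured with a small sketch. Everything here is an assumption made for illustration: the `Gate` type stands in for the Permission / user.tools check, `visibleTools` and `toolInfo` are simplified stand-ins for the registry and the tool, and `read` is a made-up non-deferred tool id.

```typescript
const DEFERRED_IDS = new Set(["enter-worktree", "exit-worktree"])

// Stand-in for the Permission / user.tools check (hypothetical signature).
type Gate = (toolId: string) => boolean

// Every tool passes the same gate; deferred tools additionally need activation.
function visibleTools(all: string[], isAllowed: Gate, activated: Set<string>): string[] {
  return all.filter((id) => isAllowed(id) && (!DEFERRED_IDS.has(id) || activated.has(id)))
}

type ToolInfoResult = { ok: true; activated: string } | { ok: false; error: string }

// tool_info refuses unknown names and tools the gate hides this turn.
function toolInfo(name: string, deferredAvailable: Gate): ToolInfoResult {
  if (!DEFERRED_IDS.has(name)) return { ok: false, error: `unknown deferred tool: ${name}` }
  if (!deferredAvailable(name)) return { ok: false, error: `tool unavailable this turn: ${name}` }
  return { ok: true, activated: name }
}

const allTools = ["bash", "read", "enter-worktree", "exit-worktree"]
// Pretend user.tools disabled exit-worktree for this turn.
const gate: Gate = (id) => id !== "exit-worktree"

const beforeActivation = visibleTools(allTools, gate, new Set())
const afterActivation = visibleTools(allTools, gate, new Set(["enter-worktree", "exit-worktree"]))
const refused = toolInfo("exit-worktree", gate)
```

The point of routing both functions through one predicate is that the tool list and the tool_info answer cannot disagree about whether a tool is available.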

Risk Notes

Low blast radius: only enter-worktree / exit-worktree are deferred, both 0-invocation in historical samples, and their implementations are unchanged — only the exposure layer changes, so platform behavior is identical on macOS and Windows. tool_info is classified read-only in run-observability so it does not distort retry / incident-safety classification. Permission surface: deferred-tool visibility reuses the existing Permission / user.tools gate, no new permission path. Skipped conditional checklist items: visible UI / copy (no UI surface touched); macOS/Windows platform consideration (worktree tool implementations unchanged, only their exposure).

How To Verify

typecheck (tsgo --noEmit): clean
tool-info.test.ts: 7 passed (incl. tool_info rejecting an unavailable tool, and loading + activating an available one)
registry.test.ts: deferred tools hidden by default with both cards shown; surfaced after activation with the card dropped; hidden again with "no deferred tools" when permission-disabled — passed
run-observability suite: 68 passed
Manual: discoverability walked in the Electron app on a real task

Screenshots or Recordings

N/A — no visible UI change.

Checklist

How to use this checklist:

  • Tick a box by replacing [ ] with [x]. Do not edit, add, or remove items.
  • The bot-applied label items can only be honestly ticked AFTER the PR is opened and the labeler / priority-triage bots have run — return to the PR description and tick them then.
  • Most items are required. The few that are conditional are explicitly marked (conditional); for those, leave unticked if they truly do not apply and explain why in Risk Notes. All other items must be ticked before requesting human review.
  • Type label — this PR carries exactly one of bug, enhancement, task, documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.
  • Routing labels — this PR carries at least one of app, ui, platform, harness, ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.
  • Priority label — this PR carries exactly one of P0, P1, P2, P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.
  • Human Review Status above is set to Pending, Approved by @<reviewer>, or Not required: <reason> (default is Pending; "not required" is restricted to bot-authored low-risk PRs).
  • I linked the related issue, or stated in Summary why there is no issue.
  • I described the review focus and any meaningful risks.
  • I replaced the example block in How To Verify with the real verification steps and the key result for each.
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope.
  • (conditional) I manually checked visible UI or copy changes when needed, with screenshots or recordings. Leave unticked only if no visible UI or copy changed.
  • (conditional) I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes. Leave unticked only if no platform/packaging surface was touched.
  • (conditional) I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant. Leave unticked only if none of those surfaces was touched.
  • I reviewed the final diff for unrelated changes and suspicious dependency changes.
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English.

Summary by CodeRabbit

  • New Features

    • Introduced a tool discovery mechanism to dynamically activate optional tools.
    • Tool availability now respects permission-based deferred rules.
    • Tool activation state persists across session history.
  • Documentation

    • Updated tool naming conventions for consistency.
  • Tests

    • Added comprehensive coverage for deferred tool activation and discovery behavior.

Astro-Han added 2 commits June 2, 2026 19:55
Withhold enter-worktree/exit-worktree from the default tool surface. The model
loads their full description + provider-transformed schema on demand via a new
tool_info tool, which activates them on the next model step (registry.tools()
re-assembles the array every step). Activation is derived from durable
conversation history (completed tool_info calls), so it survives
retry/fork/compaction without a parallel side state. tool_info's card list and
the activation gate reuse the same permission/user.tools filtering as the real
tool surface. A repair hint points the model at tool_info when it calls a
still-deferred tool directly.

v1 is a mechanism spike, not byte reduction: the two worktree tools are
0-invocation in samples (a sampling artifact, not lack of use) and net savings
are ~0 once tool_info's own footprint is counted. Byte wins come when large
tools are folded in later. See #1054 correction notes + alignment comment.
- Register tool_info as read-only in run-observability sanitize, so ordinary
  turns get a complete boundary snapshot and a harmless tool_info call is not
  recorded as an unsafe side effect (which could downgrade retry/incident-safety
  decisions).
- tool_info now refuses a deferred tool hidden this turn by permission /
  user.tools (deferredAvailable threaded through ctx.extra), instead of telling
  the model a disabled tool is activated and making it burn turns calling an
  absent tool.
@Astro-Han Astro-Han added the enhancement New feature or request label Jun 2, 2026
@coderabbitai
Contributor

coderabbitai Bot commented Jun 2, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 3ecf3f95-014f-4caa-99e2-86b63b2771c7

📥 Commits

Reviewing files that changed from the base of the PR and between f2f273d and 12f04f6.

📒 Files selected for processing (5)
  • packages/opencode/src/session/message-v2.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/tool/tool-info.ts
  • packages/opencode/test/session/messages-pagination.test.ts
  • packages/opencode/test/tool/tool-info.test.ts
Walkthrough

This PR implements a deferred tool loading system that allows optional tools (enter-worktree, exit-worktree) to be loaded on-demand via a new tool_info tool. The system preserves tool activation state through session compaction, gates tool availability based on user permissions, and guides the LLM toward proper activation workflows via error hints.

Changes

Deferred Tool Loading System

| Layer | File(s) | Summary |
|---|---|---|
| Deferred tool registry and helpers | packages/opencode/src/tool/tool-info.ts, packages/opencode/src/tool/enter-worktree.txt, packages/opencode/src/tool/exit-worktree.txt | Deferred-tool registry (enter-worktree, exit-worktree) with metadata and lookup structures; activation derivation from message history; helpers for hints, reminders, and card lists; documentation standardized to lowercase kebab-case naming. |
| ToolInfoTool implementation | packages/opencode/src/tool/tool-info.ts | Executable tool that canonicalizes deferred tool names, validates existence and per-context availability, activates tools via plugin definition, generates untruncated provider-specific schemas, and returns formatted output with metadata marking activation and preventing truncation. |
| Activation state preservation | packages/opencode/src/session/compaction.ts, packages/opencode/src/session/run-observability/sanitize.ts | PRUNE_PROTECTED_TOOLS expanded to include tool_info so activation parts survive pruning; TOOL_INFO marked as read-only to ensure correct observability tracking. |
| Tool registry deferred support | packages/opencode/src/tool/registry.ts | Registry extended with optional activatedTools and deferredAvailable inputs; ToolInfoTool wired into built-in tools; deferred tools conditionally filtered based on activation and availability; tool_info description dynamically built from available deferred tool cards. |
| Session activation tracking and reminders | packages/opencode/src/session/prompt.ts | Main loop materializes full durable history once per iteration to preserve activation across compaction; derives activated and newly activated tools; computes deferred availability from merged permissions; appends activation reminders to user messages for newly activated tools; passes activation context through tool resolution. |
| LLM deferred tool error guidance | packages/opencode/src/session/llm.ts | Computes deferred availability from merged permissions; appends deferred-tool hint to repair error messages when calling deferred tools directly, guiding the model toward proper activation workflow. |
| Comprehensive test coverage | packages/opencode/test/tool/tool-info.test.ts, packages/opencode/test/tool/registry.test.ts, packages/opencode/test/session/messages-pagination.test.ts, packages/opencode/test/tool/tool-define.test.ts | Unit tests for activation derivation, canonicalization, and helper builders; integration tests for registry deferred visibility and untruncated schema output; session compaction tests validating activation survival; Tool.define error handling regression test. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • Astro-Han/pawwork#1089: Directly related to deferred-tool mechanism enhancements in tool-info and registry wiring.
  • Astro-Han/pawwork#1054: This PR implements the proposed deferred/direct tool exposure registry with activation-based visibility control.

Possibly related PRs

  • Astro-Han/pawwork#140: Both PRs modify session compaction/pruning behavior; this PR updates PRUNE_PROTECTED_TOOLS to preserve tool_info activation data while the related PR refactors the underlying compaction/tail-retention logic.

Poem

🐰 A rabbit hops through deferred dreams,
Worktrees enter-exit via tool_info's beams,
Activation survives the compaction's trim,
While hints guide the model on a whim.
No tool left behind, permission gates aligned,

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: deferring worktree tools behind a tool_info activation mechanism, which is the primary feature introduced. |
| Description check | ✅ Passed | The description comprehensively covers all required sections: clear summary, rationale, issue linkage, human review status, review focus, risk notes with conditional items explained, and detailed verification results replacing the template example. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@github-actions github-actions Bot added harness Model harness, prompts, tool descriptions, and session mechanics P2 Medium priority labels Jun 2, 2026

@github-actions github-actions Bot left a comment

Suggested priority: P2 (includes non-doc, non-test paths outside the low-risk bucket).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

@Astro-Han Astro-Han added P1 High priority and removed P2 Medium priority labels Jun 2, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a deferred tool loading mechanism to optimize the model's context window. Low-frequency tools, specifically enter-worktree and exit-worktree, are withheld from the initial tool surface and instead advertised as short cards within a new tool_info tool. The model can call tool_info to dynamically load a tool's full schema and description, activating it for subsequent steps. The feedback focuses on improving the robustness of this mechanism: first, by normalizing tool names to lowercase in deriveActivatedTools and error hints to handle casing mismatches gracefully; and second, by propagating operational errors in tool_info as typed failures using Effect.fail rather than throwing raw errors that trigger critical application defects via Effect.orDie.

Comment thread packages/opencode/src/tool/tool-info.ts
Comment thread packages/opencode/src/session/llm.ts Outdated
Comment thread packages/opencode/src/tool/tool-info.ts Outdated
Astro-Han added 5 commits June 2, 2026 23:02
After tool_info(X) completes, small/medium models don't reflexively re-scan
the next turn's tool array; they act on their prior impression of available
tools and ignore the freshly exposed schema. mimo-v2.5 read the "now available"
output, then in its reasoning said "from my tool list ... no enter-worktree"
and fell back to `bash git worktree add`.

The mechanism itself works (next-step registry really exposes X). The fix is
to signal "X is callable" through a channel models actually attend to: a
synthetic <system-reminder> in the user message of the step that follows
activation, anchored on the existing SessionDiagnostics injection point so
the reminder and the new tool land in the same model call.

Anti-fallback hints are per-tool (enter-worktree's bash equivalent is explicitly
ruled out, learned from the real session log). Also fixed: tool_info output
line is sharper, and the worktree tool docs now use real kebab-case tool ids
instead of CamelCase examples that would invite name hallucination.
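
A rough sketch of the reminder injection described in this commit message follows. The reminder wording, the `ANTI_FALLBACK` table, the `UserMessage` shape, and `injectReminders` are all assumptions for illustration; only the `<system-reminder>` tag, the `buildActivationReminder` name, and the `bash git worktree add` fallback appear in this PR thread.

```typescript
// Hypothetical per-tool anti-fallback hints; the PR's actual wording is not shown here.
const ANTI_FALLBACK: Record<string, string> = {
  "enter-worktree": "Do not fall back to `bash git worktree add`.",
}

function buildActivationReminder(toolId: string): string {
  const lines = [`Tool "${toolId}" is now in your tool list and callable in this step.`]
  const hint = ANTI_FALLBACK[toolId]
  if (hint !== undefined) lines.push(hint)
  return `<system-reminder>\n${lines.join("\n")}\n</system-reminder>`
}

// Simplified user message shape (assumption).
type TextPart = { type: "text"; text: string; synthetic: boolean }
type UserMessage = { role: "user"; parts: TextPart[] }

// Append one synthetic text part per newly activated tool to the last user message,
// so the reminder and the newly exposed tool reach the model in the same call.
function injectReminders(lastUser: UserMessage, newlyActivated: string[]): UserMessage {
  const reminders: TextPart[] = newlyActivated.map((id) => ({
    type: "text",
    text: buildActivationReminder(id),
    synthetic: true,
  }))
  return { ...lastUser, parts: [...lastUser.parts, ...reminders] }
}

const originalUser: UserMessage = {
  role: "user",
  parts: [{ type: "text", text: "Set up a worktree for the fix.", synthetic: false }],
}
const withReminder = injectReminders(originalUser, ["enter-worktree"])
```

Returning a new message rather than mutating the stored one keeps the reminder a per-step overlay, which fits the one-shot behaviour the commit describes.
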
P1 compaction survival: tool_info parts are in PRUNE_PROTECTED_TOOLS, so
deriveActivatedTools keeps seeing activations after older tool outputs are
pruned. The "derive from message history" design is honored by guaranteeing
the history segment is durable.
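
To make the pruning guarantee concrete, here is a hedged sketch of a pruner that skips protected tools. The `ToolOutputPart` shape and `pruneOldOutputs` are hypothetical, and the real PRUNE_PROTECTED_TOOLS presumably has other members; only tool_info's membership is confirmed by this PR.

```typescript
// Only tool_info's membership is confirmed by this PR; other members are omitted.
const PRUNE_PROTECTED_TOOLS = new Set(["tool_info"])

// Simplified part shape (assumption).
type ToolOutputPart = { tool: string; output: string; pruned: boolean }

// Clear outputs older than the last `keepRecent` parts, except for protected tools.
function pruneOldOutputs(parts: ToolOutputPart[], keepRecent: number): ToolOutputPart[] {
  const cutoff = Math.max(0, parts.length - keepRecent)
  return parts.map((part, index) =>
    index < cutoff && !PRUNE_PROTECTED_TOOLS.has(part.tool)
      ? { ...part, output: "", pruned: true }
      : part,
  )
}

const pruned = pruneOldOutputs(
  [
    { tool: "tool_info", output: "enter-worktree schema", pruned: false },
    { tool: "bash", output: "long log", pruned: false },
    { tool: "bash", output: "recent log", pruned: false },
  ],
  1,
)
```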

P1 plugin schema parity: tool_info runs the same plugin.trigger
("tool.definition") pipeline registry.tools() uses on every other tool, so the
description and parameters the model reads match what the next-step tool list
hands it. Injected as a closure to keep ToolInfoTool Plugin-free in its R.

P2 Tool.define wrapping: tool_info is now a first-class Tool.define + Tool.init,
gaining truncation, validation, span tracing, and uniform error formatting
like every other builtin. Yielded at the outer make scope.

P2 canonical hint: extracted buildDeferredHint so the repair-time message
always names the kebab-case canonical id; a CamelCase echo like "Enter-Worktree"
no longer becomes an invalid tool_info(name=...) suggestion.
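
The canonicalisation can be sketched as below. This is an illustration under assumptions: the hint wording and the `canonicalize` helper are invented, and the availability check reflects the gating that a later commit in this PR adds to the repair hint.

```typescript
const DEFERRED_IDS = ["enter-worktree", "exit-worktree"]

// Map a raw, possibly mis-cased echo to the canonical kebab-case id, if any.
function canonicalize(raw: string): string | undefined {
  const candidate = raw.trim().toLowerCase()
  return DEFERRED_IDS.find((id) => id === candidate)
}

// Return a repair hint only for deferred tools that are available this turn.
function buildDeferredHint(
  raw: string,
  deferredAvailable: (id: string) => boolean,
): string | undefined {
  const id = canonicalize(raw)
  if (id === undefined || !deferredAvailable(id)) return undefined
  return `"${id}" is a deferred tool. Call tool_info(name="${id}") first, then call it on the next step.`
}

const hintForEcho = buildDeferredHint("Enter-Worktree", () => true)
const hintForOther = buildDeferredHint("bash", () => true)
const hintWhenDisabled = buildDeferredHint("enter-worktree", () => false)
```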

P3 single source of truth: id, card, description, and parameters for each
deferred tool now live in one DEFERRED array. Adding a new deferred tool is
a one-entry change instead of editing three places.
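
A single-array layout along these lines might look like the following sketch. The field values are placeholders (the card text, the descriptions, and the enter-worktree parameter are not taken from the PR), and `buildCardList` is a hypothetical helper; the "no deferred tools" fallback string is the one quoted in the registry test notes.

```typescript
type DeferredTool = {
  id: string
  card: string
  description: string
  parameters: Record<string, unknown>
}

// Placeholder entries: card text, descriptions, and the `name` parameter are assumptions.
const DEFERRED: DeferredTool[] = [
  {
    id: "enter-worktree",
    card: "enter-worktree: placeholder one-line card",
    description: "placeholder full description",
    parameters: { type: "object", properties: { name: { type: "string" } } },
  },
  {
    id: "exit-worktree",
    card: "exit-worktree: placeholder one-line card",
    description: "placeholder full description",
    parameters: { type: "object", properties: {} },
  },
]

// Cards are shown only for tools that are available and not yet activated.
function buildCardList(available: (id: string) => boolean, activated: Set<string>): string {
  const cards = DEFERRED
    .filter((tool) => available(tool.id) && !activated.has(tool.id))
    .map((tool) => `- ${tool.card}`)
  return cards.length > 0 ? cards.join("\n") : "no deferred tools"
}

const defaultCards = buildCardList(() => true, new Set())
const cardsAfterActivation = buildCardList(() => true, new Set(["enter-worktree"]))
const cardsWhenDisabled = buildCardList(() => false, new Set())
```

With this shape, adding a deferred tool means appending one object to `DEFERRED`; the card list and any id lookups follow from the same entry.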

Tests: PRUNE_PROTECTED_TOOLS membership and buildDeferredHint canonicalisation
(CamelCase echo + non-deferred no-op). The previous in-process execute tests
are removed because ToolInfoTool now requires the registry layer to instantiate;
execute-path behavior is covered by registry.test.ts and end-to-end usage.
- P2-1: repair hint now gates on deferred availability; a disabled or
  permission-denied tool no longer routes the model to a tool_info call
  that would just fail again.
- P2-2: add a request-level parity test asserting tool_info's loaded
  schema equals the post-activation tool schema (and stays untruncated).
- P3-1: canonicalise the name inside tool_info so a CamelCase echo like
  "Enter-Worktree" resolves to enter-worktree instead of erroring.
- P3-2: opt tool_info output out of truncation so a large deferred tool's
  schema is never clipped mid-load.
- deriveActivatedTools canonicalises the recorded raw input name, so a CamelCase
  echo like "Enter-Worktree" activates the tool on the next step instead of
  leaving the model pointed at a tool the registry never exposes (P2).
- registry schema-parity test passes the session model through ctx.extra and
  compares against ProviderTransform.schema(model, ...), exercising the
  provider-transform branch the prior assertion silently skipped (P3).
Astro-Han added 2 commits June 3, 2026 09:29
… compacted view

deriveActivatedTools was fed the compaction-filtered message list (msgs), so a tool_info(name=X) activation older than the retained tail was truncated away: the deferred tool dropped out of the next turn's tool list and silently re-locked mid-session.

Materialise the full durable stream once per loop and feed it to activation derivation via a dedicated activationMessages param; the model-facing view stays the filtered msgs. Reuses the stream the loop already read (filterCompactedEffect + the lastUser/lastFinished scan), so it is one fewer DB pass, not an extra one.

Regression test asserts activation survives compaction truncation: the filtered view loses it, the full history keeps it.
…al Error

The Tool.define wrapper defectifies a generic Effect.fail(Error) (already covered), but the model is on the Effect.runPromise reject path, which squashes that Die back to the original Error via causeSquash — no FiberFailure wrapper. Assert the wrapped execution rejects with the original Error message, not a FiberFailure, so tool_info's unknown/unavailable paths (plain Effect.fail) stay clean operational errors rather than noisy alerts.
Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/session/prompt.ts`:
- Around line 2119-2134: deriveNewlyActivated currently inspects only the latest
assistant message (msgs) so reminders get lost if a compaction summary becomes
the latest assistant; update the logic in the block that injects activation
reminders (the loop that uses deriveNewlyActivated, lastUser,
PartID.ascending(), and buildActivationReminder) to derive newly activated tool
ids from durable history/allMessages (skipping assistant summary messages
produced by compaction) or alternatively propagate the set of just-activated
tool ids from resolveTools() across the compaction turn, then use that
durable/propagated set to create the synthetic user parts for lastUser so
reminders are always injected regardless of compaction.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: ad3f3805-9098-422c-b7ac-8642bee1e07c

📥 Commits

Reviewing files that changed from the base of the PR and between e6ff0cd and f2f273d.

📒 Files selected for processing (12)
  • packages/opencode/src/session/compaction.ts
  • packages/opencode/src/session/llm.ts
  • packages/opencode/src/session/prompt.ts
  • packages/opencode/src/session/run-observability/sanitize.ts
  • packages/opencode/src/tool/enter-worktree.txt
  • packages/opencode/src/tool/exit-worktree.txt
  • packages/opencode/src/tool/registry.ts
  • packages/opencode/src/tool/tool-info.ts
  • packages/opencode/test/session/messages-pagination.test.ts
  • packages/opencode/test/tool/registry.test.ts
  • packages/opencode/test/tool/tool-define.test.ts
  • packages/opencode/test/tool/tool-info.test.ts

Comment thread packages/opencode/src/session/prompt.ts
Astro-Han added 2 commits June 3, 2026 10:02
…ry, not full hydration

The previous fix materialised Array.from(stream(sessionID)) every loop to derive activation from the complete history. On long compacted sessions that regressed DB/parts hydration: filterCompacted and the lastUser/lastFinished scan both stop early at the compaction tail, but Array.from forced a full message+parts hydration of the entire history just to find a few tool_info parts.

Replace it with MessageV2.toolInfoParts(sessionID): a single PartTable query filtered by json_extract($.tool)='tool_info' (same pattern as session.ts), built via the existing part() constructor. deriveActivatedToolsFromParts derives activation from those parts; the run loop goes back to the lazy early-breaking filterCompactedEffect + stream() it used before. Activation still spans the full durable history (reverted parts are physically deleted before the loop; forks use a fresh session id), so it stays correct across compaction without paying for full hydration.

Regression test now asserts toolInfoParts spans compaction while the filtered model-facing view drops the activation.
…mary assistant

The one-shot activation system-reminder was derived via deriveNewlyActivated(msgs) over the compaction-filtered view. If an auto-compaction lands between a tool_info activation and the next step and summarises the activating turn into the head (older than the retained tail), filterCompacted drops that turn and the newest assistant becomes the compaction summary (no tool_info part), so the reminder was silently lost. Activation itself was unaffected (it derives from durable tool_info parts); only the reminder.

Add MessageV2.lastNonSummaryAssistant(sessionID): a durable single-row query (json_extract excludes summary assistants) for the newest real assistant turn. deriveNewlyActivated now takes that single message. Summaries are excluded at the source so a compaction cannot suppress the reminder; one-shot semantics hold because any real non-tool_info turn becomes the newest non-summary assistant and clears the set.

Regression test: a compaction summarising the activating turn into the head still yields the activation from lastNonSummaryAssistant, while the filtered view newest assistant is the summary.
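
The one-shot semantics in this commit can be illustrated with an in-memory sketch. The PR describes `lastNonSummaryAssistant` as a durable single-row query; the array-based version below, the `AssistantMessage` shape, and the `summary` flag are simplifying assumptions.

```typescript
// Simplified shapes (assumptions); the real query runs against durable storage.
type ToolPart = { tool: string; status: "completed" | "error"; input: { name?: string } }
type AssistantMessage = { id: number; summary: boolean; parts: ToolPart[] }

// Newest assistant turn that is not a compaction summary.
function lastNonSummaryAssistant(durable: AssistantMessage[]): AssistantMessage | undefined {
  return [...durable].reverse().find((message) => !message.summary)
}

// Tools activated by that single turn; empty when the turn has no tool_info part.
function deriveNewlyActivated(message: AssistantMessage | undefined): string[] {
  if (message === undefined) return []
  return message.parts
    .filter((part) => part.tool === "tool_info" && part.status === "completed")
    .map((part) => part.input.name)
    .filter((name): name is string => name !== undefined)
}

const activatingTurn: AssistantMessage = {
  id: 1,
  summary: false,
  parts: [{ tool: "tool_info", status: "completed", input: { name: "enter-worktree" } }],
}
const compactionSummary: AssistantMessage = { id: 2, summary: true, parts: [] }
const laterRealTurn: AssistantMessage = {
  id: 3,
  summary: false,
  parts: [{ tool: "bash", status: "completed", input: {} }],
}

// A compaction summary lands right after activation: the reminder still fires.
const afterCompaction = deriveNewlyActivated(
  lastNonSummaryAssistant([activatingTurn, compactionSummary]),
)
// Any later real turn becomes the newest non-summary assistant and clears the set.
const afterRealTurn = deriveNewlyActivated(
  lastNonSummaryAssistant([activatingTurn, compactionSummary, laterRealTurn]),
)
```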
@Astro-Han Astro-Han merged commit 74af1a3 into dev Jun 3, 2026
33 checks passed
@Astro-Han Astro-Han deleted the claude/tool-exposure-1054 branch June 3, 2026 04:40

Labels

enhancement New feature or request harness Model harness, prompts, tool descriptions, and session mechanics P1 High priority
