
skill: add --extract-model opt-in for Step 3B subagent model#289

Open
yelban wants to merge 1 commit into safishamsi:v4 from yelban:feature/extract-model-routing

Conversation


yelban commented Apr 13, 2026

Summary

Step 3B dispatches one semantic-extraction subagent per chunk. Today those subagents inherit the parent Claude Code model, which is often Opus — overkill for the portion of the work that is simple pattern matching. Running Opus over a code-heavy corpus wastes budget; running Sonnet under `--mode deep` sacrifices the aggressive-INFERRED quality that mode is asking for.

This PR adds an opt-in flag so users who want finer control can pin the Step 3B model:

```
--extract-model sonnet|opus|auto
```

No default behavior change. Flag omitted → subagents inherit parent exactly as today.

  • `sonnet` / `opus` → pin all Step 3B subagents to that model.
  • `auto` → corpus-shape heuristic:
    • `--mode deep` → opus (aggressive INFERRED; reasoning pays off)
    • `code_files / total > 0.8` → sonnet (AST carries it; semantic layer is thin)
    • `(docs + papers) / total > 0.3` → opus (cross-document semantic edges are the payoff)
    • otherwise → sonnet
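
The `auto` branch can be sketched as a small resolver. This is a hedged illustration, not graphify's actual code: the `detect.json` field names (`total_files`, `code_files`, `doc_files`, `paper_files`) and the `auto-default` reason string are assumptions; only the thresholds and the `auto-deep` / `auto-code-heavy` / `auto-doc-heavy` reasons come from this PR.

```python
def resolve_extract_model(mode: str, detect: dict) -> tuple[str, str]:
    """Return (model, reason) for Step 3B subagents under --extract-model auto.

    Field names in `detect` are assumed from context, not confirmed
    against detect.json's real schema.
    """
    total = detect["total_files"] or 1
    code_ratio = detect["code_files"] / total
    doc_ratio = (detect["doc_files"] + detect["paper_files"]) / total
    if mode == "deep":
        # Deep mode asks for aggressive INFERRED edges; reasoning pays off.
        return "opus", "auto-deep"
    if code_ratio > 0.8:
        # AST extraction carries it; the semantic layer is thin.
        return "sonnet", "auto-code-heavy"
    if doc_ratio > 0.3:
        # Cross-document semantic edges are the payoff.
        return "opus", "auto-doc-heavy"
    return "sonnet", "auto-default"  # "auto-default" reason string is hypothetical
```

Note the order: the `--mode deep` check wins before any corpus-shape ratio is consulted, which matches the test plan's "regardless of corpus shape" case.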

When the flag is set, `skill.md` instructs the orchestrator to pass `model=EXTRACT_MODEL` on every Agent call and print one line naming the resolved model and reason. When omitted, output is byte-identical to pre-flag graphify.

Scope

  • Single file touched in the `graphify/` package: `graphify/skill.md` (Claude Code variant).
  • Other `skill-*.md` variants (codex, aider, trae, etc.) are intentionally not changed — the `model` parameter on a dispatched subagent is a Claude-Code-specific mechanism; other CLIs use different model-selection surfaces.
  • No Python package changes. `detect.json` already contains every field the heuristic needs.
  • Full design rationale, threshold reasoning, considered alternatives, and test plan in `graphify/docs/extract-model-routing.md` (added by this PR).

Compatibility

  • Cache is keyed on file content hash, not model → switching `--extract-model` between runs does not invalidate existing cache.
  • No schema change to `graph.json`, `GRAPH_REPORT.md`, or any output.
  • Rollback: revert `graphify/skill.md`; no migration, no persistent state.
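
The cache-compatibility claim follows from how the key is built. A minimal sketch, assuming a content-digest key (the hash choice and key layout are illustrative, not graphify's actual implementation); the point is that no model name enters the digest:

```python
import hashlib

def cache_key(path: str) -> str:
    """Derive a chunk cache key from file content alone.

    Because the model name is never mixed into the digest, switching
    --extract-model between runs reuses existing cache entries.
    """
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```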

Test plan

  • Small mixed corpus (~50 files, 60% code + 40% docs)
    • No flag → no `Extract model:` line printed; graph identical to known-good prior run
    • `--extract-model auto` → prints `auto-doc-heavy`, resolves to `opus`
    • `--extract-model sonnet` → prints `manual`, dispatches with `model="sonnet"`
  • Code-only corpus (~50 files, 100% code)
    • `--extract-model auto` → resolves to `sonnet` (`auto-code-heavy`)
  • `--mode deep --extract-model auto` → resolves to `opus` (`auto-deep`) regardless of corpus shape
  • `graph.json` INFERRED edges still present under Sonnet; `confidence_score` distribution reasonable (not collapsed to 0.5)
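
The final check can be scripted. A hedged sketch, assuming `graph.json` exposes an `edges` list whose entries carry `type` and `confidence_score` fields (this layout is an assumption, not verified against graphify's schema):

```python
import json
from statistics import pstdev

def check_inferred_confidence(graph_path: str, min_spread: float = 0.05) -> None:
    """Assert INFERRED edges exist and their confidence scores did not
    collapse to a single value (e.g. everything pinned at 0.5)."""
    with open(graph_path) as f:
        graph = json.load(f)
    scores = [e["confidence_score"] for e in graph["edges"]
              if e["type"] == "INFERRED"]
    assert scores, "no INFERRED edges in graph.json"
    spread = pstdev(scores)  # population std dev; 0.0 when all values equal
    assert spread > min_spread, f"confidence distribution collapsed (spread={spread:.3f})"
```

The `min_spread` threshold is arbitrary here; anything above zero already distinguishes a calibrated distribution from a collapsed one.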

Step 3B dispatches one semantic-extraction subagent per chunk. Today
those subagents inherit the parent Claude Code model, which is often
Opus. Extraction mixes cheap pattern-matching (EXTRACTED edges, schema
compliance) with reasoning-heavy judgment (INFERRED edges,
semantically_similar_to, hyperedges, confidence calibration). Sonnet
handles the former with near-parity but cedes quality on the latter;
running Opus over a code-heavy corpus wastes budget.

Add a new opt-in flag:

    --extract-model sonnet|opus|auto

- Flag omitted: no behavior change; subagents inherit parent as before.
- sonnet / opus: pin all Step 3B subagents to that model.
- auto: corpus-shape heuristic — --mode deep -> opus; code_ratio > 0.8
  -> sonnet; (docs+papers)/total > 0.3 -> opus; else sonnet.

When the flag is set, skill.md instructs the orchestrator to pass
model=EXTRACT_MODEL on every Agent tool call and print one line naming
the resolved model and reason. When omitted, output is byte-identical
to pre-flag graphify.

Scope: only skill.md (Claude Code variant). Other skill-*.md variants
use different model-selection mechanisms and are intentionally
untouched. No Python package changes; detect.json already has the
fields the heuristic needs. No schema or cache change.

Full rationale, threshold reasoning, considered alternatives, and test
plan in graphify/docs/extract-model-routing.md.

3esmit commented Apr 13, 2026

If you are altering skill.md (Claude Code), you should also alter the other flavors' skill.md files, unless this is a Claude-Code-only feature.


yelban commented Apr 13, 2026

Thanks for flagging; that's a fair question, and the PR body's Scope section above addresses it explicitly. Short version: this is Claude-Code-only because the mechanism is Claude-Code-only.

The flag's implementation pins the model on a dispatched subagent via the model= parameter on the Agent tool. That parameter is specific to Claude Code's Task/Agent dispatch surface. Other flavors don't dispatch subagents the same way:

  • Codex / OpenCode / Aider / Droid / Trae / Claw / Copilot: each CLI has its own model-selection mechanism (e.g. --model at the top level, a config file, an env var). None of them accept a per-dispatched-subagent model argument in the same shape.
  • Porting the concept (opt-in model pinning at Step 3B) to each flavor is feasible but requires researching each CLI's equivalent surface and testing per-flavor. That's a separate, per-flavor effort, not a copy-paste of this diff.

Happy to open follow-up PRs for any specific flavor the maintainer wants me to target — but I'd rather do that on evidence of the mechanism than blindly extrapolate. Leaving the scoping call to @safishamsi.
