fix: improve skill trigger routing accuracy across models by stack72 · Pull Request #1178 · systeminit/swamp

stack72 · 2026-04-14T21:26:58Z

Summary

- Sharpen skill descriptions for swamp-report, swamp-extension-model, swamp-extension-driver, and swamp-model to reduce cross-model misrouting identified by multi-model eval runs
- Add differentiating keywords (dataRepository, UnifiedDataRepository, dataHandles, MethodReportContext) to swamp-report so report API queries stop routing to troubleshooting
- Narrow extension-model and extension-driver descriptions to creation-only scope with explicit exclusions for adjacent skills (swamp-model, swamp-workflow, swamp-troubleshooting, swamp-extension-publish)
- Add workflow-orchestration exclusion to swamp-model
- Strengthen eval system prompt against text-only responses from Opus and Gemini

Eval Results

Validated against multi-model eval suite (run 1, run 2):

| Model | Before | After |
| --- | --- | --- |
| Sonnet | 99.0% (200/202) | 97.5% (197/202) |
| GPT-5.4 | 98.0% (198/202) | 98.0% (198/202) |
| Opus | 94.1% (190/202) | 96.5% (195/202) |
| Gemini | 91.6% (185/202) | 93.8% (120/128*) |

*Gemini rate-limited, 128/202 tests completed.
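
The percentages in the table can be re-derived from the raw pass counts. The following is a minimal Python sketch with the counts copied from the table; the `pass_rate` and `summarize` helpers are illustrative names and are not part of the eval suite.

```python
# Pass counts copied from the Eval Results table: (passed, total).
RESULTS = {
    "Sonnet": {"before": (200, 202), "after": (197, 202)},
    "GPT-5.4": {"before": (198, 202), "after": (198, 202)},
    "Opus": {"before": (190, 202), "after": (195, 202)},
    # Gemini was rate-limited on the "after" run, so its total is 128, not 202.
    "Gemini": {"before": (185, 202), "after": (120, 128)},
}


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    return round(100 * passed / total, 1)


def summarize(results):
    """Map each model to (before %, after %, change in percentage points)."""
    summary = {}
    for model, runs in results.items():
        before = pass_rate(*runs["before"])
        after = pass_rate(*runs["after"])
        summary[model] = (before, after, round(after - before, 1))
    return summary


if __name__ == "__main__":
    for model, (before, after, delta) in summarize(RESULTS).items():
        print(f"{model}: {before}% -> {after}% ({delta:+} pts)")
```

Read this way, Opus gains 2.4 points while Sonnet drops 1.5 points. The Gemini change compares a full run with a partial one (128 of 202 tests), so it is not a like-for-like comparison.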

Original cross-model failures from issue:

| Failure | Before | After |
| --- | --- | --- |
| report SHOULD trigger "UnifiedDataRepository" | 4/4 fail | 0/4 fail (fixed) |
| model NOT for "chain into workflow" | 2/4 fail | 0/4 fail (fixed) |
| extension-driver NOT for "Kubernetes cluster" | 3/4 fail | 0/4 fail (fixed) |
| extension-model NOT for "custom model in workflow" | 3/4 fail | 2/4 fail (improved) |
| workflow NOT for "erroring on second step" | 2/4 fail | 2/4 fail (unchanged) |
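
Each row in the table above pairs a skill with a query and an expectation (SHOULD trigger or NOT), then counts how many of the four models got it wrong. A rough sketch of that tally follows, assuming each model's routing result is available as a skill name. `TriggerCase`, `case_failed`, and `failure_tally` are hypothetical names, and the routing values in the usage section are made up for illustration, not real eval output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerCase:
    """One routing expectation: should `skill` fire for `query`?"""
    skill: str
    query: str
    should_trigger: bool


def case_failed(case: TriggerCase, routed_skill: str) -> bool:
    """A case fails when the routed skill disagrees with the expectation."""
    triggered = routed_skill == case.skill
    return triggered != case.should_trigger


def failure_tally(case: TriggerCase, routed_by_model: dict) -> str:
    """Format failures across models as 'N/M fail', matching the table."""
    failures = sum(case_failed(case, r) for r in routed_by_model.values())
    return f"{failures}/{len(routed_by_model)} fail"


if __name__ == "__main__":
    case = TriggerCase(
        skill="swamp-extension-driver",
        query="Kubernetes cluster",
        should_trigger=False,
    )
    # Hypothetical routing outcomes for four placeholder models.
    routed = {
        "model-a": "swamp-extension-driver",  # misroute: skill should not fire
        "model-b": "some-other-skill",
        "model-c": "some-other-skill",
        "model-d": "some-other-skill",
    }
    print(failure_tally(case, routed))
```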

Closes #80

Test plan

🤖 Generated with Claude Code

Sharpen skill descriptions to reduce cross-model misrouting identified by multi-model eval runs. Adds explicit exclusions to extension-model, extension-driver, and model skills. Enriches report description with differentiating keywords. Strengthens eval system prompt against text-only responses.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Extension-model was incorrectly triggering on publishing queries like "Prepare my extension for publishing" and "Publish my model to the registry" after the initial description narrowing. Add explicit exclusion directing these to swamp-extension-publish.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions

Code Review

Blocking Issues

None.

Suggestions

Pre-existing body line limit: swamp-extension-model/SKILL.md is ~509 body lines (529 total minus ~20 lines of frontmatter), slightly exceeding the 500-line body limit in CLAUDE.md. This is pre-existing and not introduced by this PR, but worth noting for a future cleanup pass to split content into references/ files.
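
The body-line arithmetic in this suggestion (529 total lines minus about 20 frontmatter lines) can be checked mechanically. The sketch below assumes frontmatter is delimited by a leading `---` line and the next `---` line, and that every remaining line, blank or not, counts toward the body. How the repository actually counts lines is not stated here, and `split_frontmatter` and `body_line_overage` are hypothetical helper names.

```python
BODY_LINE_LIMIT = 500  # limit cited in the review comment


def split_frontmatter(text: str):
    """Split a SKILL.md-style document into (frontmatter_lines, body_lines).

    Assumes frontmatter, when present, starts at a leading '---' line and
    ends at the next '---' line. Both delimiter lines count as frontmatter.
    """
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                return lines[: i + 1], lines[i + 1 :]
    # No (closed) frontmatter block: treat the whole file as body.
    return [], lines


def body_line_overage(text: str, limit: int = BODY_LINE_LIMIT) -> int:
    """Return how many body lines exceed the limit (0 when within it)."""
    _, body = split_frontmatter(text)
    return max(0, len(body) - limit)


if __name__ == "__main__":
    # Synthetic document: 4 frontmatter lines followed by 509 body lines.
    doc = "---\nname: example\ndescription: demo\n---\n" + "\n".join(["text"] * 509)
    print(body_line_overage(doc))
```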

Summary

Clean, focused PR that sharpens skill frontmatter descriptions to reduce cross-model misrouting. The changes are well-structured:

- Explicit "Do NOT use for X — that is Y" exclusions on swamp-extension-driver, swamp-extension-model, swamp-model, and swamp-report provide clear disambiguation guidance
- Differentiating API keywords (dataRepository, UnifiedDataRepository, dataHandles, MethodReportContext) added to swamp-report to fix report routing failures
- "remote execution" trigger correctly removed from swamp-extension-driver to prevent workflow misrouting
- Eval system prompt reinforcement ("A text-only response with no tool call is ALWAYS wrong") is a good nudge for models that default to text responses
- All YAML frontmatter uses valid `>` block scalars with correct name/description-only fields
- CI passes: lint, test, format, skill review, and skill trigger eval all green
- Eval results show clear improvement (Opus 94.1% → 96.5%, targeted failures fixed)
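
The system prompt rule quoted above ("A text-only response with no tool call is ALWAYS wrong") suggests a grading step along the following lines. This is only a sketch for a positive (SHOULD trigger) case: the response shape, the `tool_calls` and `skill` keys, and the `grade_response` function are assumptions made for illustration, not the eval harness's actual interface.

```python
def grade_response(response: dict, expected_skill: str) -> bool:
    """Return True when a response passes a SHOULD-trigger case.

    Hypothetical response shape:
        {"text": str, "tool_calls": [{"skill": str}, ...]}
    A response with no tool call fails regardless of what its text says,
    mirroring the system prompt rule quoted in the review.
    """
    calls = response.get("tool_calls") or []
    if not calls:
        return False  # text-only response: always wrong
    # Only the first tool call is considered in this sketch.
    return calls[0].get("skill") == expected_skill


if __name__ == "__main__":
    text_only = {"text": "You should use swamp-report.", "tool_calls": []}
    with_call = {"text": "", "tool_calls": [{"skill": "swamp-report"}]}
    print(grade_response(text_only, "swamp-report"))
    print(grade_response(with_call, "swamp-report"))
```

Under this scheme a model that names the right skill in prose but never issues the call still fails the case.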

stack72 and others added 2 commits April 14, 2026 22:50

github-actions Bot approved these changes Apr 14, 2026

stack72 merged commit ebbbc00 into main Apr 14, 2026
15 checks passed

stack72 deleted the worktree-80 branch April 14, 2026 21:33
