skill-validator: restore 15K aggregate cap as the real Copilot CLI skill-menu budget #803
…opilot CLI skill-menu budget

The per-plugin aggregate description cap had been raised 15,000 -> 20,000 -> 22,000 under the belief that 15K was 'a local repo policy, NOT a documented Copilot constraint'. That belief was wrong: the GitHub Copilot CLI renders the model-facing <available_skills> menu under a hard 15,000-char budget (the agent SDK's SKILL_CHAR_BUDGET, default 15e3, confirmed in CLI 1.0.36 and 1.0.61). Skills are listed alphabetically and emitted with their full <description> only until the budget is exhausted; every skill past the cut-off collapses to a bare name with no description and can no longer be reliably model-activated.

Raising the validator cap merely masked this silent menu truncation. For example, dotnet-test's run-tests and test-* skills stopped activating in plugin eval runs because they fell into the name-only overflow.

Changes:
- SkillProfiler.MaxAggregateDescriptionLength: 22,000 -> 15,000, with the comment rewritten to document the real Copilot CLI budget (and correct the prior 'not a documented constraint' claim).
- CheckCommand aggregate now excludes skills marked 'disable-model-invocation: true'. The CLI drops those from the menu, so they do not consume the budget. This makes the cap satisfiable by hiding reference / agent-orchestrated primitives rather than only by trimming.
- InvestigatingResults.md: document plugin-arm-only non-activation caused by skill-menu budget overflow, and how to fix it.

Note: dotnet-test currently exceeds 15K and must be slimmed below it (via disable-model-invocation on reference/primitive skills plus description trims) before this cap can go green repo-wide.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
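The truncation behavior described in this commit message can be sketched as follows. This is a minimal illustration, not the CLI's code: the `<skill>` markup, the `Skill` type, and the choice to charge each full entry against the budget are all assumptions made for the example. Only the 15,000 default, the alphabetical ordering, and the "name-only past the cut-off" rule come from the message above.

```python
from dataclasses import dataclass

SKILL_CHAR_BUDGET = 15_000  # default budget cited in the commit message


@dataclass
class Skill:
    name: str
    description: str


def render_menu(skills, budget=SKILL_CHAR_BUDGET):
    """Return (menu_text, names_rendered_without_description).

    Hypothetical rendering: skills are sorted by name and emitted with
    their description until one entry would exceed the budget; that skill
    and every later one collapse to a bare name.
    """
    lines, name_only = [], []
    used = 0
    truncated = False
    for skill in sorted(skills, key=lambda s: s.name):
        full = (
            f"<skill><name>{skill.name}</name>"
            f"<description>{skill.description}</description></skill>"
        )
        if not truncated and used + len(full) <= budget:
            lines.append(full)
            used += len(full)
        else:
            truncated = True  # once the cut-off is hit, nothing later is described
            lines.append(f"<skill><name>{skill.name}</name></skill>")
            name_only.append(skill.name)
    return "\n".join(lines), name_only


# A tiny budget makes the effect visible with three skills.
skills = [
    Skill("analyze-coverage", "a" * 60),
    Skill("build-project", "b" * 60),
    Skill("run-tests", "short"),
]
_, overflow = render_menu(skills, budget=250)
print(overflow)
```

Under these assumptions, `run-tests` loses its description even though it is short, simply because it sorts after the cut-off. That mirrors why trimming only the late skills' descriptions would not help.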
Note This PR is from a fork and modifies infrastructure files ( Changes to infrastructure typically need to be submitted from a branch in Please consider recreating this PR from an upstream branch. If you don't have push access to |
Pull request overview
Aligns skill-validator’s per-plugin aggregate description cap with the Copilot CLI’s effective 15,000-character skill-menu budget, and updates validation/docs to prevent silent <available_skills> truncation from masking plugin-arm non-activation.
Changes:
- Restores SkillProfiler.MaxAggregateDescriptionLength to 15,000 and rewrites the rationale/commentary to reflect the Copilot CLI menu budget behavior.
- Updates check to exclude skills with disable-model-invocation: true from the aggregate description total (matching CLI menu behavior).
- Documents “plugin-arm-only non-activation due to menu overflow” troubleshooting steps in InvestigatingResults.md.
Show a summary per file
| File | Description |
|---|---|
| eng/skill-validator/src/docs/InvestigatingResults.md | Adds guidance for diagnosing plugin-only non-activation caused by Copilot CLI skill-menu budget overflow and suggests mitigations. |
| eng/skill-validator/src/Check/SkillProfiler.cs | Lowers the aggregate description cap to 15,000 and documents it as a Copilot CLI budget constraint. |
| eng/skill-validator/src/Check/CheckCommand.cs | Excludes disable-model-invocation: true skills from the aggregate description calculation during plugin checks. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 2
@Evangelink: Can we re-create this from a branch in the repo please?
…ion check

Address review: replace Regex.IsMatch(pattern-string) with a [GeneratedRegex] partial method (AOT-friendly, no per-call cache lookup), matching FrontmatterParser's style. Runs once per skill during checks.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
👋 @Evangelink — this PR has 2 unresolved review thread(s). When you're ready, please address the feedback and push an update; the triage bot will pick up the next state automatically. (Add the |
Skill Validation Results
[1] (Isolated) Quality improved but weighted score is -2.1% due to: tokens (65782 → 85343), tool calls (5 → 6)
Model: claude-opus-4.6 | Judge: claude-opus-4.6 🔍 Full Results - additional metrics and failure investigation steps
▶ Sessions Visualisation -- interactive replay of all evaluation sessions
✅ Approved by @AbhitejJohn. cc @dotnet/skills-merge-approvers: ready to merge.
…ck-scalar false positives

The regex-based check matched any line in the frontmatter, so a block-scalar description that merely mentioned 'disable-model-invocation: true' on its own line was wrongly treated as disabling model invocation. Parse the frontmatter with the existing YAML deserializer (which correctly handles block scalars) by adding a DisableModelInvocation field to SkillFrontmatter, and drop the regex entirely.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
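The false positive described in this commit can be reproduced in a few lines. This is a rough sketch under stated assumptions: the regex pattern is illustrative rather than the one that was removed, and the structure-aware check is a simplified top-level-key scan standing in for a real YAML deserializer (which is what the commit actually switches to).

```python
import re

# Frontmatter whose block-scalar description merely mentions the flag.
FRONTMATTER = """\
name: reference-skill
description: |
  Explains the frontmatter flag
  disable-model-invocation: true
  and when to set it.
"""

# Illustrative line-based pattern: matches the text on any line, indented or not.
LINE_REGEX = re.compile(r"^\s*disable-model-invocation:\s*true\s*$", re.MULTILINE)


def regex_says_disabled(frontmatter: str) -> bool:
    """Line-oriented check that ignores YAML structure."""
    return LINE_REGEX.search(frontmatter) is not None


def top_level_says_disabled(frontmatter: str) -> bool:
    """Simplified stand-in for a YAML parse: only unindented keys count.

    Indented lines belong to a nested value (here, the block scalar),
    so they are skipped rather than treated as keys.
    """
    for line in frontmatter.splitlines():
        if line[:1].isspace() or ":" not in line:
            continue
        key, _, value = line.partition(":")
        if key.strip() == "disable-model-invocation":
            return value.strip().lower() == "true"
    return False


print(regex_says_disabled(FRONTMATTER))      # the line-based check is fooled
print(top_level_says_disabled(FRONTMATTER))  # the structure-aware check is not
```

A real parser handles many more cases (quoting, folded scalars, nested maps), which is a reason to reuse the existing deserializer rather than hand-roll a scan like this one.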
/evaluate
…killMenuLength

The constant now caps the fully-rendered <skill>-menu size (name + description + location + markup) rather than the sum of raw description lengths, so the old name was misleading and risked someone reverting to summing Description.Length. Rename it (and update related comments/docs) to reflect what is actually enforced. Also fix the test-name grammar DescriptionsSummingToLimit_Fail -> _Fails.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
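To see why the distinction in this commit matters, here is a small sketch comparing the two measurements. Everything in it is hypothetical: the entry markup, the `Skill` fields, the paths, and the `MENU_BUDGET` name (the renamed constant's full identifier is truncated in the commit title, so it is not reproduced here). Only the 15,000 figure and the "name + description + location + markup" composition come from the thread.

```python
from dataclasses import dataclass

MENU_BUDGET = 15_000  # hypothetical name for the rendered-menu cap


@dataclass
class Skill:
    name: str
    description: str
    location: str


def rendered_entry(skill: Skill) -> str:
    # Assumed markup; the CLI's real element layout may differ.
    return (
        f"<skill><name>{skill.name}</name>"
        f"<description>{skill.description}</description>"
        f"<location>{skill.location}</location></skill>"
    )


def raw_description_total(skills) -> int:
    """Old-style measurement: sum of description lengths only."""
    return sum(len(s.description) for s in skills)


def rendered_menu_total(skills) -> int:
    """New-style measurement: full rendered size of every entry."""
    return sum(len(rendered_entry(s)) for s in skills)


# Ten skills whose descriptions alone sum to 14,800 characters.
skills = [
    Skill(
        name=f"skill-{i:02d}",
        description="d" * 1_480,
        location=f"plugins/demo/skills/skill-{i:02d}/SKILL.md",
    )
    for i in range(10)
]

print(raw_description_total(skills) <= MENU_BUDGET)
print(rendered_menu_total(skills) <= MENU_BUDGET)
```

In this made-up plugin the descriptions fit under the cap on their own, but the per-entry names, locations, and tags push the rendered menu over it, which is the gap a description-only sum would miss.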
Skill Validation Results
[1] (Plugin) Quality unchanged but weighted score is -2.9% due to: tokens (12709 → 17382), time (6.3s → 8.9s)
Model: claude-opus-4.6 | Judge: claude-opus-4.6 🔍 Full Results - additional metrics and failure investigation steps
▶ Sessions Visualisation -- interactive replay of all evaluation sessions |
…ptions under 15K rendered menu budget

Merge the polyglot find-untested-sources-polyglot skill into find-untested-sources as a single model-invocable skill documenting both engines (Roslyn for C#/.NET, tree-sitter for polyglot), restoring discoverability that was lost when both were hidden via disable-model-invocation to fit the budget. Trim keyword-stuffed descriptions on coverage-analysis, test-anti-patterns, test-tagging, assertion-quality, grade-tests, and migrate-static-to-wrapper. Rendered skill-menu size for dotnet-test drops to 14,722 chars (278 under the 15,000 cap enforced by skill-validator on PR dotnet#803).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…x stale cap comment

Address review feedback:
- Plumb DisableModelInvocation through SkillInfo (populated once in SkillDiscovery.DiscoverSkillAt) instead of re-deserializing each skill's YAML frontmatter in CheckCommand. Removes the per-skill double parse.
- Correct the SkillProfiler comment that still called the constant an 'aggregate description size cap'; it enforces a rendered skill-menu budget.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pushed two commits ( 1. dotnet-test slimmed under the cap — resolves the
2. Addressed the two open review comments — plumbed
/evaluate
Skill Validation Results
[1] (Isolated) Quality improved but weighted score is -16.9% due to: judgment, tokens (63523 → 90348), quality, time (41.9s → 50.7s)
Model: claude-opus-4.6 | Judge: claude-opus-4.6 🔍 Full Results - additional metrics and failure investigation steps
Why
The skill-validator's per-plugin aggregate description cap (SkillProfiler.MaxAggregateDescriptionLength) had been raised 15,000 → 20,000 → 22,000, justified by a code comment asserting that 15K was "a local repo policy, NOT a documented Copilot/agentskills constraint."

That assertion is wrong. The GitHub Copilot CLI renders the model-facing <available_skills> menu under a hard 15,000-character budget (the agent SDK's SKILL_CHAR_BUDGET, default 15e3, confirmed in CLI 1.0.36 and 1.0.61). Skills are listed alphabetically by name and emitted with their full <description> only until the budget is exhausted; every skill past the cut-off collapses to a bare name with no description and can no longer be reliably model-activated.

Raising the validator cap didn't add headroom; it masked silent menu truncation. This is the root cause behind the dotnet-test plugin-arm activation failures (e.g. run-tests, test-*): they sit alphabetically late, fell into the name-only overflow, and never activated in plugin eval runs even though they activate fine in isolation. Description tuning can't fix that, because the description is never shown.

What
- SkillProfiler.MaxAggregateDescriptionLength: 22,000 → 15,000, with the comment rewritten to document the real Copilot CLI budget (and correct the prior claim).
- The aggregate now excludes disable-model-invocation: true skills. The CLI drops those from the menu entirely, so they don't consume the budget. This makes the cap satisfiable by hiding reference / agent-orchestrated primitives rather than only by trimming descriptions.
- InvestigatingResults.md: documents plugin-arm-only non-activation caused by skill-menu budget overflow, and how to fix it.

dotnet-test currently aggregates ~20.7K chars (the only plugin over 15K), so skill-check will fail for it until it is slimmed below the cap, via disable-model-invocation on reference/primitive skills (see #800) plus description trims. This PR should merge once dotnet-test is ≤ 15K visible. All other plugins are already under the cap (next largest: dotnet-msbuild at ~14.5K).

Verification
- skill-validator builds clean (0 warnings).
- disable-model-invocation skills are excluded from the aggregate (flagging two reference skills dropped the reported total by exactly their description lengths).
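
The exclusion check in the verification step can be pictured with a short sketch. It is illustrative only: `SkillInfo` here is a stand-in dataclass with assumed field names, the skill names and lengths are made up, and the aggregate is computed as a plain sum of description lengths, as this description words it.

```python
from dataclasses import dataclass


@dataclass
class SkillInfo:
    name: str
    description: str
    disable_model_invocation: bool = False  # assumed field name for the frontmatter flag


def aggregate_description_length(skills) -> int:
    """Sum description lengths over skills that would still appear in the menu."""
    return sum(
        len(s.description) for s in skills if not s.disable_model_invocation
    )


# Made-up plugin: one user-facing skill and two reference skills.
skills = [
    SkillInfo("run-tests", "x" * 120),
    SkillInfo("reference-a", "y" * 300),
    SkillInfo("reference-b", "z" * 200),
]

before = aggregate_description_length(skills)

# Flag the two reference skills, as in the verification step above.
for s in skills:
    if s.name.startswith("reference-"):
        s.disable_model_invocation = True

after = aggregate_description_length(skills)
print(before - after)  # equals the combined length of the two flagged descriptions
```

The drop equals exactly the hidden skills' description lengths because nothing else in the sum depends on them; a rendered-menu measurement would drop by their full entry sizes instead.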