Add eval coverage for dotnet-test/test-gap-analysis by Evangelink · Pull Request #829 · dotnet/skills

Evangelink · 2026-06-25T08:15:39Z

Extends tests/dotnet-test/test-gap-analysis/eval.yaml to cover eight previously uncovered SKILL.md teaching points by adding two scenarios with supporting fixtures.

Now-covered points

Validation

Trivial code (simple getters, auto-properties) is excluded from analysis
Findings are prioritized by risk, not just listed in source order
Report includes strengths (killed mutations) alongside gaps
Mutation categories are correctly labeled

Common Pitfalls

Analyzing trivial code
Ignoring call chains
Over-counting mutations in generated code
Forgetting Rust's ? operator

What changed

Scenario 5 (report-quality) — new C# Billing fixture with trivial auto-properties/getters, an auto-generated InvoiceProcessor.g.cs, and private helpers reached via a call chain. Rubric items verify the analysis excludes trivial code, skips generated code, traces call chains, prioritizes by risk, reports strengths alongside gaps, and labels mutation categories.
Scenario 6 (rust-error-propagation) — new Rust fixture using the ? operator with no test on the error path. Rubric items verify the ? propagation is flagged as an Exception/Panic mutation point.
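
For readers unfamiliar with the pattern Scenario 6 targets, here is a minimal sketch of an untested `?` error path. It is not the fixture from this PR: `parse_quantity`, `line_total_cents`, and the test name are hypothetical and only illustrate the shape of the gap.

```rust
use std::num::ParseIntError;

/// Parses a quantity from text.
/// Hypothetical helper for illustration; not the PR's fixture code.
pub fn parse_quantity(input: &str) -> Result<u32, ParseIntError> {
    input.trim().parse::<u32>()
}

/// Computes a line total in cents.
/// The `?` returns early with the parse error. That early return is the
/// propagation point a gap analysis would want some test to observe.
pub fn line_total_cents(quantity: &str, unit_price_cents: u32) -> Result<u32, ParseIntError> {
    let qty = parse_quantity(quantity)?;
    Ok(qty * unit_price_cents)
}

#[cfg(test)]
mod tests {
    use super::*;

    // Only the success path is exercised, so replacing the `?` with a
    // fallback such as `.unwrap_or(0)` would not make any test fail.
    #[test]
    fn computes_total_for_valid_quantity() {
        assert_eq!(line_total_cents("3", 250), Ok(750));
    }
}
```

In this sketch, a single extra assertion such as `assert!(line_total_cents("abc", 250).is_err());` would be enough to observe the error path.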

Verification

Measure-SkillCoverage.ps1: 100% (24/24), uncovered empty, no regressions (was 66.7%).
SkillValidator check --plugin ./plugins/dotnet-test: ✅ all checks passed (only pre-existing token-size warnings).
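
The two coverage figures are consistent with the eight newly covered points, assuming the total of 24 teaching points did not change in this PR: 24 minus 8 leaves 16 covered, and 16/24 rounds to 66.7%. A small Rust sketch of that arithmetic follows; `coverage_percent` is a hypothetical helper and not part of Measure-SkillCoverage.ps1.

```rust
/// Coverage as a percentage, rounded to one decimal place.
/// Hypothetical helper for illustration only.
pub fn coverage_percent(covered: u32, total: u32) -> f64 {
    if total == 0 {
        // Treat an empty skill as 0% rather than dividing by zero.
        return 0.0;
    }
    let pct = f64::from(covered) / f64::from(total) * 100.0;
    (pct * 10.0).round() / 10.0
}
```

Under that assumption, `coverage_percent(16, 24)` gives the "was" figure and `coverage_percent(24, 24)` gives the new one.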

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-25T08:16:13Z

Skill Coverage Report

	Plugin	Skill	Covered	Coverage
✅	`dotnet-test`	`test-gap-analysis`	24/24	100%

Copilot

Pull request overview

This PR extends the dotnet-test/test-gap-analysis evaluation suite by adding two new scenarios (C# “report quality” and Rust ? error propagation) plus supporting fixtures. They cover previously uncovered SKILL.md teaching points around prioritization, trivial/generated code handling, call-chain tracing, strengths reporting, and Rust error propagation.

Changes:

Added Scenario 5 to eval.yaml with a new C# Billing fixture (includes trivial members, a generated .g.cs file, and helper call chains) and rubric/assertions targeting report quality.
Added Scenario 6 to eval.yaml with a new Rust fixture demonstrating unobserved ? error propagation and rubric/assertions to classify it as an Exception/Panic mutation point.
Introduced new fixture projects/files under fixtures/report-quality and fixtures/rust-error-propagation.

Summary per file

File	Description
tests/dotnet-test/test-gap-analysis/fixtures/rust-error-propagation/src/lib.rs	Rust library + tests to demonstrate unobserved `?`-based error propagation.
tests/dotnet-test/test-gap-analysis/fixtures/rust-error-propagation/Cargo.toml	Minimal Cargo manifest for the Rust fixture.
tests/dotnet-test/test-gap-analysis/fixtures/report-quality/Billing/InvoiceProcessor.g.cs	Auto-generated partial type stub to validate generated-code exclusion.
tests/dotnet-test/test-gap-analysis/fixtures/report-quality/Billing/InvoiceProcessor.cs	Billing business-logic fixture used to drive risk-prioritized gap analysis.
tests/dotnet-test/test-gap-analysis/fixtures/report-quality/Billing/Billing.csproj	C# project file for the Billing fixture.
tests/dotnet-test/test-gap-analysis/fixtures/report-quality/Billing.Tests/InvoiceProcessorTests.cs	MSTest fixture tests intentionally leaving gaps to be reported.
tests/dotnet-test/test-gap-analysis/fixtures/report-quality/Billing.Tests/Billing.Tests.csproj	Test project file referencing Billing + MSTest.
tests/dotnet-test/test-gap-analysis/eval.yaml	Adds two new evaluation scenarios and associated rubrics/assertions.

Copilot's findings

Files reviewed: 8/8 changed files
Comments generated: 1

Evangelink · 2026-06-25T08:33:03Z

/evaluate

github-actions · 2026-06-25T08:48:25Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
test-gap-analysis	Find boundary mutation gaps in tiered discount and shipping logic	5.0/5 → 5.0/5	✅ test-gap-analysis; tools: skill	✅ 0.15	✅
test-gap-analysis	Find logic and null-check mutation gaps in access control code	4.7/5 → 5.0/5 🟢	✅ test-gap-analysis; tools: skill, bash / ✅ test-gap-analysis; tools: skill	✅ 0.15	✅ [1]
test-gap-analysis	Acknowledge well-tested code with few surviving mutations	3.7/5 → 3.3/5 🔴	✅ test-gap-analysis; tools: skill	✅ 0.15	❌ [2]
test-gap-analysis	Decline request to write new tests from scratch	1.7/5 ⏰ → 2.3/5 ⏰ 🟢	ℹ️ not activated (expected)	✅ 0.15	✅ [3]
test-gap-analysis	Produce a risk-prioritized report that excludes trivial and generated code	3.3/5 → 4.7/5 🟢	✅ test-gap-analysis; tools: skill	✅ 0.15	✅
test-gap-analysis	Flag the Rust ? operator propagation as an unobserved mutation point	4.3/5 → 5.0/5 🟢	✅ test-gap-analysis; tools: skill	✅ 0.15	✅ [4]

[1] ⚠️ High run-to-run variance (CV=230%) — consider re-running with --runs 5
[2] ⚠️ High run-to-run variance (CV=276%) — consider re-running with --runs 5
[3] ⚠️ High run-to-run variance (CV=705%) — consider re-running with --runs 5
[4] ⚠️ High run-to-run variance (CV=77%) — consider re-running with --runs 5
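
The report does not say which quantity the CV (coefficient of variation) is computed over. As a rough sketch, assuming the usual definition of sample standard deviation divided by the absolute mean, values far above 100% are possible whenever the per-run values average close to zero, for example if they are per-run score deltas. The function name below is hypothetical and is not taken from the skill validator.

```rust
/// Coefficient of variation in percent: sample standard deviation / |mean|.
/// Returns None when fewer than two values are given or the mean is zero.
/// Hypothetical helper; the validator's actual formula is not shown in the report.
pub fn coefficient_of_variation_percent(values: &[f64]) -> Option<f64> {
    let n = values.len();
    if n < 2 {
        return None;
    }
    let mean = values.iter().sum::<f64>() / n as f64;
    if mean == 0.0 {
        return None;
    }
    // Sample variance uses n - 1 in the denominator.
    let variance = values
        .iter()
        .map(|v| (v - mean).powi(2))
        .sum::<f64>()
        / (n as f64 - 1.0);
    Some(variance.sqrt() / mean.abs() * 100.0)
}
```

With three illustrative deltas such as `[0.5, -0.4, 0.2]`, the mean is near 0.1 while the spread is several times larger, so the result lands well above 100%. That is why more runs (`--runs 5`) help stabilize the estimate.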

⏰ timeout: one or more runs hit the 120s scenario timeout. Scoring may be affected because model execution was aborted before it could produce its full output (increase the limit via timeout in eval.yaml).

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

To investigate failures, paste this to your AI coding agent:

For PR 829 in dotnet/skills, download eval artifacts with gh run download 28157475694 --repo dotnet/skills --pattern "skill-validator-results-*" --dir ./eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/4f1b8ebcda7952b5b0be18221902a1813361c5a4/eng/skill-validator/src/docs/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

▶ Sessions Visualisation -- interactive replay of all evaluation sessions
📊 Session Analytics (preview) -- aggregated metrics across evaluation sessions

github-actions · 2026-06-25T09:29:08Z

✅ Evaluation passed for 4f1b8eb. cc @dotnet/dotnet-testing — please review.

Add eval coverage for dotnet-test/test-gap-analysis

01896d1

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI review requested due to automatic review settings June 25, 2026 08:15

Copilot started reviewing on behalf of Evangelink June 25, 2026 08:16

Copilot AI reviewed Jun 25, 2026

Comment thread tests/dotnet-test/test-gap-analysis/fixtures/report-quality/Billing/InvoiceProcessor.cs Outdated

Mark InvoiceProcessor as partial to match generated part

4f1b8eb

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions Bot added the waiting-on-review (PR state) label Jun 25, 2026

Evangelink enabled auto-merge (squash) June 25, 2026 10:38

YuliiaKovalova approved these changes Jun 25, 2026

Evangelink merged commit 91d68cb into main Jun 25, 2026
34 of 36 checks passed

Evangelink deleted the evangelink-eval-test-gap-analysis branch June 25, 2026 11:26

Evangelink merged 2 commits into main from evangelink-eval-test-gap-analysis