Add eval coverage for dotnet-test/writing-mstest-tests by Evangelink · Pull Request #828 · dotnet/skills

Evangelink · 2026-06-25T08:15:24Z

Summary

Adds eval-test coverage for six previously uncovered modern MSTest assertion APIs taught in plugins/dotnet-test/skills/writing-mstest-tests/SKILL.md.

The existing "Write tests with collection, null, and reference assertions" scenario (Goal 7) already ships the ideal ServiceRegistry fixture — a class whose members naturally elicit each assertion — but lacked deterministic assertions tying the generated test code to these tokens. This PR enriches that scenario's prompt, adds deterministic file_contains assertions, and sharpens the rubric.

Now-covered CodePattern points

Assert.IsNull (SKILL.md ~L154) — resolving an unregistered service returns null
Assert.AreSame (~L154) — resolving a registered service returns the same instance
Assert.Contains (~L178) — GetAll() includes a registered service
Assert.DoesNotContain (~L178) — service absent after Remove
Assert.IsEmpty (~L178) — GetAll() empty before registration
Assert.IsNotEmpty (~L178) — GetAll() not empty after registration

Verification

eng/skill-coverage/Measure-SkillCoverage.ps1 -PluginName dotnet-test -SkillName writing-mstest-tests -Format Json → "uncovered": [] (all six targets covered, no regressions).
dotnet run --project eng/skill-validator/src/SkillValidator.csproj -- check --plugin ./plugins/dotnet-test → ✅ All checks passed (only pre-existing token-budget warnings).

Only eval.yaml changed; no SKILL.md or other skills touched.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Enrich the ServiceRegistry collection/null/reference scenario with deterministic file_contains assertions for the modern MSTest assertion APIs Assert.IsNull, Assert.AreSame, Assert.Contains, Assert.DoesNotContain, Assert.IsEmpty, and Assert.IsNotEmpty.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-25T08:16:04Z

Skill Coverage Report

	Plugin	Skill	Covered	Coverage
✅	`dotnet-test`	`writing-mstest-tests`	45/45	100%

Copilot

Pull request overview

This PR updates the dotnet-test plugin’s evaluation scenario for writing-mstest-tests to deterministically cover several modern MSTest assertion APIs by refining the scenario prompt, adding file_contains checks for specific assertions, and tightening the rubric accordingly.

Changes:

Refines the Goal 7 prompt to explicitly request behaviors that map to the targeted MSTest assertions.
Adds file_contains assertions to require the generated test file to include specific Assert.* APIs.
Updates the rubric to explicitly call out the required assertion APIs and expected usage.

Show a summary per file

File	Description
tests/dotnet-test/writing-mstest-tests/eval.yaml	Tightens the Goal 7 scenario prompt/rubric and adds deterministic file assertions to enforce coverage of modern MSTest assertion APIs.

Copilot's findings

Files reviewed: 1/1 changed files
Comments generated: 1

…nario

Address PR review: file_contains is a case-sensitive substring check, so the Assert.Contains and Assert.DoesNotContain value checks would also pass for CollectionAssert.* and StringAssert.* helpers. Add file_not_contains guards to keep the scenario a deterministic check for the modern Assert.* API.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
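The substring problem described in this commit can be illustrated with a minimal Python sketch. It assumes that file_contains and file_not_contains behave as plain case-sensitive substring checks, as the review note says. The helper functions, the guard values `CollectionAssert.` and `StringAssert.`, and the sample C# lines are hypothetical stand-ins and are not taken from eval.yaml or the validator's implementation.

```python
def file_contains(text: str, value: str) -> bool:
    """Assumed semantics: case-sensitive substring match."""
    return value in text


def file_not_contains(text: str, value: str) -> bool:
    """Assumed semantics: passes only when the substring is absent."""
    return value not in text


def passes_guarded_check(text: str) -> bool:
    """Require the modern API token and reject the legacy helper classes.

    The guard values below are illustrative assumptions.
    """
    return (
        file_contains(text, "Assert.Contains")
        and file_not_contains(text, "CollectionAssert.")
        and file_not_contains(text, "StringAssert.")
    )


# Hypothetical lines a model might generate in the test file.
legacy = "CollectionAssert.Contains(services, service);"
modern = "Assert.Contains(service, registry.GetAll());"

# "Assert.Contains" is a substring of "CollectionAssert.Contains",
# so the unguarded check accepts the legacy helper too.
print(file_contains(legacy, "Assert.Contains"))  # True
print(passes_guarded_check(legacy))              # False
print(passes_guarded_check(modern))              # True
```

With the guards in place, a file that only uses the legacy helpers no longer satisfies the scenario, which is the behaviour the commit message aims for.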

Evangelink · 2026-06-25T08:33:04Z

/evaluate

github-actions · 2026-06-25T08:41:34Z

❌ Evaluation failed. View workflow run

github-actions · 2026-06-25T09:29:11Z

👋 @Evangelink — this PR has 1 unresolved review thread(s). When you're ready, please address the feedback and push an update; the triage bot will pick up the next state automatically. (Add the no-stale label to silence further pings.)

github-actions · 2026-06-25T11:27:29Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
writing-mstest-tests	Write unit tests for a service class	2.7/5 → 3.0/5 🟢	✅ writing-mstest-tests; tools: skill	🟡 0.41	❌ [1]
writing-mstest-tests	Write data-driven tests for a calculator	1.7/5 → 1.3/5 🔴	✅ writing-mstest-tests; tools: skill, glob / ✅ writing-mstest-tests; tools: skill, edit	🟡 0.41	❌ [2]
writing-mstest-tests	Write async tests with cancellation	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill, report_intent / ⚠️ NOT ACTIVATED	🟡 0.41	✅ [3]
writing-mstest-tests	Fix swapped Assert.AreEqual arguments	5.0/5 → 5.0/5	⚠️ NOT ACTIVATED	🟡 0.41	❌ [4]
writing-mstest-tests	Modernize legacy test patterns	3.7/5 → 2.0/5 🔴	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.41	❌ [5]
writing-mstest-tests	Replace ExpectedException with Assert.Throws	2.3/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.41	❌ [6]
writing-mstest-tests	Use proper collection assertions	2.3/5 → 1.3/5 🔴	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.41	❌ [7]
writing-mstest-tests	Use proper type assertions instead of casts	1.0/5 → 1.7/5 🟢	⚠️ NOT ACTIVATED	🟡 0.41	✅ [8]
writing-mstest-tests	Set up test lifecycle correctly	2.3/5 → 4.7/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.41	❌ [9]
writing-mstest-tests	Use DynamicData with ValueTuples over object arrays	3.0/5 → 3.0/5	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.41	❌ [10]
writing-mstest-tests	Use string assertions for format validation	4.0/5 → 3.7/5 ⏰ 🔴	✅ writing-mstest-tests; tools: skill, create / ⚠️ NOT ACTIVATED	🟡 0.41	❌ [11]
writing-mstest-tests	Use comparison assertions for boundary testing	2.0/5 → 4.7/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.41	❌ [12]
writing-mstest-tests	Write tests with collection, null, and reference assertions	3.0/5 → 4.0/5 🟢	✅ writing-mstest-tests; tools: glob, skill / ⚠️ NOT ACTIVATED	🟡 0.41	❌ [13]
writing-mstest-tests	Configure conditional execution, retry, and cleanup	2.3/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill / ⚠️ NOT ACTIVATED	🟡 0.41	❌ [14]
writing-mstest-tests	Configure test parallelization and MSTest.Sdk project	3.0/5 → 5.0/5 🟢	✅ writing-mstest-tests; tools: skill	🟡 0.41	✅ [15]

[1] ⚠️ High run-to-run variance (CV=376%) — consider re-running with --runs 5. (Isolated) Quality improved but weighted score is -44.4% due to: judgment, quality
[2] ⚠️ High run-to-run variance (CV=670%) — consider re-running with --runs 5
[3] ⚠️ High run-to-run variance (CV=859%) — consider re-running with --runs 5
[4] ⚠️ High run-to-run variance (CV=126%) — consider re-running with --runs 5. (Isolated) Quality unchanged but weighted score is -17.1% due to: judgment, quality
[5] ⚠️ High run-to-run variance (CV=275%) — consider re-running with --runs 5
[6] ⚠️ High run-to-run variance (CV=110%) — consider re-running with --runs 5. (Plugin) Quality improved but weighted score is -2.4% due to: tokens (12605 → 17233), time (5.4s → 6.7s)
[7] ⚠️ High run-to-run variance (CV=101%) — consider re-running with --runs 5
[8] ⚠️ High run-to-run variance (CV=90%) — consider re-running with --runs 5
[9] ⚠️ High run-to-run variance (CV=145%) — consider re-running with --runs 5. (Plugin) Quality improved but weighted score is -10.6% due to: judgment, quality
[10] ⚠️ High run-to-run variance (CV=82%) — consider re-running with --runs 5
[11] ⚠️ High run-to-run variance (CV=193%) — consider re-running with --runs 5
[12] (Plugin) Quality improved but weighted score is -22.3% due to: judgment, quality, tokens (14074 → 18034)
[13] ⚠️ High run-to-run variance (CV=164%) — consider re-running with --runs 5. (Plugin) Quality unchanged but weighted score is -2.0% due to: tokens (218138 → 293429)
[14] ⚠️ High run-to-run variance (CV=719%) — consider re-running with --runs 5
[15] ⚠️ High run-to-run variance (CV=193%) — consider re-running with --runs 5
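The footnotes do not say how CV is computed. As a rough sketch, assuming it is the usual coefficient of variation (sample standard deviation divided by the absolute mean, as a percentage) taken over per-run scores, the Python below shows why a handful of runs whose mean is close to zero can yield values far above 100%. The function name and the sample numbers are hypothetical and are not data from this PR.

```python
import math
import statistics


def coefficient_of_variation(samples: list[float]) -> float:
    """Sample standard deviation as a percentage of the absolute mean."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variance")
    mean = statistics.fmean(samples)
    if mean == 0:
        # Undefined for a zero mean; report it as unbounded.
        return math.inf
    return statistics.stdev(samples) / abs(mean) * 100.0


# Hypothetical per-run score improvements that nearly cancel out.
runs = [0.30, -0.25, 0.10]
print(f"CV = {coefficient_of_variation(runs):.0f}%")
```

Because the mean sits in the denominator, adding runs (as the `--runs 5` hint suggests) mainly helps by stabilising the estimate of that mean.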

⏰ timeout — run(s) hit the (180s) scenario timeout limit; scoring may be impacted by aborting model execution before it could produce its full output (increase via timeout in eval.yaml)

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

To investigate failures, paste this to your AI coding agent:

For PR 828 in dotnet/skills, download eval artifacts with gh run download 28166183426 --repo dotnet/skills --pattern "skill-validator-results-*" --dir ./eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/81c1b2094c13d81619588b3ca975876826755f1e/eng/skill-validator/src/docs/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

▶ Sessions Visualisation -- interactive replay of all evaluation sessions
📊 Session Analytics (preview) -- aggregated metrics across evaluation sessions

Copilot AI review requested due to automatic review settings June 25, 2026 08:15

Copilot started reviewing on behalf of Evangelink June 25, 2026 08:15

Copilot AI reviewed Jun 25, 2026

Comment thread tests/dotnet-test/writing-mstest-tests/eval.yaml

github-actions Bot added the waiting-on-author label (PR state label) Jun 25, 2026

github-actions Bot added the pr-state/ready-for-eval label (PR is mergeable and awaiting evaluation) and removed the waiting-on-author label (PR state label) Jun 25, 2026

Evangelink enabled auto-merge (squash) June 25, 2026 11:24

YuliiaKovalova approved these changes Jun 25, 2026

Evangelink merged commit 26f01b5 into main Jun 25, 2026
34 of 36 checks passed

Evangelink deleted the evangelink-eval-coverage-writing-mstest-tests branch June 25, 2026 12:00

Evangelink merged 2 commits into main from evangelink-eval-coverage-writing-mstest-tests
