Add eval coverage for dotnet-test/dotnet-test-frameworks by Evangelink · Pull Request #820 · dotnet/skills

Evangelink · 2026-06-25T08:11:05Z

Summary

Adds eval-test coverage for the previously-uncovered MSTest assertion CodePattern Assert.AreEqual in the dotnet-test/dotnet-test-frameworks skill.

Change

In the "Replace try-catch with framework-native exception assertions" scenario, the MSTest refactor genuinely preserves the message check Assert.AreEqual("...", ex.Message);. Added a deterministic output_matches assertion with pattern Assert\.AreEqual where that output is naturally expected.
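As a rough illustration of what such a check amounts to, here is a minimal sketch that assumes `output_matches` performs a regular-expression search over the model's output. The validator's actual matching semantics are not shown in this PR; the helper name `output_matches` and the sample output below are hypothetical.

```python
import re

# Pattern quoted in the PR description; the backslash makes the dot literal.
PATTERN = r"Assert\.AreEqual"


def output_matches(output: str, pattern: str) -> bool:
    """Hypothetical stand-in for the eval assertion: True if the pattern occurs anywhere."""
    return re.search(pattern, output) is not None


# Illustrative MSTest refactor as a model might emit it (not taken from the repo).
sample_output = '''
var ex = Assert.ThrowsException<InvalidOperationException>(() => sut.Run());
Assert.AreEqual("boom", ex.Message);
'''

print(output_matches(sample_output, PATTERN))             # True
print(output_matches("Assert_AreEqual(1, 1);", PATTERN))  # False: the dot is literal
```

Because the check is a plain regex search, it is deterministic: the same output always yields the same verdict, independent of any judge model.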

Verification

Measure-SkillCoverage.ps1 -PluginName dotnet-test -SkillName dotnet-test-frameworks: coverage is now 5/5 (100%), uncovered: [] — Assert.AreEqual is covered, no other point regressed.
SkillValidator check --plugin ./plugins/dotnet-test: ✅ all checks passed (only pre-existing, unrelated warnings).

Only tests/dotnet-test/dotnet-test-frameworks/eval.yaml was modified.

Add a deterministic output_matches assertion for Assert.AreEqual in the MSTest try/catch refactor scenario, covering the previously-uncovered MSTest assertion CodePattern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-25T08:11:41Z

Skill Coverage Report

	Plugin	Skill	Covered	Coverage
✅	`dotnet-test`	`dotnet-test-frameworks`	5/5	100%

Copilot

Pull request overview

This PR updates the dotnet-test plugin’s evaluation suite to increase coverage for MSTest’s Assert.AreEqual usage within the dotnet-test-frameworks skill, specifically in the scenario that refactors try/catch exception assertions to framework-native patterns.

Changes:

Added an output_matches assertion intended to detect MSTest Assert.AreEqual output in the “Replace try-catch with framework-native exception assertions” scenario.

Show a summary per file

File	Description
tests/dotnet-test/dotnet-test-frameworks/eval.yaml	Adds an additional `output_matches` assertion to increase eval coverage for MSTest `Assert.AreEqual`.

Copilot's findings

Files reviewed: 1/1 changed files
Comments generated: 1

Address review feedback: the output_matches pattern now requires the exception message comparison (Assert.AreEqual(..., ex.Message)) rather than matching Assert.AreEqual anywhere. The CodePattern remains covered via the rubric evidence.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
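To illustrate the difference this feedback is about, the sketch below compares a loose pattern with a stricter one that requires `ex.Message` as the last argument. The stricter regex is a hypothetical reconstruction; the exact pattern committed to eval.yaml is not quoted in this thread.

```python
import re

# Loose pattern: matches Assert.AreEqual anywhere in the output.
LOOSE = r"Assert\.AreEqual"

# Hypothetical stricter pattern: AreEqual(<anything on the same line>, ex.Message)
STRICT = r"Assert\.AreEqual\([^\n]*,\s*ex\.Message\s*\)"


def matches(pattern: str, text: str) -> bool:
    """Return True if the regex is found anywhere in the text."""
    return re.search(pattern, text) is not None


unrelated = "Assert.AreEqual(42, result);"             # equality check unrelated to the exception
message_check = 'Assert.AreEqual("boom", ex.Message);'  # the check the scenario cares about

print(matches(LOOSE, unrelated))       # True: the loose pattern gives a false positive
print(matches(STRICT, unrelated))      # False
print(matches(STRICT, message_check))  # True
```

The loose pattern would pass on any output that happens to contain an equality assertion, whereas the stricter one only passes when the exception message is actually compared.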

Evangelink · 2026-06-25T08:32:51Z

/evaluate

github-actions · 2026-06-25T08:38:17Z

❌ Evaluation failed. View workflow run

github-actions · 2026-06-25T09:29:34Z

👋 @Evangelink — this PR has 1 unresolved review thread(s). When you're ready, please address the feedback and push an update; the triage bot will pick up the next state automatically. (Add the no-stale label to silence further pings.)

Evangelink · 2026-06-25T10:45:25Z

/evaluate

github-actions · 2026-06-25T10:53:43Z

Skill Validation Results

Skill	Scenario	Quality	Skills Loaded	Overfit	Verdict
dotnet-test-frameworks	Cross-framework assertion equivalence mapping	5.0/5 → 5.0/5	ℹ️ not activated (expected)	✅ 0.07	❌ [1]
dotnet-test-frameworks	Identify TUnit framework and its unique attributes	5.0/5 → 5.0/5	ℹ️ not activated (expected)	✅ 0.07	❌ [2]
dotnet-test-frameworks	Replace try-catch with framework-native exception assertions	5.0/5 → 5.0/5	ℹ️ not activated (expected)	✅ 0.07	❌ [3]
dotnet-test-frameworks	Skip annotations across all four frameworks	5.0/5 → 5.0/5	ℹ️ not activated (expected)	✅ 0.07	❌ [4]
dotnet-test-frameworks	Convert NUnit lifecycle methods to xUnit equivalents	5.0/5 → 5.0/5	ℹ️ not activated (expected)	✅ 0.07	❌ [5]
dotnet-test-frameworks	Identify integration tests by markers and code patterns	5.0/5 → 5.0/5	ℹ️ not activated (expected)	✅ 0.07	❌ [6]
dotnet-test-frameworks	Convert cross-framework assertions to TUnit syntax	1.0/5 → 1.0/5	ℹ️ not activated (expected)	✅ 0.07	✅ [7]
dotnet-test-frameworks	Diagnose silently-passing TUnit test with missing await	4.3/5 → 4.3/5	ℹ️ not activated (expected)	✅ 0.07	✅ [8]
dotnet-test-frameworks	Refactor TUnit try/catch to native exception assertion	4.0/5 → 3.3/5 🔴	ℹ️ not activated (expected)	✅ 0.07	❌
dotnet-test-frameworks	TUnit lifecycle hooks at test, class, assembly, and session scope	4.3/5 → 4.0/5 🔴	ℹ️ not activated (expected)	✅ 0.07	❌ [9]
dotnet-test-frameworks	TUnit skip mechanisms — attribute, assembly-wide, and dynamic	3.3/5 → 3.3/5	ℹ️ not activated (expected)	✅ 0.07	❌ [10]

[1] (Plugin) Quality unchanged but weighted score is -2.4% due to: tokens (12833 → 17381), time (9.6s → 12.0s)
[2] ⚠️ High run-to-run variance (CV=91%) — consider re-running with --runs 5. (Isolated) Quality unchanged but weighted score is -14.8% due to: judgment, quality
[3] (Plugin) Quality unchanged but weighted score is -2.0% due to: tokens (13094 → 17675)
[4] ⚠️ High run-to-run variance (CV=86%) — consider re-running with --runs 5. (Isolated) Quality unchanged but weighted score is -28.4% due to: quality, judgment
[5] ⚠️ High run-to-run variance (CV=84%) — consider re-running with --runs 5. (Plugin) Quality unchanged but weighted score is -3.4% due to: tokens (13117 → 17752), quality
[6] (Plugin) Quality unchanged but weighted score is -1.7% due to: tokens (13337 → 17920)
[7] ⚠️ High run-to-run variance (CV=295%) — consider re-running with --runs 5
[8] ⚠️ High run-to-run variance (CV=89%) — consider re-running with --runs 5
[9] ⚠️ High run-to-run variance (CV=58%) — consider re-running with --runs 5
[10] ⚠️ High run-to-run variance (CV=259%) — consider re-running with --runs 5
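For readers unfamiliar with the CV figures in these footnotes: CV presumably stands for coefficient of variation, the standard deviation divided by the mean. The sketch below assumes the sample standard deviation and uses made-up per-run scores, since how the skill validator actually computes CV is not stated here. It also works out the relative token increase quoted in footnote [1].

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean (assumed definition)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / abs(m)


def relative_change(before, after):
    """Fractional change from before to after."""
    return (after - before) / before


# Hypothetical per-run scores, not taken from the report.
runs = [1.0, 2.0, 3.0]
print(f"CV = {coefficient_of_variation(runs):.0%}")          # CV = 50%

# Token counts from footnote [1]: 12833 -> 17381.
print(f"tokens: {relative_change(12833, 17381):+.1%}")       # tokens: +35.4%
```

A CV above 100%, as in footnotes [7] and [10], simply means the spread across runs exceeds the mean, which is why the report suggests re-running with more runs.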

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

To investigate failures, paste this to your AI coding agent:

For PR 820 in dotnet/skills, download eval artifacts with gh run download 28164660258 --repo dotnet/skills --pattern "skill-validator-results-*" --dir ./eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/00cc4680d233662a069ef55fe391c56d17c0f0f2/eng/skill-validator/src/docs/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

▶ Sessions Visualisation -- interactive replay of all evaluation sessions
📊 Session Analytics (preview) -- aggregated metrics across evaluation sessions

Copilot AI review requested due to automatic review settings June 25, 2026 08:11

Copilot started reviewing on behalf of Evangelink June 25, 2026 08:11 View session

Copilot AI reviewed Jun 25, 2026

Comment thread tests/dotnet-test/dotnet-test-frameworks/eval.yaml Outdated

github-actions Bot added the waiting-on-author (PR state label) label Jun 25, 2026

Evangelink enabled auto-merge (squash) June 25, 2026 10:51

YuliiaKovalova approved these changes Jun 25, 2026

Evangelink merged commit d16bad4 into main Jun 25, 2026
34 of 36 checks passed

Evangelink deleted the evangelink-eval-coverage-dotnet-test-frameworks branch June 25, 2026 10:54

Evangelink merged 2 commits into main from evangelink-eval-coverage-dotnet-test-frameworks

3 participants
