Add eval coverage for dotnet-test/dotnet-test-frameworks #820

Merged
Evangelink merged 2 commits into main from evangelink-eval-coverage-dotnet-test-frameworks on Jun 25, 2026

Conversation

@Evangelink

Member

Summary

Adds eval-test coverage for the previously-uncovered MSTest assertion CodePattern Assert.AreEqual in the dotnet-test/dotnet-test-frameworks skill.

Change

In the "Replace try-catch with framework-native exception assertions" scenario, the MSTest refactor preserves the message check Assert.AreEqual("...", ex.Message); after the conversion. This PR adds a deterministic output_matches assertion with the pattern Assert\.AreEqual to that scenario, where the output is naturally expected.
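As a rough illustration of what a deterministic pattern check of this kind does, the sketch below applies the pattern quoted above to a sample refactored test. It uses Python's `re` module only for illustration; the evaluator's actual regex engine and matching semantics are not shown in this PR, and the `output_matches` helper, the sample C# snippet, and its placeholder message are assumptions.

```python
import re

# Pattern quoted in the PR description. The escaped dot matches a literal "."
PATTERN = re.compile(r"Assert\.AreEqual")

# Hypothetical agent output for the MSTest refactor. The message string is a
# placeholder, not the one used in the real scenario.
REFACTORED_OUTPUT = """
var ex = Assert.ThrowsExactly<InvalidOperationException>(() => sut.Run());
Assert.AreEqual("placeholder message", ex.Message);
"""


def output_matches(pattern: re.Pattern, output: str) -> bool:
    """Return True if the pattern occurs anywhere in the output (assumed semantics)."""
    return pattern.search(output) is not None


if __name__ == "__main__":
    print(output_matches(PATTERN, REFACTORED_OUTPUT))
    # An unescaped dot would also accept this string; the escaped one does not.
    print(output_matches(PATTERN, "AssertXAreEqual"))
```

Because the check is a plain substring-style regex search, it gives the same result on every run for the same output, which is what makes it deterministic compared with a judged rubric item.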

Verification

  • Measure-SkillCoverage.ps1 -PluginName dotnet-test -SkillName dotnet-test-frameworks: coverage is now 5/5 (100%), uncovered: []. Assert.AreEqual is covered, and no other point regressed.
  • SkillValidator check --plugin ./plugins/dotnet-test: ✅ all checks passed (only pre-existing, unrelated warnings).

Only tests/dotnet-test/dotnet-test-frameworks/eval.yaml was modified.

Add a deterministic output_matches assertion for Assert.AreEqual in the MSTest try/catch refactor scenario, covering the previously-uncovered MSTest assertion CodePattern.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 25, 2026 08:11
@github-actions

Contributor

Skill Coverage Report

| Plugin | Skill | Covered | Coverage |
|---|---|---|---|
| dotnet-test | dotnet-test-frameworks | 5/5 | 100% |

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the dotnet-test plugin’s evaluation suite to increase coverage for MSTest’s Assert.AreEqual usage within the dotnet-test-frameworks skill, specifically in the scenario that refactors try/catch exception assertions to framework-native patterns.

Changes:

  • Added an output_matches assertion intended to detect MSTest Assert.AreEqual output in the “Replace try-catch with framework-native exception assertions” scenario.
Summary per file:

| File | Description |
|---|---|
| tests/dotnet-test/dotnet-test-frameworks/eval.yaml | Adds an additional output_matches assertion to increase eval coverage for MSTest Assert.AreEqual. |

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 1

Comment thread tests/dotnet-test/dotnet-test-frameworks/eval.yaml Outdated
Address review feedback: the output_matches pattern now requires the exception message comparison (Assert.AreEqual(..., ex.Message)) rather than matching Assert.AreEqual anywhere. The CodePattern remains covered via the rubric evidence.
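The difference between the two versions of the check can be sketched as follows. The final pattern committed to eval.yaml is not shown in this conversation, so `STRICT` below is a hypothetical pattern written only to illustrate the idea of requiring `ex.Message` as the last argument; the sample C# lines are likewise made up, and Python's `re` stands in for whatever engine the evaluator uses.

```python
import re

# Original pattern from the PR description: matches Assert.AreEqual anywhere.
LOOSE = re.compile(r"Assert\.AreEqual")

# Hypothetical stricter pattern (an assumption, not the committed one):
# Assert.AreEqual( ... , ex.Message) within a single statement.
STRICT = re.compile(r"Assert\.AreEqual\([^;]*,\s*ex\.Message\s*\)")

# Made-up sample outputs for illustration.
MESSAGE_CHECK = 'Assert.AreEqual("placeholder message", ex.Message);'
UNRELATED_CHECK = "Assert.AreEqual(4, calculator.Add(2, 2));"


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the pattern occurs anywhere in the text."""
    return pattern.search(text) is not None


if __name__ == "__main__":
    for sample in (MESSAGE_CHECK, UNRELATED_CHECK):
        print(sample, "| loose:", matches(LOOSE, sample), "| strict:", matches(STRICT, sample))
```

The loose pattern accepts any equality assertion, including ones unrelated to the exception, while the stricter form only passes when the exception message comparison itself is present. One known limitation of this particular sketch is that `[^;]*` rejects expected-message strings that themselves contain a semicolon.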

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Evangelink

Member Author

/evaluate

@github-actions

Contributor

❌ Evaluation failed. View workflow run

@github-actions (bot) added the waiting-on-author label (PR state label) on Jun 25, 2026
@github-actions

Contributor

👋 @Evangelink — this PR has 1 unresolved review thread(s). When you're ready, please address the feedback and push an update; the triage bot will pick up the next state automatically. (Add the no-stale label to silence further pings.)

@Evangelink

Member Author

/evaluate

@Evangelink enabled auto-merge (squash) on June 25, 2026 10:51
@github-actions

Contributor

Skill Validation Results

| Skill | Scenario | Quality | Skills Loaded | Overfit | Verdict |
|---|---|---|---|---|---|
| dotnet-test-frameworks | Cross-framework assertion equivalence mapping | 5.0/5 → 5.0/5 | ℹ️ not activated (expected) | ✅ 0.07 | [1] |
| dotnet-test-frameworks | Identify TUnit framework and its unique attributes | 5.0/5 → 5.0/5 | ℹ️ not activated (expected) | ✅ 0.07 | [2] |
| dotnet-test-frameworks | Replace try-catch with framework-native exception assertions | 5.0/5 → 5.0/5 | ℹ️ not activated (expected) | ✅ 0.07 | [3] |
| dotnet-test-frameworks | Skip annotations across all four frameworks | 5.0/5 → 5.0/5 | ℹ️ not activated (expected) | ✅ 0.07 | [4] |
| dotnet-test-frameworks | Convert NUnit lifecycle methods to xUnit equivalents | 5.0/5 → 5.0/5 | ℹ️ not activated (expected) | ✅ 0.07 | [5] |
| dotnet-test-frameworks | Identify integration tests by markers and code patterns | 5.0/5 → 5.0/5 | ℹ️ not activated (expected) | ✅ 0.07 | [6] |
| dotnet-test-frameworks | Convert cross-framework assertions to TUnit syntax | 1.0/5 → 1.0/5 | ℹ️ not activated (expected) | ✅ 0.07 | [7] |
| dotnet-test-frameworks | Diagnose silently-passing TUnit test with missing await | 4.3/5 → 4.3/5 | ℹ️ not activated (expected) | ✅ 0.07 | [8] |
| dotnet-test-frameworks | Refactor TUnit try/catch to native exception assertion | 4.0/5 → 3.3/5 🔴 | ℹ️ not activated (expected) | ✅ 0.07 | |
| dotnet-test-frameworks | TUnit lifecycle hooks at test, class, assembly, and session scope | 4.3/5 → 4.0/5 🔴 | ℹ️ not activated (expected) | ✅ 0.07 | [9] |
| dotnet-test-frameworks | TUnit skip mechanisms — attribute, assembly-wide, and dynamic | 3.3/5 → 3.3/5 | ℹ️ not activated (expected) | ✅ 0.07 | [10] |

[1] (Plugin) Quality unchanged but weighted score is -2.4% due to: tokens (12833 → 17381), time (9.6s → 12.0s)
[2] ⚠️ High run-to-run variance (CV=91%) — consider re-running with --runs 5. (Isolated) Quality unchanged but weighted score is -14.8% due to: judgment, quality
[3] (Plugin) Quality unchanged but weighted score is -2.0% due to: tokens (13094 → 17675)
[4] ⚠️ High run-to-run variance (CV=86%) — consider re-running with --runs 5. (Isolated) Quality unchanged but weighted score is -28.4% due to: quality, judgment
[5] ⚠️ High run-to-run variance (CV=84%) — consider re-running with --runs 5. (Plugin) Quality unchanged but weighted score is -3.4% due to: tokens (13117 → 17752), quality
[6] (Plugin) Quality unchanged but weighted score is -1.7% due to: tokens (13337 → 17920)
[7] ⚠️ High run-to-run variance (CV=295%) — consider re-running with --runs 5
[8] ⚠️ High run-to-run variance (CV=89%) — consider re-running with --runs 5
[9] ⚠️ High run-to-run variance (CV=58%) — consider re-running with --runs 5
[10] ⚠️ High run-to-run variance (CV=259%) — consider re-running with --runs 5
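For readers unfamiliar with the CV figures in the notes above: the coefficient of variation is the standard deviation divided by the mean, so it exceeds 100% whenever the spread between runs is larger than the average value, which happens easily when the mean is close to zero. The sketch below is a minimal illustration with made-up per-run values; the report does not say which metric the validator computes CV over, nor whether it uses the sample or population standard deviation, so both are assumptions here.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation divided by the mean (assumed definition)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


if __name__ == "__main__":
    # Hypothetical per-run values, not taken from the evaluation artifacts.
    stable_runs = [5.0, 5.0, 5.0]
    mixed_runs = [1.0, 5.0, 5.0]
    near_zero_runs = [0.0, 0.0, 0.3]

    for name, runs in [("stable", stable_runs), ("mixed", mixed_runs), ("near-zero", near_zero_runs)]:
        print(f"{name}: CV = {coefficient_of_variation(runs):.0%}")
```

With only a handful of runs, a single outlier moves both the mean and the standard deviation substantially, which is why the notes suggest re-running with --runs 5 before reading much into a high-variance row.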

Model: claude-opus-4.6 | Judge: claude-opus-4.6

🔍 Full Results - additional metrics and failure investigation steps

To investigate failures, paste this to your AI coding agent:

For PR 820 in dotnet/skills, download eval artifacts with gh run download 28164660258 --repo dotnet/skills --pattern "skill-validator-results-*" --dir ./eval-results, then fetch https://raw.githubusercontent.com/dotnet/skills/00cc4680d233662a069ef55fe391c56d17c0f0f2/eng/skill-validator/src/docs/InvestigatingResults.md and follow it to analyze the results.json files. Diagnose each failure, suggest fixes to the eval.yaml and skill content, and tell me what to fix first.

▶ Sessions Visualisation -- interactive replay of all evaluation sessions
📊 Session Analytics (preview) -- aggregated metrics across evaluation sessions

@Evangelink merged commit d16bad4 into main on Jun 25, 2026
34 of 36 checks passed
@Evangelink deleted the evangelink-eval-coverage-dotnet-test-frameworks branch on June 25, 2026 10:54
Labels

waiting-on-author PR state label

3 participants