Add eval coverage for dotnet-test/grade-tests #825
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Skill Coverage Report
Pull request overview
This PR expands the dotnet-test/grade-tests evaluation suite by adding two new scenarios (Go table-driven tests; production code unavailable) plus new fixtures to exercise previously uncovered guidance in the grade-tests skill.
Changes:
- Added Scenario 4 to grade idiomatic Go table-driven tests and ensure loops/assertion patterns aren’t misclassified as branching.
- Added Scenario 5 to grade tests when the production code is missing, expecting “Unverified” (not deductions) and compact PR-comment output.
- Added supporting Go and C# fixture workspaces for the new scenarios.
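Scenario 4 turns on one distinction. A table-driven `for ... range` loop over `t.Run` subtests, and the `if got != want { t.Errorf(...) }` assertion idiom inside it, are not conditional test logic. An `if`/`else` that takes different paths is. The Go sketch below approximates that line with a syntactic check built on the standard `go/parser` and `go/ast` packages. It is an illustration under stated assumptions, not how the grade-tests skill itself evaluates tests. `hasConditionalLogic`, `isAssertionIf`, `failureMethods`, and the sample `Add` tests are hypothetical names.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Hypothetical heuristic for illustration only; not part of the grade-tests skill.
// failureMethods are the testing.T methods that report a failed check.
var failureMethods = map[string]bool{"Error": true, "Errorf": true, "Fatal": true, "Fatalf": true}

// isAssertionIf reports whether an if statement is only the Go assertion idiom:
// no else branch, and a body made solely of t.Error/Errorf/Fatal/Fatalf calls.
func isAssertionIf(s *ast.IfStmt) bool {
	if s.Else != nil || len(s.Body.List) == 0 {
		return false
	}
	for _, stmt := range s.Body.List {
		es, ok := stmt.(*ast.ExprStmt)
		if !ok {
			return false
		}
		call, ok := es.X.(*ast.CallExpr)
		if !ok {
			return false
		}
		sel, ok := call.Fun.(*ast.SelectorExpr)
		if !ok || !failureMethods[sel.Sel.Name] {
			return false
		}
	}
	return true
}

// hasConditionalLogic reports whether test source branches on anything other
// than assertion ifs. for/range loops, including table-driven t.Run loops,
// are deliberately not counted.
func hasConditionalLogic(src string) (bool, error) {
	file, err := parser.ParseFile(token.NewFileSet(), "example_test.go", src, 0)
	if err != nil {
		return false, err
	}
	found := false
	ast.Inspect(file, func(n ast.Node) bool {
		switch s := n.(type) {
		case *ast.IfStmt:
			if !isAssertionIf(s) {
				found = true
			}
		case *ast.SwitchStmt, *ast.TypeSwitchStmt, *ast.SelectStmt:
			found = true
		}
		return !found // stop descending once branching has been found
	})
	return found, nil
}

// tableDriven is only parsed, never compiled, so Add need not exist.
const tableDriven = `package calc
func TestAdd(t *testing.T) {
	tests := []struct {
		name       string
		a, b, want int
	}{
		{"positive", 2, 3, 5},
		{"negative", -2, -3, -5},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := Add(tt.a, tt.b); got != tt.want {
				t.Errorf("Add(%d, %d) = %d, want %d", tt.a, tt.b, got, tt.want)
			}
		})
	}
}`

const branching = `package calc
func TestAdd(t *testing.T) {
	if Add(1, 1) > 1 {
		t.Log("took one path")
	} else {
		t.Log("took the other")
	}
}`

func main() {
	for _, src := range []string{tableDriven, branching} {
		got, err := hasConditionalLogic(src)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println(got) // false for tableDriven, true for branching
	}
}
```

Treating an `if` whose body only reports a failure as an assertion, rather than as a branch, is what keeps the idiomatic Go check from being counted against the test.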
Summary per file
| File | Description |
|---|---|
| tests/dotnet-test/grade-tests/eval.yaml | Adds two new eval scenarios and their assertions/rubrics to cover additional grade-tests teaching points. |
| tests/dotnet-test/grade-tests/fixtures/go-table-driven/go.mod | Introduces a minimal Go module for the table-driven test fixture. |
| tests/dotnet-test/grade-tests/fixtures/go-table-driven/calculator.go | Adds simple Go production code to be graded against by the Go scenario. |
| tests/dotnet-test/grade-tests/fixtures/go-table-driven/calculator_test.go | Adds Go tests of varying quality (A/C/F) including a table-driven subtest loop. |
| tests/dotnet-test/grade-tests/fixtures/production-unavailable/Payments.Tests/Payments.Tests.csproj | Adds a .NET test project fixture explicitly lacking the production project reference. |
| tests/dotnet-test/grade-tests/fixtures/production-unavailable/Payments.Tests/PaymentGatewayTests.cs | Adds MSTest methods of varying quality for the “production unavailable” grading scenario. |
Copilot's findings
- Files reviewed: 6/6 changed files
- Comments generated: 2
Address PR review: verify TestParse_NoError=C in scenario 4, and Charge_NegativeAmount=A / Refund_ExistingCharge=C in scenario 5 so the scenarios cannot pass on misgraded output.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/evaluate
Skill Validation Results
Model: claude-opus-4.6 | Judge: claude-opus-4.6
👋 @Evangelink: this PR has 1 unresolved review thread(s). When you're ready, please address the feedback and push an update; the triage bot will pick up the next state automatically.
✅ Evaluation passed for
Extends `tests/dotnet-test/grade-tests/eval.yaml` with two new scenarios (plus supporting fixtures) so that the previously uncovered teaching points in `plugins/dotnet-test/skills/grade-tests/SKILL.md` are now exercised.

Now-covered points
What was added
- Scenario 4 (`fixtures/go-table-driven/`): grades idiomatic `for ... range`/`t.Run` subtests; the rubric asserts that the loop is NOT misread as conditional logic, that every grade rests on an observable signal, and that deductions are not inflated.
- Scenario 5 (`fixtures/production-unavailable/`): grades tests whose code under test (`Payments.Core`) is absent; the rubric asserts that behavioral concerns are marked `Unverified` (not deducted) and that the report stays compact rather than spilling a giant 500-row table into the PR comment.

Prompts are natural (no skill references); rubric items are outcome-focused and independently evaluable.
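The Scenario 5 rule can be sketched as a small data model. A concern that cannot be checked because the code under test is missing is still reported, but as `Unverified`, and it does not lower the grade. The summary is capped rather than dumped as a full table. This Go sketch assumes a hypothetical numeric score, penalty values, and row cap. `Status`, `Finding`, `Score`, `CompactSummary`, and the `CaseA`/`CaseB`/`CaseC` names are illustrative only and are not the skill's actual output format.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Hypothetical types for illustration only; not the grade-tests output format.
// Status says whether a concern could be checked against the production code.
type Status string

const (
	Confirmed  Status = "Confirmed"
	Unverified Status = "Unverified" // code under test is absent: report it, don't deduct
)

// Finding is one hypothetical grading concern attached to a test method.
type Finding struct {
	Concern string
	Status  Status
	Penalty int // applied only when Status == Confirmed
}

// Score subtracts penalties for Confirmed findings only; Unverified findings
// stay in the report but never lower the score.
func Score(base int, findings []Finding) int {
	score := base
	for _, f := range findings {
		if f.Status == Confirmed {
			score -= f.Penalty
		}
	}
	return score
}

// CompactSummary lists at most maxRows "name: grade" lines in name order and
// folds the remainder into one count line instead of printing every row.
func CompactSummary(grades map[string]string, maxRows int) string {
	names := make([]string, 0, len(grades))
	for name := range grades {
		names = append(names, name)
	}
	sort.Strings(names)
	var b strings.Builder
	for i, name := range names {
		if i == maxRows {
			fmt.Fprintf(&b, "... and %d more\n", len(names)-maxRows)
			break
		}
		fmt.Fprintf(&b, "%s: %s\n", name, grades[name])
	}
	return b.String()
}

func main() {
	findings := []Finding{
		{"expected decline reason matches production behavior", Unverified, 20},
		{"refund amount is never asserted", Confirmed, 10},
	}
	fmt.Println(Score(100, findings)) // 90: the Unverified concern costs nothing

	grades := map[string]string{"CaseA": "A", "CaseB": "C", "CaseC": "F"}
	fmt.Print(CompactSummary(grades, 2)) // CaseA: A, CaseB: C, then "... and 1 more"
}
```

Keeping `Unverified` findings in the report, rather than dropping them, preserves the signal for the reviewer without penalizing a test for something the grader could not check.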
Verification
- `Measure-SkillCoverage.ps1 -PluginName dotnet-test -SkillName grade-tests` → 100% (28/28), `uncovered` empty, no regressions.
- `SkillValidator check --plugin ./plugins/dotnet-test` → ✅ all checks passed (YAML parses; remaining warnings are pre-existing token-size notes).

No `SKILL.md` or other skills were modified.