fix: prevent combine_scores from re-consuming aggregation keys on repeated calls #201

Merged
rhoadesScholar merged 3 commits into pq_eval from copilot/sub-pr-200
Mar 24, 2026

Copilot AI commented Mar 24, 2026

combine_scores iterated over all top-level keys in scores without guarding against known aggregation keys (label_scores, overall_*, total_evals, etc.). Passing an already-combined dict back in would re-consume those entries as crop data, corrupting TP/FP/FN accumulators and skewing all PQ outputs.

Changes

  • aggregation.py – Defines _AGGREGATION_KEYS frozenset of all known non-crop top-level keys; adds if crop_name in _AGGREGATION_KEYS: continue guard at the top of the crop-iteration loop in combine_scores
  • tests/test_evaluate_metrics.py – Adds test_combine_scores_idempotent_on_combined_dict regression test verifying scores are stable across two sequential calls

```python
first  = combine_scores(scores, ...)
second = combine_scores(first, ...)   # previously corrupted results; now identical
assert np.isclose(first["overall_score"], second["overall_score"])  # passes
assert first["label_scores"]["instance"]["tp"] == second["label_scores"]["instance"]["tp"]  # passes
```
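
The guard described above can be illustrated with a small, self-contained sketch. This is not the code from aggregation.py: the function name `combine_scores_sketch`, the three keys placed in `_AGGREGATION_KEYS`, the per-crop dict layout, and the toy overall score are all assumptions made for illustration. It also assumes the combined dict keeps the original crop entries alongside the aggregation keys, which is what makes a second call meaningful.

```python
# Hypothetical subset of keys; the real _AGGREGATION_KEYS may contain more entries.
_AGGREGATION_KEYS = frozenset({"label_scores", "overall_score", "total_evals"})


def combine_scores_sketch(scores):
    """Toy stand-in for combine_scores: sums TP/FP/FN per label across crops."""
    label_scores = {}
    crops = {}
    total_evals = 0
    for crop_name, crop_scores in scores.items():
        # The guard: skip entries that are aggregation outputs, not crop data.
        if crop_name in _AGGREGATION_KEYS:
            continue
        crops[crop_name] = crop_scores
        for label, counts in crop_scores.items():
            acc = label_scores.setdefault(label, {"tp": 0, "fp": 0, "fn": 0})
            for key in ("tp", "fp", "fn"):
                acc[key] += counts[key]
            total_evals += 1

    # Toy overall score (assumed formula): TP / (TP + 0.5 * (FP + FN)).
    tp = sum(acc["tp"] for acc in label_scores.values())
    fp = sum(acc["fp"] for acc in label_scores.values())
    fn = sum(acc["fn"] for acc in label_scores.values())
    denom = tp + 0.5 * (fp + fn)
    overall = tp / denom if denom else 0.0

    # Keep the crop entries and add the aggregation keys next to them.
    return {
        **crops,
        "label_scores": label_scores,
        "overall_score": overall,
        "total_evals": total_evals,
    }


scores = {
    "crop_a": {"instance": {"tp": 3, "fp": 1, "fn": 0}},
    "crop_b": {"instance": {"tp": 2, "fp": 0, "fn": 2}},
}
first = combine_scores_sketch(scores)
second = combine_scores_sketch(first)  # aggregation keys are skipped, not re-summed
assert first == second
```

In this sketch, removing the guard would make the second call treat `label_scores` as another crop and add its counts to the accumulators a second time, which is the kind of corruption the PR description refers to.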

Copilot AI changed the title from "[WIP] [WIP] Address feedback on Pq eval pull request" to "fix: prevent combine_scores from re-consuming aggregation keys on repeated calls" Mar 24, 2026
Copilot AI requested a review from rhoadesScholar March 24, 2026 18:42
rhoadesScholar marked this pull request as ready for review March 24, 2026 18:45
rhoadesScholar merged commit f010654 into pq_eval Mar 24, 2026
7 checks passed
rhoadesScholar deleted the copilot/sub-pr-200 branch March 24, 2026 19:02