feat: mAP scoring for bounding box evaluation (#89) #151
Open
Divya-Bhargavi wants to merge 4 commits into
added 4 commits
June 8, 2026 09:36
Computes Intersection over Union between two bounding boxes as a similarity score. Supports two-point ([[x1,y1],[x2,y2]]) and flat ([x1,y1,x2,y2]) formats with coordinate normalization. Registered in the comparator registry and exported from stickler.comparators.
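Here is a minimal sketch of the IoU computation this commit describes. It is standalone Python, not written against the stickler API. `normalize_box` and `bbox_iou` are hypothetical names. The sketch reads "coordinate normalization" as accepting either format and ordering the corners. If the comparator also rescales coordinates (for example to page size), this sketch does not cover that.

```python
from typing import Sequence, Tuple, Union

# Either a two-point box [[x1, y1], [x2, y2]] or a flat box [x1, y1, x2, y2].
Box = Union[Sequence[Sequence[float]], Sequence[float]]


def normalize_box(box: Box) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) for either supported format."""
    if len(box) == 2:  # two-point format
        (x1, y1), (x2, y2) = box
    elif len(box) == 4:  # flat format
        x1, y1, x2, y2 = box
    else:
        raise ValueError(f"unsupported box format: {box!r}")
    # Order the corners so that min <= max on each axis.
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def bbox_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two boxes, in [0.0, 1.0]."""
    ax1, ay1, ax2, ay2 = normalize_box(a)
    bx1, by1, bx2, by2 = normalize_box(b)
    # Overlap on each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    if union <= 0.0:
        # Only possible when both boxes have zero area; treat as no overlap.
        return 0.0
    return inter / union
```

For example, `bbox_iou([0, 0, 2, 2], [[1, 1], [3, 3]])` has a 1 × 1 overlap and a union of 7, so it returns 1/7 ≈ 0.143.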
Adds mean Average Precision scoring for bounding box localization, built on the rich-value (_bbox) pattern and the PostComparisonAccumulator interface.
- MAPCalculator: extract_from_dicts() joins field_comparisons with gt/pred bounding boxes into keyed pairs; compute_metrics() produces per-field IoU/precision/recall/F1/AP plus overall mean AP and coverage. Mirrors ConfidenceCalculator's calculator/accumulator split.
- BBoxMAPAccumulator: PostComparisonAccumulator implementation (name 'bbox_map_metrics') for bulk evaluation, with get/load/merge_state for checkpointing and distributed runs.
- comparison_engine: pre-extracts ground_truth_bboxes and prediction_bboxes into the result (mirrors prediction_confidences) so the accumulator can join both sides without the model instances; adds a single-document add_bbox_metrics sanity-check path.
- structured_model: forwards add_bbox_metrics and bbox_iou_threshold.
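The calculator/accumulator split, and `get/load/merge_state` in particular, is simplest when the accumulated state consists of sums. The sketch below is hypothetical. `BBoxCountState` is not the PR's `BBoxMAPAccumulator`, and its counting rules are an assumption. It keeps per-field TP/FP/FN counts at an IoU threshold, so two workers' states merge by addition and a checkpoint is a plain dict.

```python
from collections import defaultdict
from typing import Dict, Optional


class BBoxCountState:
    """Hypothetical accumulator with additive, mergeable per-field state."""

    def __init__(self, iou_threshold: float = 0.5) -> None:
        self.iou_threshold = iou_threshold
        # field name -> {"tp": ..., "fp": ..., "fn": ...}
        self.counts: Dict[str, Dict[str, int]] = defaultdict(
            lambda: {"tp": 0, "fp": 0, "fn": 0}
        )

    def add(
        self, field: str, iou: Optional[float], has_gt: bool, has_pred: bool
    ) -> None:
        """Record one ground-truth/prediction pair for `field`."""
        c = self.counts[field]
        if has_gt and has_pred and iou is not None and iou >= self.iou_threshold:
            c["tp"] += 1
            return
        # A prediction below the threshold counts as both a false positive
        # and a missed ground truth, the usual detection convention.
        if has_pred:
            c["fp"] += 1
        if has_gt:
            c["fn"] += 1

    def get_state(self) -> dict:
        """Plain, JSON-serializable snapshot for checkpointing."""
        return {
            "iou_threshold": self.iou_threshold,
            "counts": {f: dict(c) for f, c in self.counts.items()},
        }

    def load_state(self, state: dict) -> None:
        """Replace the current state with a snapshot."""
        self.iou_threshold = state["iou_threshold"]
        self.counts.clear()
        for f, c in state["counts"].items():
            self.counts[f] = dict(c)

    def merge_state(self, state: dict) -> None:
        """Add another worker's counts into this one."""
        if state["iou_threshold"] != self.iou_threshold:
            raise ValueError("cannot merge states with different IoU thresholds")
        for f, c in state["counts"].items():
            mine = self.counts[f]
            for k in ("tp", "fp", "fn"):
                mine[k] += c[k]

    def metrics(self) -> Dict[str, Dict[str, float]]:
        """Per-field precision, recall and F1 from the accumulated counts."""
        out: Dict[str, Dict[str, float]] = {}
        for f, c in self.counts.items():
            tp, fp, fn = c["tp"], c["fp"], c["fn"]
            p = tp / (tp + fp) if tp + fp else 0.0
            r = tp / (tp + fn) if tp + fn else 0.0
            f1 = 2 * p * r / (p + r) if p + r else 0.0
            out[f] = {"precision": p, "recall": r, "f1": f1}
        return out
```

Every part of this state is additive. Merging one shard's snapshot into another therefore yields the same counts as processing all documents in a single process. Distributed runs depend on that property.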
43 tests covering BBoxIoUComparator IoU math and edge cases, MAPCalculator extraction/metrics/coverage, BBoxMAPAccumulator accumulation and state round-trip/merge, bulk evaluator integration (including coexistence with ConfidenceAccumulator), and the single-document add_bbox_metrics path.
Adds the Advanced/bbox-map-metrics.md page (accumulator-first usage with the _bbox rich-value key, plus single-document sanity-check path), wires it into the Advanced nav, and documents BBoxIoUComparator in the Comparators guide.
Issue
Closes #89
Description of changes
Adds Mean Average Precision (mAP) scoring for bounding box evaluation within the stickler framework, enabling users to measure how accurately a model locates information on a page.

This builds on the rich-value pattern and the `PostComparisonAccumulator` aggregate infrastructure introduced in #98 (now merged into `dev` and released in v0.4.0), so it targets `dev` directly.

What's included
- `BBoxIoUComparator`: computes Intersection over Union between two bounding boxes as a similarity score. Supports `[[x1,y1],[x2,y2]]` and `[x1,y1,x2,y2]` formats with coordinate normalization. Registered in the comparator registry and exported from `stickler.comparators`.
- `MAPCalculator`: mirrors `ConfidenceCalculator`'s calculator/accumulator split. `extract_from_dicts()` joins `field_comparisons` with ground-truth and prediction boxes into keyed pairs; `compute_metrics()` produces per-field IoU/precision/recall/F1/AP plus overall mean AP and coverage.
- `BBoxMAPAccumulator`: `PostComparisonAccumulator` implementation (name `bbox_map_metrics`) for bulk evaluation, with `get_state`/`load_state`/`merge_state` for checkpointing and distributed runs. The base-class docstring already reserved this name for the planned implementation.
- `_bbox` extraction: bounding boxes ride on the underscore rich-value pattern (`_value`/`_confidence`/`_bbox`) and land in field extras via `get_all_extras()`; no changes to the rich-value helper were needed.
- `comparison_engine`: pre-extracts `ground_truth_bboxes` and `prediction_bboxes` into the result (mirroring `prediction_confidences`) so the accumulator can join both sides. A single-document `add_bbox_metrics=True` sanity-check path is also provided.
- Tests: 43 new tests covering IoU math and edge cases, `MAPCalculator` extraction/metrics/coverage, accumulator state round-trip/merge, bulk integration (including coexistence with `ConfidenceAccumulator`), and the single-doc path.
- Docs: `docs/docs/Advanced/bbox-map-metrics.md` and an updated comparators guide.
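The PR does not specify how per-field AP is computed. For reference, here is a sketch of the common all-point interpolated AP, with mAP taken as the mean over fields. It assumes each prediction carries a confidence score and has already been marked TP or FP, for example by an IoU threshold. The function names are illustrative and are not `MAPCalculator` methods.

```python
from typing import Dict, List, Tuple


def average_precision(preds: List[Tuple[float, bool]], num_gt: int) -> float:
    """All-point interpolated AP for one field.

    preds: (confidence, is_true_positive) for each prediction.
    num_gt: number of ground-truth boxes for the field.
    """
    if num_gt == 0:
        # Design choice: no ground truth scores 0.0; excluding such fields
        # from the mean is an equally valid alternative.
        return 0.0
    # Rank predictions from most to least confident.
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    tp = fp = 0
    recalls: List[float] = []
    precisions: List[float] = []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Pad the curve, then make precision non-increasing from right to left
    # (each point takes the best precision at any equal or higher recall).
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas; steps where recall does not change contribute 0.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_average_precision(
    per_field: Dict[str, Tuple[List[Tuple[float, bool]], int]],
) -> float:
    """Mean of per-field AP values."""
    aps = [average_precision(preds, n) for preds, n in per_field.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

Take predictions `[(0.9, True), (0.8, False), (0.7, True)]` against 2 ground-truth boxes. The envelope gives precision 1.0 up to recall 0.5 and 2/3 up to recall 1.0, so AP = 0.5 + 0.5 × 2/3 ≈ 0.833.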
Usage (bulk, recommended)
Bounding boxes are provided via the rich-value pattern, under a `_bbox` key alongside `_value` and `_confidence`.
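Here is a minimal sketch of what such input could look like. It includes a hypothetical `join_bbox_pairs` helper that keys ground-truth and predicted boxes by field, roughly the join `extract_from_dicts()` is described as performing. It assumes flat, top-level fields and does not call stickler. In the PR, the boxes reach the accumulator through field extras (`get_all_extras()`) and the result's `ground_truth_bboxes`/`prediction_bboxes`.

```python
from typing import Any, Dict, Optional, Tuple

# Illustrative documents: rich values carry `_value`, optional `_confidence`,
# and `_bbox` in either supported box format; plain values have no box.
gt_doc: Dict[str, Any] = {
    "invoice_total": {"_value": "41.50", "_bbox": [100, 200, 180, 220]},
    "vendor": "Acme",
}
pred_doc: Dict[str, Any] = {
    "invoice_total": {
        "_value": "41.50",
        "_confidence": 0.93,
        "_bbox": [[102, 198], [181, 221]],
    },
    "date": {"_value": "2024-01-05", "_bbox": [10, 10, 60, 25]},
}


def extract_bboxes(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Map each top-level field to its `_bbox`, if the field is a rich value."""
    return {
        name: field["_bbox"]
        for name, field in doc.items()
        if isinstance(field, dict) and "_bbox" in field
    }


def join_bbox_pairs(
    gt: Dict[str, Any], pred: Dict[str, Any]
) -> Dict[str, Tuple[Optional[Any], Optional[Any]]]:
    """Key ground-truth and predicted boxes by field; None marks a missing side."""
    gt_boxes, pred_boxes = extract_bboxes(gt), extract_bboxes(pred)
    return {
        name: (gt_boxes.get(name), pred_boxes.get(name))
        for name in sorted(gt_boxes.keys() | pred_boxes.keys())
    }


pairs = join_bbox_pairs(gt_doc, pred_doc)
```

Plain values such as `"vendor": "Acme"` carry no `_bbox` and are skipped. A field with a box on only one side is kept with `None` on the other side, so missing boxes can still be counted (for example towards coverage).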
Testing
`dev` + 43 new), no regressions. `ruff check` passes on all changed files.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.