feat: mAP scoring for bounding box evaluation (#89) by Divya-Bhargavi · Pull Request #151 · awslabs/stickler

Divya-Bhargavi · 2026-06-08T17:55:28Z

Issue

Closes #89

Description of changes

Adds Mean Average Precision (mAP) scoring for bounding box evaluation within the stickler
framework, enabling users to measure how accurately a model locates information on a page.

This builds on the rich-value pattern and the PostComparisonAccumulator aggregate
infrastructure introduced in #98 (now merged into dev / released in v0.4.0), so it targets
dev directly.

What's included

- BBoxIoUComparator: computes Intersection over Union between two bounding boxes.
  Supports [[x1,y1],[x2,y2]] and [x1,y1,x2,y2] formats with coordinate normalization.
  Registered in the comparator registry and exported from stickler.comparators.
- MAPCalculator: mirrors ConfidenceCalculator's calculator/accumulator split.
  extract_from_dicts() joins field_comparisons with ground-truth and prediction boxes
  into keyed pairs; compute_metrics() produces per-field IoU/precision/recall/F1/AP plus
  overall mean AP and coverage.
- BBoxMAPAccumulator: a PostComparisonAccumulator (name bbox_map_metrics) for bulk
  evaluation, with get_state/load_state/merge_state for checkpointing and distributed
  runs. The base-class docstring already reserved this name for the planned implementation.
- Rich-value _bbox extraction: bounding boxes ride on the underscore rich-value
  pattern (_value/_confidence/_bbox) and land in field extras via get_all_extras();
  no changes to the rich-value helper were needed.
- compare_with() wiring: pre-extracts ground_truth_bboxes and prediction_bboxes
  into the result (mirroring prediction_confidences) so the accumulator can join both
  sides. A single-document add_bbox_metrics=True sanity-check path is also provided.
- 43 tests: comparator math, calculator logic, accumulator + state round-trip/merge,
  bulk integration (including coexistence with ConfidenceAccumulator), single-doc path.
- Documentation: docs/docs/Advanced/bbox-map-metrics.md and an updated comparators
  guide.
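
The description above does not spell out how AP is derived from the IoU scores, so the following is only a sketch of the conventional definition, not stickler's MAPCalculator. It assumes predictions for one field are ranked by confidence, that each prediction has already been paired with a distinct ground-truth box, and that a pair counts as a true positive when its IoU meets the threshold. The names average_precision and mean_ap and the (confidence, iou) input shape are hypothetical.

```python
from typing import Dict, List, Tuple


def average_precision(
    scored: List[Tuple[float, float]],
    num_ground_truth: int,
    iou_threshold: float = 0.5,
) -> float:
    """AP for one field from (confidence, iou) pairs.

    Hypothetical helper for illustration; not part of stickler.
    """
    if num_ground_truth <= 0:
        return 0.0
    # Rank predictions from most to least confident.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, iou) in enumerate(ranked, start=1):
        if iou >= iou_threshold:
            true_positives += 1
            # Precision at this rank, sampled only where a hit occurs.
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth


def mean_ap(per_field_ap: Dict[str, float]) -> float:
    """Unweighted mean of per-field AP values (0.0 when empty)."""
    if not per_field_ap:
        return 0.0
    return sum(per_field_ap.values()) / len(per_field_ap)


# Hits at ranks 1 and 3 out of 2 ground-truth boxes: (1/1 + 2/3) / 2.
vendor_ap = average_precision([(0.9, 0.8), (0.8, 0.3), (0.7, 0.6)], num_ground_truth=2)
```

Sampling precision only at true-positive ranks is one common AP variant; interpolated variants (for example 11-point) give slightly different numbers, and the PR does not say which one it uses.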

Usage (bulk — recommended)

from stickler.structured_object_evaluator.bulk_structured_model_evaluator import (
    BulkStructuredModelEvaluator,
)
from stickler.structured_object_evaluator.models.bbox_map_accumulator import (
    BBoxMAPAccumulator,
)

evaluator = BulkStructuredModelEvaluator(
    accumulators=[BBoxMAPAccumulator(iou_threshold=0.5)],
)
for gt, pred in dataset:
    evaluator.update(gt, pred)

metrics = evaluator.compute().accumulator_metrics["bbox_map_metrics"]
print(metrics["mean_ap"])

Bounding boxes are provided via the rich-value pattern:

gt = Invoice.from_json({
    "vendor": {"_value": "Acme", "_bbox": [[10, 20], [200, 50]]},
})
pred = Invoice.from_json({
    "vendor": {"_value": "Acme", "_bbox": [[12, 18], [198, 52]], "_confidence": 0.9},
})
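
To make the 0.5 threshold concrete, here is a minimal IoU sketch applied to the two vendor boxes above. It is not the BBoxIoUComparator implementation: the helper names to_flat and iou are hypothetical, and "coordinate normalization" is assumed here to mean only accepting both box formats and ordering the corners.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]


def to_flat(box: Sequence) -> Box:
    """Accept [[x1, y1], [x2, y2]] or [x1, y1, x2, y2]; return ordered corners."""
    if len(box) == 2:
        (x1, y1), (x2, y2) = box
    elif len(box) == 4:
        x1, y1, x2, y2 = box
    else:
        raise ValueError(f"unsupported bounding box format: {box!r}")
    # Order the corners so that (x1, y1) is top-left and (x2, y2) is bottom-right.
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def iou(a: Sequence, b: Sequence) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = to_flat(a)
    bx1, by1, bx2, by2 = to_flat(b)
    # Overlap width and height, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


score = iou([[10, 20], [200, 50]], [[12, 18], [198, 52]])
print(round(score, 4))  # 0.8659 (intersection 5580 / union 6444)
```

Under these assumptions the example pair would count as a match at iou_threshold=0.5.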

Testing

43 new tests, all passing.
Full suite: 1119 passed, 2 skipped (baseline 1076 on dev + 43 new), no regressions.
ruff check passes on all changed files.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute
this contribution, under the terms of your choice.

Computes Intersection over Union between two bounding boxes as a similarity score. Supports two-point ([[x1,y1],[x2,y2]]) and flat ([x1,y1,x2,y2]) formats with coordinate normalization. Registered in the comparator registry and exported from stickler.comparators.

Adds mean Average Precision scoring for bounding box localization, built on the rich-value (_bbox) pattern and the PostComparisonAccumulator interface.
- MAPCalculator: extract_from_dicts() joins field_comparisons with gt/pred bounding boxes into keyed pairs; compute_metrics() produces per-field IoU/precision/recall/F1/AP plus overall mean AP and coverage. Mirrors ConfidenceCalculator's calculator/accumulator split.
- BBoxMAPAccumulator: PostComparisonAccumulator implementation (name 'bbox_map_metrics') for bulk evaluation, with get/load/merge_state for checkpointing and distributed runs.
- comparison_engine: pre-extracts ground_truth_bboxes and prediction_bboxes into the result (mirrors prediction_confidences) so the accumulator can join both sides without the model instances; adds a single-document add_bbox_metrics sanity-check path.
- structured_model: forwards add_bbox_metrics and bbox_iou_threshold.
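
The get/load/merge_state trio is what makes checkpointing and distributed runs possible. As a rough illustration of that pattern, the toy class below keeps per-field IoU lists and merges them across workers. It is a stand-in, not BBoxMAPAccumulator: the class name, the add() method, and the state layout are all assumptions, and only the three state-method names come from the commit message.

```python
from collections import defaultdict
from typing import Dict, List


class ToyIoUAccumulator:
    """Illustrative stand-in; not stickler's BBoxMAPAccumulator."""

    def __init__(self) -> None:
        self._ious: Dict[str, List[float]] = defaultdict(list)

    def add(self, field: str, iou: float) -> None:
        """Record one IoU observation for a field (hypothetical API)."""
        self._ious[field].append(iou)

    def get_state(self) -> dict:
        # Return a plain, copy-safe dict so it can be serialized to a checkpoint.
        return {"ious": {name: list(vals) for name, vals in self._ious.items()}}

    def load_state(self, state: dict) -> None:
        # Replace the current contents with a previously saved state.
        self._ious = defaultdict(
            list, {name: list(vals) for name, vals in state["ious"].items()}
        )

    def merge_state(self, other_state: dict) -> None:
        # Fold another worker's observations into this accumulator.
        for name, vals in other_state["ious"].items():
            self._ious[name].extend(vals)

    def mean_iou(self, field: str) -> float:
        vals = self._ious.get(field, [])
        return sum(vals) / len(vals) if vals else 0.0


# Two workers accumulate independently, then one absorbs the other's state.
worker_a, worker_b = ToyIoUAccumulator(), ToyIoUAccumulator()
worker_a.add("vendor", 0.9)
worker_b.add("vendor", 0.7)
worker_b.add("total", 0.5)
worker_a.merge_state(worker_b.get_state())
```

Keeping the state as plain dicts and lists means it can be written to JSON between runs; whether the real accumulator stores raw pairs or running sums is not stated in the PR.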

43 tests covering BBoxIoUComparator IoU math and edge cases, MAPCalculator extraction/metrics/coverage, BBoxMAPAccumulator accumulation and state round-trip/merge, bulk evaluator integration (including coexistence with ConfidenceAccumulator), and the single-document add_bbox_metrics path.

Adds the Advanced/bbox-map-metrics.md page (accumulator-first usage with the _bbox rich-value key, plus single-document sanity-check path), wires it into the Advanced nav, and documents BBoxIoUComparator in the Comparators guide.

Divya Bhargavi added 4 commits June 8, 2026 09:36

docs: add bounding box mAP metrics documentation

af6e386


Divya-Bhargavi wants to merge 4 commits into awslabs:dev from Divya-Bhargavi:dbharga_map_v2
