feat: mAP scoring for bounding box evaluation (#89) #151

Open
Divya-Bhargavi wants to merge 4 commits into awslabs:dev from Divya-Bhargavi:dbharga_map_v2

@Divya-Bhargavi

Issue

Closes #89

Description of changes

Adds Mean Average Precision (mAP) scoring for bounding box evaluation within the stickler
framework, enabling users to measure how accurately a model locates information on a page.

This builds on the rich-value pattern and the PostComparisonAccumulator aggregate
infrastructure introduced in #98 (now merged into dev / released in v0.4.0), so it targets
dev directly.

What's included

  • BBoxIoUComparator — computes Intersection over Union between two bounding boxes.
    Supports [[x1,y1],[x2,y2]] and [x1,y1,x2,y2] formats with coordinate normalization.
    Registered in the comparator registry and exported from stickler.comparators.
  • MAPCalculator — mirrors ConfidenceCalculator's calculator/accumulator split:
    extract_from_dicts() joins field_comparisons with ground-truth and prediction boxes
    into keyed pairs; compute_metrics() produces per-field IoU/precision/recall/F1/AP plus
    overall mean AP and coverage.
  • BBoxMAPAccumulator — a PostComparisonAccumulator (name bbox_map_metrics) for bulk
    evaluation, with get_state/load_state/merge_state for checkpointing and distributed
    runs. The base-class docstring already reserved this name for the planned implementation.
  • Rich-value _bbox extraction — bounding boxes ride on the underscore rich-value
    pattern (_value/_confidence/_bbox) and land in field extras via get_all_extras();
    no changes to the rich-value helper were needed.
  • compare_with() wiring — pre-extracts ground_truth_bboxes and prediction_bboxes
    into the result (mirroring prediction_confidences) so the accumulator can join both
    sides. A single-document add_bbox_metrics=True sanity-check path is also provided.
  • 43 tests — comparator math, calculator logic, accumulator + state round-trip/merge,
    bulk integration (including coexistence with ConfidenceAccumulator), single-doc path.
  • Documentation: docs/docs/Advanced/bbox-map-metrics.md and an updated comparators
    guide.
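
To make the per-field metrics listed above concrete, here is a minimal sketch of one common way to derive precision, recall, F1 and AP from IoU values at a fixed threshold, and then average AP across fields. It is not the PR's MAPCalculator: the `Pair` dataclass, `field_metrics` and `mean_ap` are hypothetical names, and the non-interpolated AP definition (mean of precision at each true positive, ranked by confidence) is an assumption, since the PR text does not state which AP variant it uses.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """One predicted box joined with its ground truth (hypothetical shape)."""
    iou: float         # IoU between predicted and ground-truth box
    confidence: float  # prediction confidence, used only for ranking


def field_metrics(pairs: list[Pair], n_ground_truth: int,
                  iou_threshold: float = 0.5) -> dict[str, float]:
    """Precision, recall, F1 and AP for one field at a fixed IoU threshold."""
    # Rank predictions from most to least confident.
    ranked = sorted(pairs, key=lambda p: p.confidence, reverse=True)
    true_positives = 0
    precision_at_hits = []
    for rank, pair in enumerate(ranked, start=1):
        if pair.iou >= iou_threshold:  # counts as a correct localization
            true_positives += 1
            precision_at_hits.append(true_positives / rank)
    precision = true_positives / len(ranked) if ranked else 0.0
    recall = true_positives / n_ground_truth if n_ground_truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # Assumed AP variant: sum of precision at each hit over the ground-truth count.
    ap = sum(precision_at_hits) / n_ground_truth if n_ground_truth else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "ap": ap}


def mean_ap(per_field: dict[str, dict[str, float]]) -> float:
    """Unweighted mean of per-field AP values."""
    if not per_field:
        return 0.0
    return sum(m["ap"] for m in per_field.values()) / len(per_field)


per_field = {
    "vendor": field_metrics(
        [Pair(0.9, 0.95), Pair(0.3, 0.80), Pair(0.7, 0.60)], n_ground_truth=3
    ),
    "total": field_metrics([Pair(0.8, 0.90)], n_ground_truth=1),
}
overall = mean_ap(per_field)
```

In this toy data the low-IoU vendor prediction ranks second by confidence, so it lowers the precision recorded at the later hit and pulls the vendor AP below its recall.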

Usage (bulk — recommended)

from stickler.structured_object_evaluator.bulk_structured_model_evaluator import (
    BulkStructuredModelEvaluator,
)
from stickler.structured_object_evaluator.models.bbox_map_accumulator import (
    BBoxMAPAccumulator,
)

evaluator = BulkStructuredModelEvaluator(
    accumulators=[BBoxMAPAccumulator(iou_threshold=0.5)],
)
for gt, pred in dataset:
    evaluator.update(gt, pred)

metrics = evaluator.compute().accumulator_metrics["bbox_map_metrics"]
print(metrics["mean_ap"])
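
For checkpointing and distributed runs, the get_state/load_state/merge_state methods mentioned in the description suggest a pattern along the following lines. This is a toy sketch under stated assumptions, not the PR's BBoxMAPAccumulator: `ToyBBoxAccumulator`, its `add` and `mean_iou` methods, and the shape of the state dict are all hypothetical, and the real method signatures are not shown in this PR text.

```python
import json
from collections import defaultdict


class ToyBBoxAccumulator:
    """Hypothetical stand-in used only to illustrate mergeable state."""

    def __init__(self) -> None:
        # field name -> raw IoU observations collected so far
        self.ious: dict[str, list[float]] = defaultdict(list)

    def add(self, field: str, iou: float) -> None:
        self.ious[field].append(iou)

    def get_state(self) -> dict:
        # Plain dict of lists, so the state survives a JSON round-trip.
        return {"ious": {f: list(v) for f, v in self.ious.items()}}

    def load_state(self, state: dict) -> None:
        # Replace the current contents with a previously saved state.
        self.ious = defaultdict(
            list, {f: list(v) for f, v in state["ious"].items()}
        )

    def merge_state(self, other_state: dict) -> None:
        # Append another worker's observations to our own.
        for field, values in other_state["ious"].items():
            self.ious[field].extend(values)

    def mean_iou(self, field: str) -> float:
        values = self.ious.get(field, [])
        return sum(values) / len(values) if values else 0.0


# Two workers each see part of the dataset.
worker_a = ToyBBoxAccumulator()
worker_a.add("vendor", 0.9)
worker_b = ToyBBoxAccumulator()
worker_b.add("vendor", 0.5)
worker_b.add("total", 0.7)

# Checkpoint worker A, restore it elsewhere, then fold in worker B.
checkpoint = json.dumps(worker_a.get_state())
restored = ToyBBoxAccumulator()
restored.load_state(json.loads(checkpoint))
restored.merge_state(worker_b.get_state())
```

Keeping raw observations in the state, rather than running averages, means the merged result does not depend on how the data was split across workers.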

Bounding boxes are provided via the rich-value pattern:

gt = Invoice.from_json({
    "vendor": {"_value": "Acme", "_bbox": [[10, 20], [200, 50]]},
})
pred = Invoice.from_json({
    "vendor": {"_value": "Acme", "_bbox": [[12, 18], [198, 52]], "_confidence": 0.9},
})
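
As a worked check of the two vendor boxes above, the following standalone sketch computes their IoU by hand. It does not call BBoxIoUComparator; `to_xyxy` and `iou` are hypothetical helpers, and "normalization" is assumed here to mean accepting both box formats and ordering the corners, which may differ from what the PR implements.

```python
def to_xyxy(box):
    """Accept [[x1, y1], [x2, y2]] or [x1, y1, x2, y2]; return ordered corners."""
    if len(box) == 2:
        (x1, y1), (x2, y2) = box
    else:
        x1, y1, x2, y2 = box
    # Order the corners so the result is valid even if the points were swapped.
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


def iou(box_a, box_b) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = to_xyxy(box_a)
    bx1, by1, bx2, by2 = to_xyxy(box_b)
    # Overlap extent on each axis, clamped to zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


gt_box = [[10, 20], [200, 50]]    # ground-truth vendor box from the example
pred_box = [[12, 18], [198, 52]]  # predicted vendor box from the example
score = iou(gt_box, pred_box)     # 5580 / 6444, roughly 0.866
```

The intersection is 186 × 30 = 5580 and the union is 5700 + 6324 − 5580 = 6444, so this pair would clear an IoU threshold of 0.5.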

Testing

  • 43 new tests, all passing.
  • Full suite: 1119 passed, 2 skipped (baseline 1076 on dev + 43 new), no regressions.
  • ruff check passes on all changed files.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute
this contribution, under the terms of your choice.

Divya Bhargavi added 4 commits June 8, 2026 09:36

Computes Intersection over Union between two bounding boxes as a
similarity score. Supports two-point ([[x1,y1],[x2,y2]]) and flat
([x1,y1,x2,y2]) formats with coordinate normalization. Registered in
the comparator registry and exported from stickler.comparators.

Adds mean Average Precision scoring for bounding box localization,
built on the rich-value (_bbox) pattern and the PostComparisonAccumulator
interface.

- MAPCalculator: extract_from_dicts() joins field_comparisons with gt/pred
  bounding boxes into keyed pairs; compute_metrics() produces per-field
  IoU/precision/recall/F1/AP plus overall mean AP and coverage. Mirrors
  ConfidenceCalculator's calculator/accumulator split.
- BBoxMAPAccumulator: PostComparisonAccumulator implementation
  (name 'bbox_map_metrics') for bulk evaluation, with get/load/merge_state
  for checkpointing and distributed runs.
- comparison_engine: pre-extracts ground_truth_bboxes and prediction_bboxes
  into the result (mirrors prediction_confidences) so the accumulator can
  join both sides without the model instances; adds a single-document
  add_bbox_metrics sanity-check path.
- structured_model: forwards add_bbox_metrics and bbox_iou_threshold.

43 tests covering BBoxIoUComparator IoU math and edge cases,
MAPCalculator extraction/metrics/coverage, BBoxMAPAccumulator
accumulation and state round-trip/merge, bulk evaluator integration
(including coexistence with ConfidenceAccumulator), and the
single-document add_bbox_metrics path.

Adds the Advanced/bbox-map-metrics.md page (accumulator-first usage with
the _bbox rich-value key, plus single-document sanity-check path), wires
it into the Advanced nav, and documents BBoxIoUComparator in the
Comparators guide.

Development

Successfully merging this pull request may close these issues.

[FEATURE]: MAP scoring for bounding box estimation
