feat(image-redactor): redact all frames of multi-frame DICOM images by Dashtid · Pull Request #2094 · data-privacy-stack/presidio

Dashtid · 2026-06-23T18:30:40Z

Change Description

DicomImageRedactorEngine previously detected and redacted PII on only the first frame of a multi-frame DICOM instance, and raised ValueError: Too many dimensions: 3 > 2 when redacting multi-frame (e.g. XA) grayscale instances, because the pixel array was passed to Image.fromarray with the frame axis still present.

This makes redaction frame-aware:

Frame count is read from NumberOfFrames (0028,0008); single-frame behavior is unchanged.
PII detection runs per frame. Multi-frame instances can carry burned-in PHI on any frame, and not necessarily in the same location (as raised in Image Redactor - Support Multiframe Image Redaction #1737), so each frame is scanned and redacted independently.
_add_redact_box decodes the pixel data once, applies each frame's boxes to that frame, and writes it back as a single block (this also removes a fragile re-decode that relied on pixel_array caching in the previous in-place mutation).
Per-frame rescaling preserves per-frame contrast.
The DICOM verify engine visualizes a representative frame for multi-frame inputs instead of raising.

This also resolves the Too many dimensions: 3 > 2 crash (#1731).

Prior art

#1735 was an earlier attempt at the #1731 crash and was closed without merging. That approach stopped the exception by processing only the first frame; it was confirmed in-thread that the other frames were still not redacted. This PR instead scans and redacts every frame, which is the behavior requested in #1737.

Tests

Added tests/test_dicom_multiframe_redaction.py:

_get_number_of_frames (multi-frame, single-frame, missing element)
per-frame rescaling returns a stacked uint8 array
_add_redact_box redacts every frame for greyscale and color multi-frame instances, and leaves single-frame behavior unchanged
end-to-end redact() applies redaction to every frame
regression test for the Image Redactor - XA modality - ValueError: Too many dimensions: 3 > 2. #1731 crash

The image-redactor unit suite (200 tests) and ruff pass locally. Integration tests that require a Tesseract binary were not run locally and are left to CI.

Issue reference

Fixes #1737
Fixes #1731

Open questions

Detection runs OCR per frame, which is faithful to Image Redactor - Support Multiframe Image Redaction #1737 but is N x OCR for large cine loops. Would you prefer an optional frame-deduplication step (skip re-OCR on identical frames) or an opt-out for very large series?
For multi-frame inputs the verify engine currently visualizes the first frame rather than returning per-frame results. Is that acceptable, or would per-frame verification output be preferable?

Checklist

I have reviewed the contribution guidelines
I have signed the CLA (if required)
My code includes unit tests
All unit tests and lint checks pass locally
My PR contains documentation updates / additions if required

Copilot

Pull request overview

This PR updates the Presidio image-redactor DICOM pipeline to correctly handle multi-frame DICOM instances by running detection and redaction per frame (instead of only the first frame), preventing the ValueError: Too many dimensions: 3 > 2 failure on multi-frame grayscale XA-like inputs.

Changes:

Added frame-aware detection/redaction in DicomImageRedactorEngine, including per-frame bbox handling and per-frame rescaling.
Updated the DICOM verify engine to visualize a representative frame for multi-frame inputs instead of raising.
Added a focused unit test suite covering multi-frame behavior and the #1731 regression, and documented the change in the changelog.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
presidio-image-redactor/tests/test_dicom_multiframe_redaction.py	New unit tests validating multi-frame frame-by-frame rescaling, bbox application, and end-to-end redaction + regression coverage.
presidio-image-redactor/presidio_image_redactor/dicom_image_redactor_engine.py	Core implementation: frame-aware detection, per-frame rescaling, bbox normalization, and single-pass pixel data mutation for multi-frame DICOM redaction.
presidio-image-redactor/presidio_image_redactor/dicom_image_pii_verify_engine.py	Adjust verification to select a representative frame for multi-frame inputs.
CHANGELOG.md	Documented the new multi-frame DICOM redaction support and referenced the fixed issues.

+        if not is_multiframe:
+            return [bounding_boxes_coordinates or []]
+


+            # Select masking box color per frame so contrast is preserved.
+            frame_override = frame_view if is_multiframe else None
+            if is_greyscale:
+                box_color = cls._get_most_common_pixel_value(
+                    instance, crop_ratio, fill, pixel_array=frame_override
+                )
+            else:
+                box_color = cls._set_bbox_color(
+                    redacted_instance, fill, pixel_array=frame_override
+                )


DicomImageRedactorEngine previously detected and redacted PII on only the first frame of a multi-frame instance, and raised 'ValueError: Too many dimensions: 3 > 2' on multi-frame (e.g. XA) grayscale data because the pixel array was passed to Image.fromarray with a frame axis still present. Detection now runs per frame (multi-frame DICOMs can carry burned-in PHI on any frame, not necessarily in the same location) and the redaction write-back applies each frame's boxes to that frame. Frame count is read from Number of Frames (0028,0008); single-frame behavior is unchanged. - _get_number_of_frames / _image_from_array / _rescale_array helpers - per-frame rescaling preserves per-frame contrast - _add_redact_box decodes pixel data once and writes it back as one block, fixing the fragile re-decode in the previous in-place mutation - verify engine visualizes a representative frame for multi-frame inputs - unit tests for greyscale/color multi-frame redaction and the data-privacy-stack#1731 crash Fixes data-privacy-stack#1737, data-privacy-stack#1731.

Dashtid · 2026-06-25T20:26:44Z

Rebased onto current main to pick up the CI fixes from #2091/#2092.

Also addressed the two automated review points:

_normalize_bboxes_per_frame now accepts the per-frame format ([[...]]) for single-frame instances as the docstring states; previously a nested list passed for a single-frame instance would have failed downstream in _add_redact_box.
_add_redact_box now reuses the already-decoded frame for box-color selection instead of re-decoding instance.pixel_array, so the pixel data is genuinely decoded once.

Added a unit test for the single-frame per-frame-format case. Full image-redactor unit suite (201 tests) and ruff pass locally.

Copilot AI review requested due to automatic review settings June 23, 2026 18:30

github-actions Bot added the external label Jun 23, 2026

Copilot started reviewing on behalf of Dashtid June 23, 2026 18:31 View session

Copilot AI reviewed Jun 23, 2026

View reviewed changes

Dashtid force-pushed the feat/multiframe-dicom-redaction branch from 5a7b41b to 3c2b7f3 Compare June 25, 2026 20:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(image-redactor): redact all frames of multi-frame DICOM images#2094

feat(image-redactor): redact all frames of multi-frame DICOM images#2094
Dashtid wants to merge 1 commit into
data-privacy-stack:mainfrom
Dashtid:feat/multiframe-dicom-redaction

Dashtid commented Jun 23, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Dashtid commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if not is_multiframe:
		return [bounding_boxes_coordinates or []]

Uh oh!

Conversation

Dashtid commented Jun 23, 2026

Change Description

Prior art

Tests

Issue reference

Open questions

Checklist

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Dashtid commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants