Skip to content

feat(image-redactor): redact all frames of multi-frame DICOM images#2094

Open
Dashtid wants to merge 1 commit into
data-privacy-stack:mainfrom
Dashtid:feat/multiframe-dicom-redaction
Open

feat(image-redactor): redact all frames of multi-frame DICOM images#2094
Dashtid wants to merge 1 commit into
data-privacy-stack:mainfrom
Dashtid:feat/multiframe-dicom-redaction

Conversation

@Dashtid

@Dashtid Dashtid commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Change Description

DicomImageRedactorEngine previously detected and redacted PII on only the first frame of a multi-frame DICOM instance, and raised ValueError: Too many dimensions: 3 > 2 when redacting multi-frame (e.g. XA) grayscale instances, because the pixel array was passed to Image.fromarray with the frame axis still present.

This makes redaction frame-aware:

  • Frame count is read from NumberOfFrames (0028,0008); single-frame behavior is unchanged.
  • PII detection runs per frame. Multi-frame instances can carry burned-in PHI on any frame, and not necessarily in the same location (as raised in Image Redactor - Support Multiframe Image Redaction #1737), so each frame is scanned and redacted independently.
  • _add_redact_box decodes the pixel data once, applies each frame's boxes to that frame, and writes it back as a single block (this also removes a fragile re-decode that relied on pixel_array caching in the previous in-place mutation).
  • Per-frame rescaling preserves per-frame contrast.
  • The DICOM verify engine visualizes a representative frame for multi-frame inputs instead of raising.

This also resolves the Too many dimensions: 3 > 2 crash (#1731).

Prior art

#1735 was an earlier attempt at the #1731 crash and was closed without merging. That approach stopped the exception by processing only the first frame; it was confirmed in-thread that the other frames were still not redacted. This PR instead scans and redacts every frame, which is the behavior requested in #1737.

Tests

Added tests/test_dicom_multiframe_redaction.py:

  • _get_number_of_frames (multi-frame, single-frame, missing element)
  • per-frame rescaling returns a stacked uint8 array
  • _add_redact_box redacts every frame for greyscale and color multi-frame instances, and leaves single-frame behavior unchanged
  • end-to-end redact() applies redaction to every frame
  • regression test for the Image Redactor - XA modality - ValueError: Too many dimensions: 3 > 2. #1731 crash

The image-redactor unit suite (200 tests) and ruff pass locally. Integration tests that require a Tesseract binary were not run locally and are left to CI.

Issue reference

Fixes #1737
Fixes #1731

Open questions

  • Detection runs OCR per frame, which is faithful to Image Redactor - Support Multiframe Image Redaction #1737 but is N x OCR for large cine loops. Would you prefer an optional frame-deduplication step (skip re-OCR on identical frames) or an opt-out for very large series?
  • For multi-frame inputs the verify engine currently visualizes the first frame rather than returning per-frame results. Is that acceptable, or would per-frame verification output be preferable?

Checklist

  • I have reviewed the contribution guidelines
  • I have signed the CLA (if required)
  • My code includes unit tests
  • All unit tests and lint checks pass locally
  • My PR contains documentation updates / additions if required

Copilot AI review requested due to automatic review settings June 23, 2026 18:30

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Presidio image-redactor DICOM pipeline to correctly handle multi-frame DICOM instances by running detection and redaction per frame (instead of only the first frame), preventing the ValueError: Too many dimensions: 3 > 2 failure on multi-frame grayscale XA-like inputs.

Changes:

  • Added frame-aware detection/redaction in DicomImageRedactorEngine, including per-frame bbox handling and per-frame rescaling.
  • Updated the DICOM verify engine to visualize a representative frame for multi-frame inputs instead of raising.
  • Added a focused unit test suite covering multi-frame behavior and the #1731 regression, and documented the change in the changelog.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
presidio-image-redactor/tests/test_dicom_multiframe_redaction.py New unit tests validating multi-frame frame-by-frame rescaling, bbox application, and end-to-end redaction + regression coverage.
presidio-image-redactor/presidio_image_redactor/dicom_image_redactor_engine.py Core implementation: frame-aware detection, per-frame rescaling, bbox normalization, and single-pass pixel data mutation for multi-frame DICOM redaction.
presidio-image-redactor/presidio_image_redactor/dicom_image_pii_verify_engine.py Adjust verification to select a representative frame for multi-frame inputs.
CHANGELOG.md Documented the new multi-frame DICOM redaction support and referenced the fixed issues.

Comment on lines +945 to +947
if not is_multiframe:
return [bounding_boxes_coordinates or []]

Comment on lines +1012 to +1021
# Select masking box color per frame so contrast is preserved.
frame_override = frame_view if is_multiframe else None
if is_greyscale:
box_color = cls._get_most_common_pixel_value(
instance, crop_ratio, fill, pixel_array=frame_override
)
else:
box_color = cls._set_bbox_color(
redacted_instance, fill, pixel_array=frame_override
)
DicomImageRedactorEngine previously detected and redacted PII on only the
first frame of a multi-frame instance, and raised 'ValueError: Too many
dimensions: 3 > 2' on multi-frame (e.g. XA) grayscale data because the
pixel array was passed to Image.fromarray with a frame axis still present.

Detection now runs per frame (multi-frame DICOMs can carry burned-in PHI
on any frame, not necessarily in the same location) and the redaction
write-back applies each frame's boxes to that frame. Frame count is read
from Number of Frames (0028,0008); single-frame behavior is unchanged.

- _get_number_of_frames / _image_from_array / _rescale_array helpers
- per-frame rescaling preserves per-frame contrast
- _add_redact_box decodes pixel data once and writes it back as one block,
  fixing the fragile re-decode in the previous in-place mutation
- verify engine visualizes a representative frame for multi-frame inputs
- unit tests for greyscale/color multi-frame redaction and the data-privacy-stack#1731 crash

Fixes data-privacy-stack#1737, data-privacy-stack#1731.
@Dashtid Dashtid force-pushed the feat/multiframe-dicom-redaction branch from 5a7b41b to 3c2b7f3 Compare June 25, 2026 20:26
@Dashtid

Dashtid commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Rebased onto current main to pick up the CI fixes from #2091/#2092.

Also addressed the two automated review points:

  • _normalize_bboxes_per_frame now accepts the per-frame format ([[...]]) for single-frame instances as the docstring states; previously a nested list passed for a single-frame instance would have failed downstream in _add_redact_box.
  • _add_redact_box now reuses the already-decoded frame for box-color selection instead of re-decoding instance.pixel_array, so the pixel data is genuinely decoded once.

Added a unit test for the single-frame per-frame-format case. Full image-redactor unit suite (201 tests) and ruff pass locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Image Redactor - Support Multiframe Image Redaction Image Redactor - XA modality - ValueError: Too many dimensions: 3 > 2.

2 participants