feat(image-redactor): redact all frames of multi-frame DICOM images#2094
Open
Dashtid wants to merge 1 commit into
Open
feat(image-redactor): redact all frames of multi-frame DICOM images#2094Dashtid wants to merge 1 commit into
Dashtid wants to merge 1 commit into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR updates the Presidio image-redactor DICOM pipeline to correctly handle multi-frame DICOM instances by running detection and redaction per frame (instead of only the first frame), preventing the ValueError: Too many dimensions: 3 > 2 failure on multi-frame grayscale XA-like inputs.
Changes:
- Added frame-aware detection/redaction in
DicomImageRedactorEngine, including per-frame bbox handling and per-frame rescaling. - Updated the DICOM verify engine to visualize a representative frame for multi-frame inputs instead of raising.
- Added a focused unit test suite covering multi-frame behavior and the #1731 regression, and documented the change in the changelog.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| presidio-image-redactor/tests/test_dicom_multiframe_redaction.py | New unit tests validating multi-frame frame-by-frame rescaling, bbox application, and end-to-end redaction + regression coverage. |
| presidio-image-redactor/presidio_image_redactor/dicom_image_redactor_engine.py | Core implementation: frame-aware detection, per-frame rescaling, bbox normalization, and single-pass pixel data mutation for multi-frame DICOM redaction. |
| presidio-image-redactor/presidio_image_redactor/dicom_image_pii_verify_engine.py | Adjust verification to select a representative frame for multi-frame inputs. |
| CHANGELOG.md | Documented the new multi-frame DICOM redaction support and referenced the fixed issues. |
Comment on lines
+945
to
+947
| if not is_multiframe: | ||
| return [bounding_boxes_coordinates or []] | ||
|
|
Comment on lines
+1012
to
+1021
| # Select masking box color per frame so contrast is preserved. | ||
| frame_override = frame_view if is_multiframe else None | ||
| if is_greyscale: | ||
| box_color = cls._get_most_common_pixel_value( | ||
| instance, crop_ratio, fill, pixel_array=frame_override | ||
| ) | ||
| else: | ||
| box_color = cls._set_bbox_color( | ||
| redacted_instance, fill, pixel_array=frame_override | ||
| ) |
DicomImageRedactorEngine previously detected and redacted PII on only the first frame of a multi-frame instance, and raised 'ValueError: Too many dimensions: 3 > 2' on multi-frame (e.g. XA) grayscale data because the pixel array was passed to Image.fromarray with a frame axis still present. Detection now runs per frame (multi-frame DICOMs can carry burned-in PHI on any frame, not necessarily in the same location) and the redaction write-back applies each frame's boxes to that frame. Frame count is read from Number of Frames (0028,0008); single-frame behavior is unchanged. - _get_number_of_frames / _image_from_array / _rescale_array helpers - per-frame rescaling preserves per-frame contrast - _add_redact_box decodes pixel data once and writes it back as one block, fixing the fragile re-decode in the previous in-place mutation - verify engine visualizes a representative frame for multi-frame inputs - unit tests for greyscale/color multi-frame redaction and the data-privacy-stack#1731 crash Fixes data-privacy-stack#1737, data-privacy-stack#1731.
5a7b41b to
3c2b7f3
Compare
Contributor
Author
|
Rebased onto current main to pick up the CI fixes from #2091/#2092. Also addressed the two automated review points:
Added a unit test for the single-frame per-frame-format case. Full image-redactor unit suite (201 tests) and ruff pass locally. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Change Description
DicomImageRedactorEnginepreviously detected and redacted PII on only the first frame of a multi-frame DICOM instance, and raisedValueError: Too many dimensions: 3 > 2when redacting multi-frame (e.g. XA) grayscale instances, because the pixel array was passed toImage.fromarraywith the frame axis still present.This makes redaction frame-aware:
NumberOfFrames(0028,0008); single-frame behavior is unchanged._add_redact_boxdecodes the pixel data once, applies each frame's boxes to that frame, and writes it back as a single block (this also removes a fragile re-decode that relied onpixel_arraycaching in the previous in-place mutation).This also resolves the
Too many dimensions: 3 > 2crash (#1731).Prior art
#1735 was an earlier attempt at the #1731 crash and was closed without merging. That approach stopped the exception by processing only the first frame; it was confirmed in-thread that the other frames were still not redacted. This PR instead scans and redacts every frame, which is the behavior requested in #1737.
Tests
Added
tests/test_dicom_multiframe_redaction.py:_get_number_of_frames(multi-frame, single-frame, missing element)_add_redact_boxredacts every frame for greyscale and color multi-frame instances, and leaves single-frame behavior unchangedredact()applies redaction to every frameThe image-redactor unit suite (200 tests) and
ruffpass locally. Integration tests that require a Tesseract binary were not run locally and are left to CI.Issue reference
Fixes #1737
Fixes #1731
Open questions
Checklist