Conversation
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: AdamBajger <33228382+AdamBajger@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Added type ignore comment for return value.
📝 Walkthrough

Adds a composable MaskBuilder system with storage backends (inmemory, memmap), aggregators (mean, max), preprocessors (edge clipping, autoscaling, scalar expansion), docs/tests, a BigTIFF writer, typing modernizations, tooling/.gitignore tweaks, and a `safely_instantiate` helper.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User
    participant MB as MaskBuilder
    participant PRE as Preprocessor
    participant AGG as Aggregator
    participant ST as Storage
    User->>MB: __init__(shape, storage, aggregation, preprocessors)
    MB->>ST: allocate accumulator
    ST-->>MB: accumulator
    loop for each data batch
        User->>MB: update_batch(data_batch, coords_batch)
        MB->>PRE: process(data_batch, coords_batch)
        PRE-->>MB: data', coords'
        MB->>AGG: update(accumulator, data', coords')
        AGG->>ST: read/write accumulator slices
        ST-->>AGG: done
        AGG-->>MB: updated
    end
    User->>MB: finalize()
    MB->>AGG: finalize(accumulator)
    AGG-->>MB: final_mask
    MB-->>User: result
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a flexible and modular system for constructing feature masks from neural network predictions, particularly useful for tiled data like whole-slide images. This system streamlines the process of combining individual tile predictions into a cohesive, full-resolution mask by offering configurable strategies for memory management, overlap resolution, and data preprocessing. It aims to enhance the efficiency and adaptability of mask generation workflows.
Code Review
This pull request introduces an excellent, modular mask builder system using a strategy-based pattern for composition. The new components for storage, aggregation, and preprocessing are well-designed and accompanied by thorough documentation and tests.
My main feedback concerns resource management. The current implementation relies on __del__ for cleaning up temporary files in MemMapStorage, which can be unreliable. Specifically, the MeanAggregator can leak temporary files for its overlap counter. I've suggested adding an explicit cleanup() mechanism across the components and implementing the context manager protocol in MaskBuilder for deterministic resource deallocation. I also found a related weakness in the tests for file cleanup that should be addressed.
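The suggested context-manager protocol might look like the following sketch. This is a hypothetical stand-in, not the PR's actual code: the `Storage` class, attribute names, and `cleanup()` signatures are assumptions taken from the review's wording.

```python
# A minimal sketch of deterministic cleanup via the context-manager protocol.
# Storage is a stand-in; the real MaskBuilder/MemMapStorage APIs may differ.
class Storage:
    """Stand-in for a storage backend with explicit teardown."""

    def __init__(self) -> None:
        self.closed = False

    def cleanup(self) -> None:
        self.closed = True


class MaskBuilder:
    """Stand-in builder showing explicit cleanup instead of __del__."""

    def __init__(self) -> None:
        self.storage = Storage()

    def cleanup(self) -> None:
        # Explicit teardown: release resources deterministically rather than
        # relying on garbage-collection timing.
        if hasattr(self, "storage"):
            self.storage.cleanup()
            del self.storage

    def __enter__(self) -> "MaskBuilder":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.cleanup()  # runs even if the with-block raised
        return False


builder = MaskBuilder()
with builder:
    pass  # build the mask here
```

With this shape, `with MaskBuilder(...) as mb:` guarantees temporary files are released when the block exits, whether or not an exception occurred.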
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ratiopath/tiling/overlays.py (1)
229-278: ⚠️ Potential issue | 🟡 Minor

Change the return type annotation from `pa.MapArray` to `pa.StructArray` and remove the `type: ignore` comment.

`pyarrow.array()` from Python dicts creates a `StructArray` (with fixed fields per row), not a `MapArray`. PyArrow's `MapArray` requires an explicit `pa.map_(...)` type and list-of-tuples input structure. The current `type: ignore[return-value]` masks a real type mismatch; update the annotation to `pa.StructArray` to reflect what the code actually returns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ratiopath/tiling/overlays.py` around lines 229 - 278, Update the function signature and return typing for tile_overlay_overlap: change the annotated return type from pa.MapArray to pa.StructArray (i.e., def tile_overlay_overlap(...) -> pa.StructArray) and remove the trailing " # type: ignore[return-value]" on the final return; ensure any dependent references to the function's return type reflect pa.StructArray. Keep the body (overlap_fraction, overlap_vectorized, and pa.array(...) return) unchanged so pyarrow-produced StructArray matches the annotation.
🧹 Nitpick comments (6)
ratiopath/masks/mask_builders/storage.py (2)
65-72: Flush the memmap before closing it.

NumPy documents `flush()` as the public way to persist memmap changes, and also notes there is no public API to close the underlying mmap. Skipping `flush()` here makes persistence depend on the private `_mmap` internals. (numpy.org)

Suggested change

```diff
 def cleanup(self) -> None:
     if hasattr(self, "accumulator"):
         try:
+            self.accumulator.flush()
             self.accumulator._mmap.close()  # type: ignore[attr-defined]
         except Exception as e:
             logger.warning(f"Failed to close memmap: {e}")
         finally:
             del self.accumulator
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ratiopath/masks/mask_builders/storage.py` around lines 65 - 72, In cleanup(), call the public flush() on the memmap object before touching its private _mmap: if self.accumulator has a flush attribute, invoke self.accumulator.flush() to persist changes, then proceed to close the underlying self.accumulator._mmap inside the existing try/except/finally; ensure exceptions from flush are caught and logged similarly to the current close error handling, and still delete self.accumulator in the finally block.
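The flush-then-reopen behavior the comment relies on can be exercised in isolation. This is an illustrative snippet (file names are arbitrary), not code from the PR:

```python
import tempfile
from pathlib import Path

import numpy as np

# Write through a memmap, flush, and drop the reference; then reopen the
# file read-only to confirm the data was persisted to disk.
tmpdir = tempfile.mkdtemp()
path = Path(tmpdir) / "accumulator.dat"

acc = np.memmap(path, dtype=np.float32, mode="w+", shape=(4, 4))
acc[:] = 1.5
acc.flush()  # documented public API for persisting changes
del acc      # drop the reference; NumPy exposes no public close()

reloaded = np.memmap(path, dtype=np.float32, mode="r", shape=(4, 4))
```

The `del` alone usually suffices in CPython because the mmap is released on deallocation, but the explicit `flush()` makes persistence independent of garbage-collection timing.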
1-14: Add a module docstring for this public API module.

This file introduces exported storage backends, so a short module-level description will help the generated API docs.

Suggested change

```diff
+"""Storage backends for mask-builder accumulators."""
+
 import logging
```

As per coding guidelines, "Add documentation for any new public functions, classes, or modules you create".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ratiopath/masks/mask_builders/storage.py` around lines 1 - 14, Add a concise module-level docstring at the top of the module describing its purpose (exposed storage backend abstractions and allocators) and intended public API for documentation generation; place the docstring before imports and mention that the module defines StorageAllocator (and any concrete storage backends) for allocating numpy-backed storage, so readers of the generated docs understand the module responsibility and exported symbols like StorageAllocator.

ratiopath/masks/write_big_tiff.py (1)
1-6: Add a module docstring for this new public module.

The function is documented, but the module itself is new and public. A short top-level docstring will make the generated API page much less bare.

Suggested change

```diff
+"""Utilities for exporting masks as tiled pyramidal BigTIFF files."""
+
 from pathlib import Path
```

As per coding guidelines, "Add documentation for any new public functions, classes, or modules you create".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ratiopath/masks/write_big_tiff.py` around lines 1 - 6, Add a short top-level module docstring to ratiopath/masks/write_big_tiff.py that briefly describes the module's purpose (writing large TIFFs using pyvips) and mentions the public API (the write_big_tiff function); place it at the very top of the file as a one- or two-sentence docstring so the generated docs are informative and follow the project documentation guideline.

tests/conftest.py (1)
1-8: Consider using `np.random.default_rng()` instead of deprecated `np.random.seed()`.

The legacy `np.random.seed()` and `np.random.rand()` APIs are discouraged in favor of the newer `Generator` API. While this works, the modern approach provides better reproducibility guarantees and thread safety.

♻️ Suggested refactor using modern NumPy RNG

```diff
 import numpy as np
 import pytest

 @pytest.fixture(autouse=True)
 def reset_random_seed():
     """Reset random seed before each test for deterministic behavior."""
-    np.random.seed(42)
+    # Seed the legacy global RNG for backwards compatibility with tests using np.random.rand()
+    np.random.seed(42)
```

Alternatively, if you update the tests to use the Generator API, you could inject the RNG:

```python
@pytest.fixture
def rng():
    """Provide a seeded random generator for deterministic tests."""
    return np.random.default_rng(42)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/conftest.py` around lines 1 - 8, The reset_random_seed fixture uses the legacy global RNG via np.random.seed(42); replace it with a modern Generator-based fixture: remove or stop using the autouse reset_random_seed fixture and add a fixture named rng that returns np.random.default_rng(42) (function name: rng, symbol: np.random.default_rng) so tests can accept an rng argument and use its methods (e.g., rng.random(), rng.integers()) instead of global np.random.*; update tests to consume the new rng fixture where deterministic randomness is needed.ratiopath/masks/mask_builders/__init__.py (1)
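The determinism argument behind the `rng` fixture can be demonstrated independently of pytest; a small illustrative sketch:

```python
import numpy as np

# Two generators seeded identically produce identical streams, without
# touching the global np.random state that other code may depend on.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.random(3)
sample_b = rng_b.random(3)

# The legacy global RNG is untouched by the generators above, so seeding it
# separately still behaves as before.
np.random.seed(0)
legacy_value = np.random.rand()
```

Because each test receives its own `Generator`, tests cannot perturb each other's random streams the way calls against the shared global state can.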
1-32: Consider adding a module docstring for API documentation.

A brief module-level docstring would improve discoverability and help mkdocstrings generate better documentation for the package.

📝 Suggested addition

```diff
+"""Mask builder components for assembling feature masks from tiled data.
+
+This package provides a composable system for building large masks from
+neural network predictions or other tile-level data, with pluggable
+storage, aggregation, and preprocessing strategies.
+"""
+
 from ratiopath.masks.mask_builders.aggregation import (
     Aggregator,
     MaxAggregator,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ratiopath/masks/mask_builders/__init__.py` around lines 1 - 32, Add a concise module-level docstring at the top of ratiopath/masks/mask_builders/__init__.py that summarizes the package purpose (helpers to build and manage masks, preprocessors, aggregators, and storage backends) and optionally lists the main public symbols exported (e.g., Aggregator, MaxAggregator, MeanAggregator, MaskBuilder, Preprocessor, AutoScalingPreprocessor, EdgeClippingPreprocessor, ScalarUniformExpansionPreprocessor, MemMapStorage, NumpyStorage, StorageAllocator) so mkdocstrings can generate clear API docs; place it above the imports as the first statement and keep it short (1–3 sentences).

ratiopath/masks/mask_builders/aggregation.py (1)
52-104: Consider cleanup for `overlap_storage` in `MeanAggregator`.

When `MeanAggregator` is used with `MemMapStorage`, the `overlap_storage` creates its own memmap file. If the main storage is cleaned up but the aggregator isn't, the overlap file may not be properly deleted. Consider adding a cleanup method or ensuring the overlap storage is cleaned up when the aggregator is destroyed.

♻️ Suggested addition

```diff
 class MeanAggregator[DType: np.generic](Aggregator[DType]):
     """Aggregator that implements averaging aggregation for overlapping tiles."""

+    def cleanup(self) -> None:
+        """Clean up the overlap storage resources."""
+        if hasattr(self, "overlap_storage"):
+            self.overlap_storage.cleanup()
+
     def __init__(
         self,
```

Then in `MaskBuilder`, you could call `self.aggregator.cleanup()` if the method exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ratiopath/masks/mask_builders/aggregation.py` around lines 52 - 104, MeanAggregator currently creates a secondary storage object (self.overlap_storage) that may open its own file (e.g., with MemMapStorage) and never gets cleaned up; add a cleanup method on MeanAggregator that calls the underlying storage's close/delete method (e.g., call self.overlap_storage.close() or self.overlap_storage.cleanup() depending on StorageAllocator API) and ensure any file handles are released and files removed when appropriate, and update MaskBuilder to call self.aggregator.cleanup() (guarded by hasattr) when the main storage is cleaned up or the builder is destroyed so the overlap memmap file is removed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/reference/masks/mask_builders.md`:
- Around line 135-151: The example using AutoScalingPreprocessor and MaskBuilder
calls np.array(...) but omits the numpy import; add "import numpy as np" at the
top of the snippet so np is defined before AutoScalingPreprocessor and
MaskBuilder are used (fix the example block that constructs auto_scale with
source_extents/source_tile_extents/source_tile_strides using np.array).
In `@ratiopath/masks/mask_builders/receptive_field_manipulation.py`:
- Around line 108-116: get_vips_scale_factors currently returns
tuple(scale_factors) which yields tuple[Any, ...] and can break the annotated
return type tuple[float, float]; change the return to explicitly produce a tuple
of two floats by converting each element to float (e.g., build tuple(float(x)
for x in scale_factors) or otherwise cast both elements to float) so the
returned value matches the annotated signature for get_vips_scale_factors.
In `@ratiopath/masks/write_big_tiff.py`:
- Around line 6-13: Validate input parameters at the start of write_big_tiff:
check that mpp_x and mpp_y are > 0 and that tile_width and tile_height are
positive integers; if any check fails, raise a ValueError with a clear message
identifying the bad parameter(s). Add the same upfront guards to the other
exported function(s) handling these parameters (lines ~38-39) so callers get an
immediate, descriptive error instead of an internal division/usage failure.
---
Outside diff comments:
In `@ratiopath/tiling/overlays.py`:
- Around line 229-278: Update the function signature and return typing for
tile_overlay_overlap: change the annotated return type from pa.MapArray to
pa.StructArray (i.e., def tile_overlay_overlap(...) -> pa.StructArray) and
remove the trailing " # type: ignore[return-value]" on the final return; ensure
any dependent references to the function's return type reflect pa.StructArray.
Keep the body (overlap_fraction, overlap_vectorized, and pa.array(...) return)
unchanged so pyarrow-produced StructArray matches the annotation.
---
Nitpick comments:
In `@ratiopath/masks/mask_builders/__init__.py`:
- Around line 1-32: Add a concise module-level docstring at the top of
ratiopath/masks/mask_builders/__init__.py that summarizes the package purpose
(helpers to build and manage masks, preprocessors, aggregators, and storage
backends) and optionally lists the main public symbols exported (e.g.,
Aggregator, MaxAggregator, MeanAggregator, MaskBuilder, Preprocessor,
AutoScalingPreprocessor, EdgeClippingPreprocessor,
ScalarUniformExpansionPreprocessor, MemMapStorage, NumpyStorage,
StorageAllocator) so mkdocstrings can generate clear API docs; place it above
the imports as the first statement and keep it short (1–3 sentences).
In `@ratiopath/masks/mask_builders/aggregation.py`:
- Around line 52-104: MeanAggregator currently creates a secondary storage
object (self.overlap_storage) that may open its own file (e.g., with
MemMapStorage) and never gets cleaned up; add a cleanup method on MeanAggregator
that calls the underlying storage's close/delete method (e.g., call
self.overlap_storage.close() or self.overlap_storage.cleanup() depending on
StorageAllocator API) and ensure any file handles are released and files removed
when appropriate, and update MaskBuilder to call self.aggregator.cleanup()
(guarded by hasattr) when the main storage is cleaned up or the builder is
destroyed so the overlap memmap file is removed.
In `@ratiopath/masks/mask_builders/storage.py`:
- Around line 65-72: In cleanup(), call the public flush() on the memmap object
before touching its private _mmap: if self.accumulator has a flush attribute,
invoke self.accumulator.flush() to persist changes, then proceed to close the
underlying self.accumulator._mmap inside the existing try/except/finally; ensure
exceptions from flush are caught and logged similarly to the current close error
handling, and still delete self.accumulator in the finally block.
- Around line 1-14: Add a concise module-level docstring at the top of the
module describing its purpose (exposed storage backend abstractions and
allocators) and intended public API for documentation generation; place the
docstring before imports and mention that the module defines StorageAllocator
(and any concrete storage backends) for allocating numpy-backed storage, so
readers of the generated docs understand the module responsibility and exported
symbols like StorageAllocator.
In `@ratiopath/masks/write_big_tiff.py`:
- Around line 1-6: Add a short top-level module docstring to
ratiopath/masks/write_big_tiff.py that briefly describes the module's purpose
(writing large TIFFs using pyvips) and mentions the public API (the
write_big_tiff function); place it at the very top of the file as a one- or
two-sentence docstring so the generated docs are informative and follow the
project documentation guideline.
In `@tests/conftest.py`:
- Around line 1-8: The reset_random_seed fixture uses the legacy global RNG via
np.random.seed(42); replace it with a modern Generator-based fixture: remove or
stop using the autouse reset_random_seed fixture and add a fixture named rng
that returns np.random.default_rng(42) (function name: rng, symbol:
np.random.default_rng) so tests can accept an rng argument and use its methods
(e.g., rng.random(), rng.integers()) instead of global np.random.*; update tests
to consume the new rng fixture where deterministic randomness is needed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 55d45655-b538-4fee-8b95-bff94d167ec5
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (17)
- .gitignore
- .ruff.toml
- docs/reference/masks/mask_builders.md
- mkdocs.yml
- ratiopath/masks/mask_builders/__init__.py
- ratiopath/masks/mask_builders/aggregation.py
- ratiopath/masks/mask_builders/mask_builder.py
- ratiopath/masks/mask_builders/receptive_field_manipulation.py
- ratiopath/masks/mask_builders/storage.py
- ratiopath/masks/vips_filters/typed.py
- ratiopath/masks/write_big_tiff.py
- ratiopath/model_selection/split.py
- ratiopath/parsers/geojson_parser.py
- ratiopath/tiling/overlays.py
- ratiopath/tiling/tilers.py
- tests/conftest.py
- tests/test_mask_builders.py
/gemini review
Code Review
This pull request introduces a well-designed and flexible mask builder system using a composition-over-inheritance approach, which is excellent. The new components are well-documented and accompanied by a comprehensive test suite. My feedback focuses on improving adherence to Python's standard naming conventions for better readability, making resource cleanup more deterministic, and correcting a minor issue in the documentation examples.
Actionable comments posted: 10
🧹 Nitpick comments (1)
ratiopath/masks/mask_builders/aggregation.py (1)
95: Consider widening overlap counter dtype for high-overlap workloads.

Using `np.uint16` can overflow above 65,535 overlaps per pixel, which would silently skew mean normalization.

🔧 Optional hardening

```diff
-    dtype=np.uint16,
+    dtype=np.uint32,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ratiopath/masks/mask_builders/aggregation.py` at line 95, The overlap counter currently uses dtype=np.uint16 which can overflow for >65,535 overlaps per pixel; update the allocation for the overlap counter in ratiopath/masks/mask_builders/aggregation.py to use a wider integer type (e.g., np.uint32 or np.uint64) instead of np.uint16 so mean normalization cannot be silently corrupted; change the dtype argument where the overlap/count array is created (the array currently constructed with dtype=np.uint16) to a wider dtype and, optionally, make the dtype configurable if high-overlap workloads are expected.
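The silent-wrap behavior behind this nitpick is easy to demonstrate; a short illustrative snippet:

```python
import numpy as np

# A uint16 overlap counter silently wraps past 65,535, which would corrupt
# the mean normalization; uint32 has headroom for realistic overlap counts.
counter16 = np.array([65535], dtype=np.uint16)
counter32 = np.array([65535], dtype=np.uint32)

counter16 = counter16 + np.uint16(1)  # wraps around to 0, no exception
counter32 = counter32 + np.uint32(1)  # 65536, as expected
```

Array arithmetic in NumPy wraps on integer overflow without raising, so the only symptom would be subtly wrong averages in very deep overlap regions.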
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.ruff.toml:
- Line 40: Remove "N801" from the global ignore list in .ruff.toml and instead
add targeted inline suppressions for the intentionally lowercase classes
inmemory and memmap by appending "# noqa: N801" to their class definitions in
ratiopath/masks/mask_builders/storage.py; also add a short comment next to each
noqa explaining the NumPy-style naming intent so the exceptions are explicit and
localized.
In `@docs/reference/masks/mask_builders.md`:
- Line 113: The example passes an unsupported argument to
EdgeClippingPreprocessor; remove the invalid num_dims=2 from the constructor
call (change clip_prep = EdgeClippingPreprocessor(px_to_clip=4, num_dims=2) to
clip_prep = EdgeClippingPreprocessor(px_to_clip=4)) so the example uses the
actual EdgeClippingPreprocessor signature; update any surrounding text if you
intended to convey dimensionality using a different API or parameter.
In `@ratiopath/masks/mask_builders/aggregation.py`:
- Around line 99-115: The update method uses slices from compute_acc_slices and
directly indexes into accumulator and self.overlap_counter which allows negative
or out-of-range coords to silently wrap or overflow; before applying
accumulator[:, *acc_slices] and self.overlap_counter[:, *acc_slices] validate
that each slice start >= 0 and end <= accumulator.shape for the corresponding
dimensions (or clamp to valid bounds) and raise a clear exception if
coords_batch would write outside the valid extent; perform the same bounds
checks/clamping for the other update path referenced (the second update method
around lines 139-152) and reference compute_acc_slices, accumulator,
overlap_counter, coords_batch, and data_batch when adding the checks so the
guard rails are applied in both update methods.
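A standalone sketch of the bounds check described above. Names like `acc_slices` follow the review's wording and are assumptions; the real `compute_acc_slices` output may differ in shape.

```python
import numpy as np


def validate_acc_slices(acc_slices: list[slice], shape: tuple[int, ...]) -> None:
    """Reject slices that would fall outside the accumulator's spatial extent.

    Assumes slices have explicit integer start/stop. Negative starts would
    silently wrap under NumPy indexing and oversized stops would silently
    truncate, so both are treated as errors rather than clamped.
    """
    for dim, (sl, extent) in enumerate(zip(acc_slices, shape)):
        if sl.start < 0 or sl.stop > extent:
            raise ValueError(
                f"slice {sl} out of bounds for dimension {dim} (extent {extent})"
            )


accumulator = np.zeros((8, 16, 16), dtype=np.float32)
# Spatial dimensions only (shape[1:]), matching the accumulator[:, *acc_slices]
# indexing pattern described in the comment.
validate_acc_slices([slice(0, 8), slice(4, 16)], accumulator.shape[1:])  # ok
```

Clamping via `np.clip` on the slice bounds is the alternative the comment mentions; raising is the stricter choice because an out-of-extent tile usually indicates a coordinate bug upstream.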
- Around line 123-125: The cleanup method currently only deletes
self.overlap_counter; if overlap storage is memmap-backed you must explicitly
flush/close it before dropping the reference. Update cleanup (method cleanup) to
check for hasattr(self, "overlap_counter") and if that object exposes memmap
semantics (e.g. is instance of numpy.memmap or has .flush()/.close() methods),
call the appropriate cleanup (call .flush() then .close() or .mmap.close() as
applicable) and only then delete self.overlap_counter to ensure the underlying
memory-mapped file is released.
In `@ratiopath/masks/mask_builders/mask_builder.py`:
- Around line 117-121: The cleanup() method should explicitly invoke backend
cleanup/close methods instead of just deleting the attribute; update cleanup()
to, when self has a storage attribute, call any available teardown method on it
(e.g., storage.cleanup() or storage.close() or storage.release()) via getattr
and only then del self.storage, and ensure self.aggregator.cleanup() is called
after the storage backend has been properly torn down; reference the cleanup()
method, the storage attribute, and self.aggregator.cleanup to locate where to
change the sequencing.
- Around line 73-75: The code assigns any ndarray to self.storage in the
MaskBuilder class without verifying it's compatible with the shape expected
after setup(), which can cause downstream indexing errors; update the assignment
in mask_builders/mask_builder.py (where self.storage is set) to validate the
provided ndarray's ndim and shape against the MaskBuilder's expected storage
shape (the shape computed/used by setup() or the internal shape properties), and
if it doesn't match raise a clear ValueError; ensure validation references the
class/methods (MaskBuilder, setup, self.storage) so callers supplying external
storage get immediate, descriptive feedback instead of later slice/index errors.
In `@ratiopath/masks/mask_builders/receptive_field_manipulation.py`:
- Around line 151-155: The computation of total_strides can become negative when
source_extents is smaller than self.source_tile_extents; update the calculation
around total_strides so you clamp the stride counts to zero before/after
applying ceil to avoid negative values that shrink
overflow_buffered_source_extents and mask_extents. Concretely, compute the
difference (source_extents - self.source_tile_extents) / source_tile_strides,
ensure negative results are set to 0 (e.g., via np.maximum(..., 0)) and then
apply np.ceil and astype(np.int64) to produce non-negative total_strides used
downstream by overflow_buffered_source_extents and mask_extents.
- Around line 57-89: Before generating slices in process, validate
self.clip_start_indices and self.clip_end_indices against the input extents:
ensure all clip_start_indices and clip_end_indices are non-negative integers,
that clip_start_indices[i] + clip_end_indices[i] <= extents[i] for every
dimension, and that resulting slice ranges produce start < end; if any check
fails raise ValueError with a clear message. Implement these guards at the start
of process (before computing slices/adjusted_coords_batch) using extents =
np.asarray(data_batch.shape[2:], dtype=np.int64) and the existing
self.clip_start_indices/self.clip_end_indices arrays so you prevent negative or
over‑clipping for data_batch/coords_batch.
In `@ratiopath/masks/mask_builders/storage.py`:
- Around line 27-35: The docs refer to the constructor argument as filepath but
the public API actually accepts filename; update the docstring text in
storage.py to consistently use the parameter name filename (and any example
usages) and adjust the description about persistence and FileExistsError to
reference filename so copy-pasted code matches the constructor signature (ensure
references to FileExistsError remain accurate for the filename-backed case).
In `@ratiopath/misc.py`:
- Around line 6-19: Add a Google-style docstring to the public helper function
safely_instantiate explaining its purpose (instantiate a callable/class with
provided args/kwargs), describe the kwargs filtering behavior (if the target
accepts **kwargs it forwards all kwargs; otherwise it filters kwargs to only
parameters present in inspect.signature), document the return value and types
(returns an instance of T, cast from Callable[..., T]), mention that positional
args are passed through as-is and that this helper does not modify args, and
include brief examples or notes on edge cases (e.g., how it handles unexpected
kwargs and the use of inspect.Parameter.VAR_KEYWORD) to satisfy public API
documentation guidelines.
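Based on the prompt's description, the filtering behavior of `safely_instantiate` can be sketched as follows. This is a hedged reimplementation for illustration, not the actual code in `ratiopath/misc.py`; the `Tile` class is a made-up example.

```python
import inspect
from typing import Any, Callable, TypeVar

T = TypeVar("T")


def safely_instantiate(cls: Callable[..., T], *args: Any, **kwargs: Any) -> T:
    """Instantiate `cls`, dropping kwargs its signature does not accept.

    If the callable declares **kwargs, everything is forwarded unchanged;
    otherwise kwargs are filtered down to declared parameter names.
    Positional args are passed through as-is.
    """
    params = inspect.signature(cls).parameters
    accepts_var_kw = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if not accepts_var_kw:
        kwargs = {k: v for k, v in kwargs.items() if k in params}
    return cls(*args, **kwargs)


class Tile:
    def __init__(self, width: int, height: int):
        self.width, self.height = width, height


# The unexpected kwarg is silently filtered out instead of raising TypeError.
tile = safely_instantiate(Tile, width=256, height=256, unexpected="ignored")
```

This pattern is convenient for factory-style composition where a shared kwargs dict is fanned out to components with differing constructors.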
---
Nitpick comments:
In `@ratiopath/masks/mask_builders/aggregation.py`:
- Line 95: The overlap counter currently uses dtype=np.uint16 which can overflow
for >65,535 overlaps per pixel; update the allocation for the overlap counter in
ratiopath/masks/mask_builders/aggregation.py to use a wider integer type (e.g.,
np.uint32 or np.uint64) instead of np.uint16 so mean normalization cannot be
silently corrupted; change the dtype argument where the overlap/count array is
created (the array currently constructed with dtype=np.uint16) to a wider dtype
and, optionally, make the dtype configurable if high-overlap workloads are
expected.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4715eb12-6316-472e-9d8e-af4c3aa0edbd
📒 Files selected for processing (9)
- .ruff.toml
- docs/reference/masks/mask_builders.md
- ratiopath/masks/mask_builders/__init__.py
- ratiopath/masks/mask_builders/aggregation.py
- ratiopath/masks/mask_builders/mask_builder.py
- ratiopath/masks/mask_builders/receptive_field_manipulation.py
- ratiopath/masks/mask_builders/storage.py
- ratiopath/misc.py
- tests/test_mask_builders.py
🚧 Files skipped from review as they are similar to previous changes (1)
- ratiopath/masks/mask_builders/__init__.py
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/test_mask_builders.py`:
- Around line 351-390: The test test_numpy_memmap_persistent_file currently
never calls MaskBuilder.finalize(), so it verifies persistence only after object
deletion; update the test to explicitly call assembler.finalize() (on the
MaskBuilder instance named assembler) before deleting the object (del assembler)
and then assert that filename and the overlaps file exist; ensure cleanup still
runs after those assertions. This will make the test validate persistence upon
finalization rather than relying solely on garbage-collection/deletion behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e5be22d1-1d8f-4204-bbf5-458f30ad032a
📒 Files selected for processing (1)
tests/test_mask_builders.py
/gemini review
Code Review
This pull request introduces a new MaskBuilder module for assembling large masks from tiled data, supporting various storage (in-memory, memory-mapped), aggregation (mean, max), and transformation strategies (edge clipping, auto-scaling, scalar expansion). It also updates the project version to 1.3.0, migrates type aliases to Python 3.12 syntax, and adds a new utility function write_big_tiff for saving pyvips images. Documentation for the new mask builders has been added, and the tissue_mask function was refactored to handle default filters internally and correctly extract the image from the filter's return. A high-severity issue was identified in the write_big_tiff function where the resolution calculation for pyvips.tiffsave is incorrect, using 1000 / mpp (pixels/mm) instead of 10000 / mpp (pixels/cm) for xres and yres.
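The unit conversion behind that finding is plain arithmetic: mpp is microns per pixel, 1 mm = 1000 µm, and 1 cm = 10000 µm. A small sketch (which unit pyvips expects for `xres`/`yres` should still be checked against the `tiffsave` documentation and the chosen `resunit`):

```python
def mpp_to_resolutions(mpp: float) -> tuple[float, float]:
    """Convert microns-per-pixel into (pixels/mm, pixels/cm).

    At 0.25 µm/pixel, a slide resolves 4000 pixels per millimetre and
    40000 pixels per centimetre.
    """
    pixels_per_mm = 1000.0 / mpp
    pixels_per_cm = 10000.0 / mpp
    return pixels_per_mm, pixels_per_cm


px_mm, px_cm = mpp_to_resolutions(0.25)
```

Mixing the two (passing a pixels/mm value where pixels/cm is expected, or vice versa) yields metadata off by exactly a factor of ten, which matches the 1000-vs-10000 discrepancy flagged above.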
This pull request introduces a new modular mask builder system for assembling feature masks from neural network outputs, with flexible strategies for storage, aggregation, and preprocessing. The changes add a comprehensive documentation page, expose the new components in the package, and implement the core classes for aggregation and mask assembly. The most important changes are grouped below.
Documentation and navigation:
- Added a documentation page (`mask_builders.md`) explaining mask builder concepts, usage examples, factory configurations, and guidance for selecting the right builder for different use cases.
- Updated the `mkdocs.yml` navigation to include the new Masks section and mask builder documentation.

Core mask builder implementation:
- Introduced the `MaskBuilder` class, which composes strategies for storage, aggregation, and preprocessing to assemble large masks from tiled data. This enables flexible configuration for memory management, overlap handling, and edge artifact removal.
- Added aggregation strategies (`Aggregator`, `MeanAggregator`, `MaxAggregator`) for handling overlapping tiles with mean or max strategies, including overlap counting for averaging.
- Updated `__init__.py` to support easy imports and factory composition.
Configuration and compatibility:

- Updated `.ruff.toml` to set the Python target version to 3.12 for compatibility with new features.

Summary by CodeRabbit
New Features
Documentation
Tests
Chores