Skip to content

Mask builders#42

Open
matejpekar wants to merge 71 commits intomainfrom
mask-builders
Open

Mask builders#42
matejpekar wants to merge 71 commits intomainfrom
mask-builders

Conversation

@matejpekar
Copy link
Member

@matejpekar matejpekar commented Mar 13, 2026

This pull request introduces a new modular mask builder system for assembling feature masks from neural network outputs, with flexible strategies for storage, aggregation, and preprocessing. The changes add a comprehensive documentation page, expose the new components in the package, and implement the core classes for aggregation and mask assembly. The most important changes are grouped below.

Documentation and navigation:

  • Added a detailed reference page (mask_builders.md) explaining mask builder concepts, usage examples, factory configurations, and guidance for selecting the right builder for different use cases.
  • Updated mkdocs.yml navigation to include the new Masks section and mask builder documentation.

Core mask builder implementation:

  • Introduced the MaskBuilder class, which composes strategies for storage, aggregation, and preprocessing to assemble large masks from tiled data. This enables flexible configuration for memory management, overlap handling, and edge artifact removal.
  • Added abstract and concrete aggregation classes (Aggregator, MeanAggregator, MaxAggregator) for handling overlapping tiles with mean or max strategies, including overlap counting for averaging.
  • Exposed all mask builder components in the package’s __init__.py to support easy imports and factory composition.

Configuration and compatibility:

  • Updated .ruff.toml to set the Python target version to 3.12 for compatibility with new features.

Summary by CodeRabbit

  • New Features

    • Mask builders: assemble large masks from tiled data with configurable storage (in-memory or disk), aggregation strategies (mean/max) and preprocessors (edge clipping, autoscaling, scalar expansion); BigTIFF export for high‑resolution outputs.
  • Documentation

    • New Mask Builders reference page with usage examples, coordinate conventions, and edge/overlap guidance.
  • Tests

    • Extensive unit tests covering mask builders, preprocessors, storage/memmap behavior and deterministic test seeding.
  • Chores

    • Python 3.12 typing modernizations and .gitignore updates.

AdamBajger and others added 30 commits March 13, 2026 12:28
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: AdamBajger <33228382+AdamBajger@users.noreply.github.com>
Co-authored-by: AdamBajger <33228382+AdamBajger@users.noreply.github.com>
Co-authored-by: AdamBajger <33228382+AdamBajger@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: AdamBajger <33228382+AdamBajger@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
@matejpekar matejpekar requested review from a team, AdamBajger, Adames4 and vejtek March 13, 2026 15:54
@coderabbitai
Copy link

coderabbitai bot commented Mar 13, 2026

Warning

Rate limit exceeded

@matejpekar has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 43 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c653a410-7464-4db0-8753-668b41e68491

📥 Commits

Reviewing files that changed from the base of the PR and between 594c398 and 3f9a3c4.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (15)
  • .ruff.toml
  • docs/reference/masks/mask_builders.md
  • mkdocs.yml
  • pyproject.toml
  • ratiopath/masks/__init__.py
  • ratiopath/masks/mask_builders/__init__.py
  • ratiopath/masks/mask_builders/aggregation.py
  • ratiopath/masks/mask_builders/mask_builder.py
  • ratiopath/masks/mask_builders/receptive_field_manipulation.py
  • ratiopath/masks/mask_builders/storage.py
  • ratiopath/masks/tissue_mask.py
  • ratiopath/masks/write_big_tiff.py
  • ratiopath/misc.py
  • ratiopath/parsers/darwin7_json_parser.py
  • tests/test_mask_builders.py
📝 Walkthrough

Walkthrough

Adds a composable MaskBuilder system with storage backends (inmemory, memmap), aggregators (mean, max), preprocessors (edge clipping, autoscaling, scalar expansion), docs/tests, a BigTIFF writer, typing modernizations, tooling/.gitignore tweaks, and a safely_instantiate helper.

Changes

Cohort / File(s) Summary
Mask Builder Core
ratiopath/masks/mask_builders/__init__.py, ratiopath/masks/mask_builders/mask_builder.py, ratiopath/masks/mask_builders/aggregation.py
Introduce MaskBuilder, Aggregator ABC, MeanAggregator, MaxAggregator, and slice-mapping utilities for tiled-to-global mask assembly; public API re-exports adjusted.
Preprocessors
ratiopath/masks/mask_builders/receptive_field_manipulation.py
Add Preprocessor base plus EdgeClippingPreprocessor, AutoScalingPreprocessor, and ScalarUniformExpansionPreprocessor with setup/process hooks.
Storage Backends
ratiopath/masks/mask_builders/storage.py
Add inmemory ndarray allocator and memmap memmap allocator with tempfile lifecycle, cleanup, and ownership semantics.
Documentation & Nav
docs/reference/masks/mask_builders.md, mkdocs.yml
Add detailed MaskBuilder documentation and register it under Reference → Masks in mkdocs navigation.
Tests & Fixtures
tests/conftest.py, tests/test_mask_builders.py
Add autouse RNG-seed fixture and extensive unit tests covering preprocessors, aggregators, memmap/tempfile behavior, and edge cases.
Utilities
ratiopath/misc.py
Add safely_instantiate helper to instantiate classes while filtering unsupported kwargs.
BigTIFF Output
ratiopath/masks/write_big_tiff.py
Add write_big_tiff to save pyvips Image as tiled BigTIFF with pyramid and resolution metadata.
Typing Modernization
ratiopath/masks/vips_filters/typed.py, ratiopath/model_selection/split.py, ratiopath/tiling/tilers.py
Migrate to built-in type aliases and update grid_tiles[*Dims] variadic signature (typing-only changes).
Tooling & Misc
.ruff.toml, .gitignore, ratiopath/parsers/geojson_parser.py, ratiopath/tiling/overlays.py
Bump ruff target to py312, add .vscode/ and site/ to .gitignore, small formatting/type-ignore tweaks.

Sequence Diagram(s)

sequenceDiagram
    participant User as User
    participant MB as MaskBuilder
    participant PRE as Preprocessor
    participant AGG as Aggregator
    participant ST as Storage

    User->>MB: __init__(shape, storage, aggregation, preprocessors)
    MB->>ST: allocate accumulator
    ST-->>MB: accumulator

    loop for each data batch
        User->>MB: update_batch(data_batch, coords_batch)
        MB->>PRE: process(data_batch, coords_batch)
        PRE-->>MB: data', coords'
        MB->>AGG: update(accumulator, data', coords')
        AGG->>ST: read/write accumulator slices
        ST-->>AGG: done
        AGG-->>MB: updated
    end

    User->>MB: finalize()
    MB->>AGG: finalize(accumulator)
    AGG-->>MB: final_mask
    MB-->>User: result
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

🐰 I nibbled tiles and stitched them tight,

Memmap hummed softly through the night.
Clippers trimmed edges, scalers stretched long,
Aggregators sang—tiles joined the song. 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The pull request title 'Mask builders' is a generic, non-descriptive label that lacks specificity about the actual changes introduced. Consider a more descriptive title like 'Add modular mask builder system for assembling feature masks from tiled data' to better convey the scope and purpose of the changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch mask-builders
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a flexible and modular system for constructing feature masks from neural network predictions, particularly useful for tiled data like whole-slide images. This system streamlines the process of combining individual tile predictions into a cohesive, full-resolution mask by offering configurable strategies for memory management, overlap resolution, and data preprocessing. It aims to enhance the efficiency and adaptability of mask generation workflows.

Highlights

  • Modular Mask Builder System: Introduces a new modular mask builder system for assembling feature masks from neural network outputs, with flexible strategies for storage, aggregation, and preprocessing.
  • Comprehensive Documentation: Adds a comprehensive documentation page (mask_builders.md) explaining mask builder concepts, usage examples, factory configurations, and guidance for selecting the right builder for different use cases.
  • Core Implementation: Implements core classes including the MaskBuilder for orchestration, abstract and concrete Aggregator classes (MeanAggregator, MaxAggregator) for overlap handling, and Preprocessor classes (AutoScalingPreprocessor, EdgeClippingPreprocessor, ScalarUniformExpansionPreprocessor) for data transformations.
  • Package Exposure: Exposes all new mask builder components in the package's __init__.py to support easy imports and factory composition.
  • Python 3.12 Compatibility: Updates .ruff.toml to set the Python target version to 3.12 for compatibility with new features.
Changelog
  • .gitignore
    • Added 'site/' directory to ignore list.
  • .ruff.toml
    • Updated the Python target version from 'py311' to 'py312'.
  • docs/reference/masks/mask_builders.md
    • Added a new documentation page detailing the mask builder system, its components, and usage examples.
  • mkdocs.yml
    • Integrated the new 'Masks' section and 'mask_builders.md' into the documentation navigation.
  • ratiopath/masks/mask_builders/init.py
    • Created an __init__.py file to expose the new mask builder classes and functions for package-level import.
  • ratiopath/masks/mask_builders/aggregation.py
    • Introduced abstract Aggregator class and concrete implementations MeanAggregator and MaxAggregator for handling overlapping tile data, along with a helper function compute_acc_slices.
  • ratiopath/masks/mask_builders/mask_builder.py
    • Implemented the MaskBuilder class, which orchestrates storage, aggregation, and preprocessing strategies for mask assembly.
  • ratiopath/masks/mask_builders/receptive_field_manipulation.py
    • Defined an abstract Preprocessor class and its concrete implementations: EdgeClippingPreprocessor for removing tile edges, AutoScalingPreprocessor for adjusting coordinates based on resolution differences, and ScalarUniformExpansionPreprocessor for expanding scalar predictions to tile dimensions.
  • ratiopath/masks/mask_builders/storage.py
    • Created an abstract StorageAllocator class and its concrete implementations MemMapStorage (for disk-backed storage) and NumpyStorage (for in-memory storage).
  • ratiopath/masks/vips_filters/typed.py
    • Updated type alias syntax from TypeAlias = ... to type ... = for Res.
  • ratiopath/masks/write_big_tiff.py
    • Added a new function write_big_tiff for saving pyvips images as BigTIFF files with specific tiling and compression settings.
  • ratiopath/model_selection/split.py
    • Updated type alias syntax from TypeAlias = ... to type ... = for ArrayLike, MatrixLike, Int, and Float.
  • ratiopath/tiling/overlays.py
    • Added a type: ignore[return-value] comment to suppress a type checking warning in overlap_fraction.
  • ratiopath/tiling/tilers.py
    • Refactored the grid_tiles function signature to use *Dims directly instead of TypeVarTuple.
  • tests/conftest.py
    • Added a new pytest fixture reset_random_seed to ensure deterministic behavior for tests.
  • tests/test_mask_builders.py
    • Added a comprehensive suite of unit tests for the new mask builder system, covering scalar uniform averaging, max aggregation, edge clipping, autoscaling, and memmap file management.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an excellent, modular mask builder system using a strategy-based pattern for composition. The new components for storage, aggregation, and preprocessing are well-designed and accompanied by thorough documentation and tests.

My main feedback concerns resource management. The current implementation relies on __del__ for cleaning up temporary files in MemMapStorage, which can be unreliable. Specifically, the MeanAggregator can leak temporary files for its overlap counter. I've suggested adding an explicit cleanup() mechanism across the components and implementing the context manager protocol in MaskBuilder for deterministic resource deallocation. I also found a related weakness in the tests for file cleanup that should be addressed.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ratiopath/tiling/overlays.py (1)

229-278: ⚠️ Potential issue | 🟡 Minor

Change the return type annotation from pa.MapArray to pa.StructArray and remove the type: ignore comment.

pyarrow.array() from Python dicts creates a StructArray (with fixed fields per row), not a MapArray. PyArrow's MapArray requires an explicit pa.map_(...) type and list-of-tuples input structure. The current type: ignore[return-value] masks a real type mismatch; update the annotation to pa.StructArray to reflect what the code actually returns.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/tiling/overlays.py` around lines 229 - 278, Update the function
signature and return typing for tile_overlay_overlap: change the annotated
return type from pa.MapArray to pa.StructArray (i.e., def
tile_overlay_overlap(...) -> pa.StructArray) and remove the trailing " # type:
ignore[return-value]" on the final return; ensure any dependent references to
the function's return type reflect pa.StructArray. Keep the body
(overlap_fraction, overlap_vectorized, and pa.array(...) return) unchanged so
pyarrow-produced StructArray matches the annotation.
🧹 Nitpick comments (6)
ratiopath/masks/mask_builders/storage.py (2)

65-72: Flush the memmap before closing it.

NumPy documents flush() as the public way to persist memmap changes, and also notes there is no public API to close the underlying mmap. Skipping flush() here makes persistence depend on the private _mmap internals. (numpy.org)

Suggested change
     def cleanup(self) -> None:
         if hasattr(self, "accumulator"):
             try:
+                self.accumulator.flush()
                 self.accumulator._mmap.close()  # type: ignore[attr-defined]
             except Exception as e:
                 logger.warning(f"Failed to close memmap: {e}")
             finally:
                 del self.accumulator
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/masks/mask_builders/storage.py` around lines 65 - 72, In cleanup(),
call the public flush() on the memmap object before touching its private _mmap:
if self.accumulator has a flush attribute, invoke self.accumulator.flush() to
persist changes, then proceed to close the underlying self.accumulator._mmap
inside the existing try/except/finally; ensure exceptions from flush are caught
and logged similarly to the current close error handling, and still delete
self.accumulator in the finally block.

1-14: Add a module docstring for this public API module.

This file introduces exported storage backends, so a short module-level description will help the generated API docs.

Suggested change
+"""Storage backends for mask-builder accumulators."""
+
 import logging

As per coding guidelines, "Add documentation for any new public functions, classes, or modules you create".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/masks/mask_builders/storage.py` around lines 1 - 14, Add a concise
module-level docstring at the top of the module describing its purpose (exposed
storage backend abstractions and allocators) and intended public API for
documentation generation; place the docstring before imports and mention that
the module defines StorageAllocator (and any concrete storage backends) for
allocating numpy-backed storage, so readers of the generated docs understand the
module responsibility and exported symbols like StorageAllocator.
ratiopath/masks/write_big_tiff.py (1)

1-6: Add a module docstring for this new public module.

The function is documented, but the module itself is new and public. A short top-level docstring will make the generated API page much less bare.

Suggested change
+"""Utilities for exporting masks as tiled pyramidal BigTIFF files."""
+
 from pathlib import Path

As per coding guidelines, "Add documentation for any new public functions, classes, or modules you create".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/masks/write_big_tiff.py` around lines 1 - 6, Add a short top-level
module docstring to ratiopath/masks/write_big_tiff.py that briefly describes the
module's purpose (writing large TIFFs using pyvips) and mentions the public API
(the write_big_tiff function); place it at the very top of the file as a one- or
two-sentence docstring so the generated docs are informative and follow the
project documentation guideline.
tests/conftest.py (1)

1-8: Consider using np.random.default_rng() instead of deprecated np.random.seed().

The legacy np.random.seed() and np.random.rand() APIs are discouraged in favor of the newer Generator API. While this works, the modern approach provides better reproducibility guarantees and thread safety.

♻️ Suggested refactor using modern NumPy RNG
 import numpy as np
 import pytest
 
 
 `@pytest.fixture`(autouse=True)
 def reset_random_seed():
     """Reset random seed before each test for deterministic behavior."""
-    np.random.seed(42)
+    # Seed the legacy global RNG for backwards compatibility with tests using np.random.rand()
+    np.random.seed(42)

Alternatively, if you update the tests to use the Generator API, you could inject the RNG:

`@pytest.fixture`
def rng():
    """Provide a seeded random generator for deterministic tests."""
    return np.random.default_rng(42)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/conftest.py` around lines 1 - 8, The reset_random_seed fixture uses the
legacy global RNG via np.random.seed(42); replace it with a modern
Generator-based fixture: remove or stop using the autouse reset_random_seed
fixture and add a fixture named rng that returns np.random.default_rng(42)
(function name: rng, symbol: np.random.default_rng) so tests can accept an rng
argument and use its methods (e.g., rng.random(), rng.integers()) instead of
global np.random.*; update tests to consume the new rng fixture where
deterministic randomness is needed.
ratiopath/masks/mask_builders/__init__.py (1)

1-32: Consider adding a module docstring for API documentation.

A brief module-level docstring would improve discoverability and help mkdocstrings generate better documentation for the package.

📝 Suggested addition
+"""Mask builder components for assembling feature masks from tiled data.
+
+This package provides a composable system for building large masks from
+neural network predictions or other tile-level data, with pluggable
+storage, aggregation, and preprocessing strategies.
+"""
+
 from ratiopath.masks.mask_builders.aggregation import (
     Aggregator,
     MaxAggregator,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/masks/mask_builders/__init__.py` around lines 1 - 32, Add a concise
module-level docstring at the top of ratiopath/masks/mask_builders/__init__.py
that summarizes the package purpose (helpers to build and manage masks,
preprocessors, aggregators, and storage backends) and optionally lists the main
public symbols exported (e.g., Aggregator, MaxAggregator, MeanAggregator,
MaskBuilder, Preprocessor, AutoScalingPreprocessor, EdgeClippingPreprocessor,
ScalarUniformExpansionPreprocessor, MemMapStorage, NumpyStorage,
StorageAllocator) so mkdocstrings can generate clear API docs; place it above
the imports as the first statement and keep it short (1–3 sentences).
ratiopath/masks/mask_builders/aggregation.py (1)

52-104: Consider cleanup for overlap_storage in MeanAggregator.

When MeanAggregator is used with MemMapStorage, the overlap_storage creates its own memmap file. If the main storage is cleaned up but the aggregator isn't, the overlap file may not be properly deleted. Consider adding a cleanup method or ensuring the overlap storage is cleaned up when the aggregator is destroyed.

♻️ Suggested addition
 class MeanAggregator[DType: np.generic](Aggregator[DType]):
     """Aggregator that implements averaging aggregation for overlapping tiles."""
 
+    def cleanup(self) -> None:
+        """Clean up the overlap storage resources."""
+        if hasattr(self, "overlap_storage"):
+            self.overlap_storage.cleanup()
+
     def __init__(
         self,

Then in MaskBuilder, you could call self.aggregator.cleanup() if the method exists.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/masks/mask_builders/aggregation.py` around lines 52 - 104,
MeanAggregator currently creates a secondary storage object
(self.overlap_storage) that may open its own file (e.g., with MemMapStorage) and
never gets cleaned up; add a cleanup method on MeanAggregator that calls the
underlying storage's close/delete method (e.g., call
self.overlap_storage.close() or self.overlap_storage.cleanup() depending on
StorageAllocator API) and ensure any file handles are released and files removed
when appropriate, and update MaskBuilder to call self.aggregator.cleanup()
(guarded by hasattr) when the main storage is cleaned up or the builder is
destroyed so the overlap memmap file is removed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/reference/masks/mask_builders.md`:
- Around line 135-151: The example using AutoScalingPreprocessor and MaskBuilder
calls np.array(...) but omits the numpy import; add "import numpy as np" at the
top of the snippet so np is defined before AutoScalingPreprocessor and
MaskBuilder are used (fix the example block that constructs auto_scale with
source_extents/source_tile_extents/source_tile_strides using np.array).

In `@ratiopath/masks/mask_builders/receptive_field_manipulation.py`:
- Around line 108-116: get_vips_scale_factors currently returns
tuple(scale_factors) which yields tuple[Any, ...] and can break the annotated
return type tuple[float, float]; change the return to explicitly produce a tuple
of two floats by converting each element to float (e.g., build tuple(float(x)
for x in scale_factors) or otherwise cast both elements to float) so the
returned value matches the annotated signature for get_vips_scale_factors.

In `@ratiopath/masks/write_big_tiff.py`:
- Around line 6-13: Validate input parameters at the start of write_big_tiff:
check that mpp_x and mpp_y are > 0 and that tile_width and tile_height are
positive integers; if any check fails, raise a ValueError with a clear message
identifying the bad parameter(s). Add the same upfront guards to the other
exported function(s) handling these parameters (lines ~38-39) so callers get an
immediate, descriptive error instead of an internal division/usage failure.

---

Outside diff comments:
In `@ratiopath/tiling/overlays.py`:
- Around line 229-278: Update the function signature and return typing for
tile_overlay_overlap: change the annotated return type from pa.MapArray to
pa.StructArray (i.e., def tile_overlay_overlap(...) -> pa.StructArray) and
remove the trailing " # type: ignore[return-value]" on the final return; ensure
any dependent references to the function's return type reflect pa.StructArray.
Keep the body (overlap_fraction, overlap_vectorized, and pa.array(...) return)
unchanged so pyarrow-produced StructArray matches the annotation.

---

Nitpick comments:
In `@ratiopath/masks/mask_builders/__init__.py`:
- Around line 1-32: Add a concise module-level docstring at the top of
ratiopath/masks/mask_builders/__init__.py that summarizes the package purpose
(helpers to build and manage masks, preprocessors, aggregators, and storage
backends) and optionally lists the main public symbols exported (e.g.,
Aggregator, MaxAggregator, MeanAggregator, MaskBuilder, Preprocessor,
AutoScalingPreprocessor, EdgeClippingPreprocessor,
ScalarUniformExpansionPreprocessor, MemMapStorage, NumpyStorage,
StorageAllocator) so mkdocstrings can generate clear API docs; place it above
the imports as the first statement and keep it short (1–3 sentences).

In `@ratiopath/masks/mask_builders/aggregation.py`:
- Around line 52-104: MeanAggregator currently creates a secondary storage
object (self.overlap_storage) that may open its own file (e.g., with
MemMapStorage) and never gets cleaned up; add a cleanup method on MeanAggregator
that calls the underlying storage's close/delete method (e.g., call
self.overlap_storage.close() or self.overlap_storage.cleanup() depending on
StorageAllocator API) and ensure any file handles are released and files removed
when appropriate, and update MaskBuilder to call self.aggregator.cleanup()
(guarded by hasattr) when the main storage is cleaned up or the builder is
destroyed so the overlap memmap file is removed.

In `@ratiopath/masks/mask_builders/storage.py`:
- Around line 65-72: In cleanup(), call the public flush() on the memmap object
before touching its private _mmap: if self.accumulator has a flush attribute,
invoke self.accumulator.flush() to persist changes, then proceed to close the
underlying self.accumulator._mmap inside the existing try/except/finally; ensure
exceptions from flush are caught and logged similarly to the current close error
handling, and still delete self.accumulator in the finally block.
- Around line 1-14: Add a concise module-level docstring at the top of the
module describing its purpose (exposed storage backend abstractions and
allocators) and intended public API for documentation generation; place the
docstring before imports and mention that the module defines StorageAllocator
(and any concrete storage backends) for allocating numpy-backed storage, so
readers of the generated docs understand the module responsibility and exported
symbols like StorageAllocator.

In `@ratiopath/masks/write_big_tiff.py`:
- Around line 1-6: Add a short top-level module docstring to
ratiopath/masks/write_big_tiff.py that briefly describes the module's purpose
(writing large TIFFs using pyvips) and mentions the public API (the
write_big_tiff function); place it at the very top of the file as a one- or
two-sentence docstring so the generated docs are informative and follow the
project documentation guideline.

In `@tests/conftest.py`:
- Around line 1-8: The reset_random_seed fixture uses the legacy global RNG via
np.random.seed(42); replace it with a modern Generator-based fixture: remove or
stop using the autouse reset_random_seed fixture and add a fixture named rng
that returns np.random.default_rng(42) (function name: rng, symbol:
np.random.default_rng) so tests can accept an rng argument and use its methods
(e.g., rng.random(), rng.integers()) instead of global np.random.*; update tests
to consume the new rng fixture where deterministic randomness is needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 55d45655-b538-4fee-8b95-bff94d167ec5

📥 Commits

Reviewing files that changed from the base of the PR and between 9abb3fc and c46e1de.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • .gitignore
  • .ruff.toml
  • docs/reference/masks/mask_builders.md
  • mkdocs.yml
  • ratiopath/masks/mask_builders/__init__.py
  • ratiopath/masks/mask_builders/aggregation.py
  • ratiopath/masks/mask_builders/mask_builder.py
  • ratiopath/masks/mask_builders/receptive_field_manipulation.py
  • ratiopath/masks/mask_builders/storage.py
  • ratiopath/masks/vips_filters/typed.py
  • ratiopath/masks/write_big_tiff.py
  • ratiopath/model_selection/split.py
  • ratiopath/parsers/geojson_parser.py
  • ratiopath/tiling/overlays.py
  • ratiopath/tiling/tilers.py
  • tests/conftest.py
  • tests/test_mask_builders.py

@matejpekar
Copy link
Member Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a well-designed and flexible mask builder system using a composition-over-inheritance approach, which is excellent. The new components are well-documented and accompanied by a comprehensive test suite. My feedback focuses on improving adherence to Python's standard naming conventions for better readability, making resource cleanup more deterministic, and correcting a minor issue in the documentation examples.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (1)
ratiopath/masks/mask_builders/aggregation.py (1)

95-95: Consider widening overlap counter dtype for high-overlap workloads.

Using np.uint16 can overflow above 65,535 overlaps per pixel, which would silently skew mean normalization.

🔧 Optional hardening
-            dtype=np.uint16,
+            dtype=np.uint32,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/masks/mask_builders/aggregation.py` at line 95, The overlap counter
currently uses dtype=np.uint16 which can overflow for >65,535 overlaps per
pixel; update the allocation for the overlap counter in
ratiopath/masks/mask_builders/aggregation.py to use a wider integer type (e.g.,
np.uint32 or np.uint64) instead of np.uint16 so mean normalization cannot be
silently corrupted; change the dtype argument where the overlap/count array is
created (the array currently constructed with dtype=np.uint16) to a wider dtype
and, optionally, make the dtype configurable if high-overlap workloads are
expected.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.ruff.toml:
- Line 40: Remove "N801" from the global ignore list in .ruff.toml and instead
add targeted inline suppressions for the intentionally lowercase classes
inmemory and memmap by appending "# noqa: N801" to their class definitions in
ratiopath/masks/mask_builders/storage.py; also add a short comment next to each
noqa explaining the NumPy-style naming intent so the exceptions are explicit and
localized.

In `@docs/reference/masks/mask_builders.md`:
- Line 113: The example passes an unsupported argument to
EdgeClippingPreprocessor; remove the invalid num_dims=2 from the constructor
call (change clip_prep = EdgeClippingPreprocessor(px_to_clip=4, num_dims=2) to
clip_prep = EdgeClippingPreprocessor(px_to_clip=4)) so the example uses the
actual EdgeClippingPreprocessor signature; update any surrounding text if you
intended to convey dimensionality using a different API or parameter.

In `@ratiopath/masks/mask_builders/aggregation.py`:
- Around line 99-115: The update method uses slices from compute_acc_slices and
directly indexes into accumulator and self.overlap_counter which allows negative
or out-of-range coords to silently wrap or overflow; before applying
accumulator[:, *acc_slices] and self.overlap_counter[:, *acc_slices] validate
that each slice start >= 0 and end <= accumulator.shape for the corresponding
dimensions (or clamp to valid bounds) and raise a clear exception if
coords_batch would write outside the valid extent; perform the same bounds
checks/clamping for the other update path referenced (the second update method
around lines 139-152) and reference compute_acc_slices, accumulator,
overlap_counter, coords_batch, and data_batch when adding the checks so the
guard rails are applied in both update methods.
- Around line 123-125: The cleanup method currently only deletes
self.overlap_counter; if overlap storage is memmap-backed you must explicitly
flush/close it before dropping the reference. Update cleanup (method cleanup) to
check for hasattr(self, "overlap_counter") and if that object exposes memmap
semantics (e.g. is instance of numpy.memmap or has .flush()/.close() methods),
call the appropriate cleanup (call .flush() then .close() or .mmap.close() as
applicable) and only then delete self.overlap_counter to ensure the underlying
memory-mapped file is released.

In `@ratiopath/masks/mask_builders/mask_builder.py`:
- Around line 117-121: The cleanup() method should explicitly invoke backend
cleanup/close methods instead of just deleting the attribute; update cleanup()
to, when self has a storage attribute, call any available teardown method on it
(e.g., storage.cleanup() or storage.close() or storage.release()) via getattr
and only then del self.storage, and ensure self.aggregator.cleanup() is called
after the storage backend has been properly torn down; reference the cleanup()
method, the storage attribute, and self.aggregator.cleanup to locate where to
change the sequencing.
- Around line 73-75: The code assigns any ndarray to self.storage in the
MaskBuilder class without verifying it's compatible with the shape expected
after setup(), which can cause downstream indexing errors; update the assignment
in mask_builders/mask_builder.py (where self.storage is set) to validate the
provided ndarray's ndim and shape against the MaskBuilder's expected storage
shape (the shape computed/used by setup() or the internal shape properties), and
if it doesn't match raise a clear ValueError; ensure validation references the
class/methods (MaskBuilder, setup, self.storage) so callers supplying external
storage get immediate, descriptive feedback instead of later slice/index errors.

In `@ratiopath/masks/mask_builders/receptive_field_manipulation.py`:
- Around line 151-155: The computation of total_strides can become negative when
source_extents is smaller than self.source_tile_extents; update the calculation
around total_strides so you clamp the stride counts to zero before/after
applying ceil to avoid negative values that shrink
overflow_buffered_source_extents and mask_extents. Concretely, compute the
difference (source_extents - self.source_tile_extents) / source_tile_strides,
ensure negative results are set to 0 (e.g., via np.maximum(..., 0)) and then
apply np.ceil and astype(np.int64) to produce non-negative total_strides used
downstream by overflow_buffered_source_extents and mask_extents.
- Around line 57-89: Before generating slices in process, validate
self.clip_start_indices and self.clip_end_indices against the input extents:
ensure all clip_start_indices and clip_end_indices are non-negative integers,
that clip_start_indices[i] + clip_end_indices[i] <= extents[i] for every
dimension, and that resulting slice ranges produce start < end; if any check
fails raise ValueError with a clear message. Implement these guards at the start
of process (before computing slices/adjusted_coords_batch) using extents =
np.asarray(data_batch.shape[2:], dtype=np.int64) and the existing
self.clip_start_indices/self.clip_end_indices arrays so you prevent negative or
over‑clipping for data_batch/coords_batch.

In `@ratiopath/masks/mask_builders/storage.py`:
- Around line 27-35: The docs refer to the constructor argument as filepath but
the public API actually accepts filename; update the docstring text in
storage.py to consistently use the parameter name filename (and any example
usages) and adjust the description about persistence and FileExistsError to
reference filename so copy-pasted code matches the constructor signature (ensure
references to FileExistsError remain accurate for the filename-backed case).

In `@ratiopath/misc.py`:
- Around line 6-19: Add a Google-style docstring to the public helper function
safely_instantiate explaining its purpose (instantiate a callable/class with
provided args/kwargs), describe the kwargs filtering behavior (if the target
accepts **kwargs it forwards all kwargs; otherwise it filters kwargs to only
parameters present in inspect.signature), document the return value and types
(returns an instance of T, cast from Callable[..., T]), mention that positional
args are passed through as-is and that this helper does not modify args, and
include brief examples or notes on edge cases (e.g., how it handles unexpected
kwargs and the use of inspect.Parameter.VAR_KEYWORD) to satisfy public API
documentation guidelines.

---

Nitpick comments:
In `@ratiopath/masks/mask_builders/aggregation.py`:
- Line 95: The overlap counter currently uses dtype=np.uint16 which can overflow
for >65,535 overlaps per pixel; update the allocation for the overlap counter in
ratiopath/masks/mask_builders/aggregation.py to use a wider integer type (e.g.,
np.uint32 or np.uint64) instead of np.uint16 so mean normalization cannot be
silently corrupted; change the dtype argument where the overlap/count array is
created (the array currently constructed with dtype=np.uint16) to a wider dtype
and, optionally, make the dtype configurable if high-overlap workloads are
expected.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4715eb12-6316-472e-9d8e-af4c3aa0edbd

📥 Commits

Reviewing files that changed from the base of the PR and between c46e1de and 3b81f0b.

📒 Files selected for processing (9)
  • .ruff.toml
  • docs/reference/masks/mask_builders.md
  • ratiopath/masks/mask_builders/__init__.py
  • ratiopath/masks/mask_builders/aggregation.py
  • ratiopath/masks/mask_builders/mask_builder.py
  • ratiopath/masks/mask_builders/receptive_field_manipulation.py
  • ratiopath/masks/mask_builders/storage.py
  • ratiopath/misc.py
  • tests/test_mask_builders.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • ratiopath/masks/mask_builders/init.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_mask_builders.py`:
- Around line 351-390: The test test_numpy_memmap_persistent_file currently
never calls MaskBuilder.finalize(), so it verifies persistence only after object
deletion; update the test to explicitly call assembler.finalize() (on the
MaskBuilder instance named assembler) before deleting the object (del assembler)
and then assert that filename and the overlaps file exist; ensure cleanup still
runs after those assertions. This will make the test validate persistence upon
finalization rather than relying solely on garbage-collection/deletion behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e5be22d1-1d8f-4204-bbf5-458f30ad032a

📥 Commits

Reviewing files that changed from the base of the PR and between 6e85a13 and 594c398.

📒 Files selected for processing (1)
  • tests/test_mask_builders.py

@matejpekar
Copy link
Member Author

/gemini review

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new MaskBuilder module for assembling large masks from tiled data, supporting various storage (in-memory, memory-mapped), aggregation (mean, max), and transformation strategies (edge clipping, auto-scaling, scalar expansion). It also updates the project version to 1.3.0, migrates type aliases to Python 3.12 syntax, and adds a new utility function write_big_tiff for saving pyvips images. Documentation for the new mask builders has been added, and the tissue_mask function was refactored to handle default filters internally and correctly extract the image from the filter's return. A high-severity issue was identified in the write_big_tiff function where the resolution calculation for pyvips.tiffsave is incorrect, using 1000 / mpp (pixels/mm) instead of 10000 / mpp (pixels/cm) for xres and yres.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants