
feat: add evaluation callbacks and metrics #7

Closed
LAdam-ix wants to merge 1 commit into main from feature/ml-evaluation

Conversation


@LAdam-ix LAdam-ix commented Mar 11, 2026

Callbacks

  • TilesExport: Saves individual test/predict tiles as PNG files. Used mostly in early stages to get samples for visual inspection; demo.py (in a different PR) can now also be used for this.
  • AnalysisExport: Runs StainAnalyzer (data PR) on test batches, computes all metrics
    comparing modified vs original and predicted vs original tiles. Saves results
    as CSV and logs summary statistics to MLflow.
  • WSIAssembler: Assembles predicted tiles back into full whole-slide images
    as pyramid TIFFs. Handles tile overlap using running-average blending. Buffers
    one slide at a time using a memmap because of memory concerns.
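The running-average blending can be sketched in a few lines of NumPy; the function name, the float32 result buffer plus int32 coverage-count buffer layout, and the incremental-mean update are illustrative assumptions rather than the callback's actual code:

```python
import numpy as np

def blend_tile(result: np.ndarray, count: np.ndarray,
               tile: np.ndarray, y: int, x: int) -> None:
    """Accumulate `tile` into `result` at (y, x) with a running average.

    `result` is a float32 HxWx3 buffer, `count` is an int32 HxW buffer
    tracking how many tiles have covered each pixel so far.
    """
    h, w = tile.shape[:2]
    region = result[y:y + h, x:x + w]
    n = count[y:y + h, x:x + w]
    # incremental mean: new_mean = old_mean + (sample - old_mean) / (n + 1)
    region += (tile.astype(np.float32) - region) / (n[..., None] + 1)
    count[y:y + h, x:x + w] += 1
```

The incremental form never needs a separate per-pixel sum buffer, which keeps the memory-mapped footprint at one result array plus one count array per slide.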

Metrics

SSIM (Structural Similarity Index)

Image metrics (metrics/image_metrics.py):

  • NMI (Normalized Median Intensity): Ratio of median intensity to 95th percentile.
    Measures relative brightness of the image.
  • PCC (Pearson Correlation Coefficient): Pixel-level linear correlation between
    two images. High PCC means the overall structure is preserved.
  • LAB Brightness PSNR: Peak signal-to-noise ratio on the L* (lightness) channel
    in Lab space. Measures how well brightness is preserved after normalization.
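As a rough illustration of the metrics above, here is a minimal NumPy sketch. The compute_nmi and compute_pcc names match the module's public functions, but the signatures, the channel-mean intensity proxy, and the plain psnr helper are assumptions (the real LAB Brightness PSNR first converts RGB to Lab and scores only the L* channel):

```python
import numpy as np

def compute_nmi(img: np.ndarray) -> float:
    """Normalized Median Intensity: median intensity over the 95th percentile.

    Uses a plain channel mean as the intensity proxy (an assumption).
    """
    gray = img.astype(np.float64).mean(axis=-1)
    p95 = np.percentile(gray, 95)
    return float(np.median(gray) / p95) if p95 > 0 else 0.0

def compute_pcc(img1: np.ndarray, img2: np.ndarray) -> float:
    """Pearson correlation coefficient over flattened pixel values."""
    a = img1.astype(np.float64).ravel()
    b = img2.astype(np.float64).ravel()
    return float(np.corrcoef(a, b)[0, 1])

def psnr(x: np.ndarray, y: np.ndarray, data_range: float) -> float:
    """Plain PSNR; the real metric applies this to the L* channel after rgb2lab."""
    mse = float(np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2))
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

For the Lab variant, the L* channel spans 0 to 100, so data_range=100.0 would be the natural choice.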

Stain vector distance (metrics/vector_metrics.py):

  • Uses CIE76 Delta E to compare estimated stain vectors (hematoxylin, eosin).
  • Stain vectors are converted from optical density to Lab color space.
  • Comparison uses only chromaticity (a*, b*) with L*=0, so brightness differences
    do not affect the result — the metric focuses purely on dye/stain differences.
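A minimal sketch of the chromaticity-only distance; delta_e76 is the function name used in vector_metrics.py, while the assumption here is that both inputs are already Lab triples:

```python
import numpy as np

def delta_e76(lab1, lab2) -> float:
    """CIE76 Delta E restricted to chromaticity: the Euclidean distance over
    (a*, b*) with the L* difference zeroed, so brightness does not contribute."""
    d = np.asarray(lab1, dtype=np.float64) - np.asarray(lab2, dtype=np.float64)
    d[0] = 0.0  # ignore lightness
    return float(np.linalg.norm(d))
```

Zeroing the L* component is equivalent to comparing the projections of both colors onto the a*b* plane, which is what makes the metric insensitive to brightness normalization.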

Summary by CodeRabbit

Release Notes

  • New Features

    • Added stain analysis capabilities for quantifying image quality and consistency across multiple metrics
    • Added analysis export feature to save comparison metrics to CSV files
    • Added tile export functionality for saving individual prediction tiles during evaluation
    • Added whole-slide image reconstruction to assemble predicted tiles back into complete slide imagery
  • Chores

    • Added project configuration and build system setup

@LAdam-ix LAdam-ix requested a review from matejpekar March 11, 2026 08:51
@LAdam-ix LAdam-ix self-assigned this Mar 11, 2026
@LAdam-ix LAdam-ix requested review from a team and JakubPekar March 11, 2026 08:51

coderabbitai bot commented Mar 11, 2026

📝 Walkthrough

This PR establishes a complete Python project structure for stain normalization, introducing configuration files (.gitignore, pyproject.toml), an analysis module for comparing images against references, a metrics module for computing similarity measures, and a callbacks module for integrating analysis and prediction workflows with PyTorch Lightning training pipelines.

Changes

Cohort / File(s) / Summary
Project Configuration
.gitignore, pyproject.toml
Added comprehensive Python project setup including Git ignore rules for common build artifacts, caches, and IDEs, plus project metadata, dependencies, dev tools (mypy, ruff, pre-commit), and CLI script registrations for mask generation, tiling, training, validation, testing, and prediction.
Analysis Module
stain_normalization/analysis/__init__.py, stain_normalization/analysis/analyzer.py
Introduced StainAnalyzer class to compare images against a reference using multiple metrics (stain vectors, SSIM, Pearson correlation, normalized median intensity, Lab brightness PSNR), with result accumulation, statistics computation, and CSV export capabilities.
Metrics Module
stain_normalization/metrics/__init__.py, stain_normalization/metrics/image_metrics.py, stain_normalization/metrics/vector_metrics.py
Added utility functions for computing image similarity metrics (NMI, Pearson correlation coefficient, Lab brightness PSNR) and vector comparison (chromaticity-based Delta E, hematoxylin/eosin distance), with public API re-exports.
Callbacks Module
stain_normalization/callbacks/__init__.py, stain_normalization/callbacks/_base.py, stain_normalization/callbacks/analysis_export.py, stain_normalization/callbacks/tiles_export.py, stain_normalization/callbacks/wsi_assembler.py
Introduced NormalizationCallback base class for tensor denormalization, AnalysisExport callback for metric logging during testing, TilesExport callback for saving per-slide image tiles, and WSIAssembler callback for reconstructing whole-slide pyramid TIFFs from predicted tiles using memory-mapped buffers and tile blending.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 Hops of Code and Stain
Metrics bloom where samples dwell,
Vectors dance, tiles blend so well,
Whole slides rise from scattered pieces,
Normalization's magic increases—
A rabbit's gift: science complete! 🎨✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 62.50%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check (✅ Passed): the title 'feat: add evaluation callbacks and metrics' accurately describes the main changes: three evaluation callbacks (TilesExport, AnalysisExport, WSIAssembler) and multiple new metrics (NMI, PCC, LAB Brightness PSNR, stain vector distance).


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the evaluation capabilities of the stain normalization project by introducing a suite of new callbacks and metrics. These additions allow for comprehensive assessment of model performance, including visual inspection of individual tiles, quantitative analysis of image quality and stain consistency, and reconstruction of full whole-slide images from predicted tiles.

Highlights

  • New Evaluation Callbacks: Three new callbacks were introduced: TilesExport for saving individual tiles, AnalysisExport for computing and logging metrics, and WSIAssembler for reconstructing whole-slide images.
  • Enhanced Image Metrics: New image quality metrics were added, including SSIM, Normalized Median Intensity (NMI), Pearson Correlation Coefficient (PCC), and LAB Brightness PSNR.
  • Stain Vector Comparison: Functionality to compare stain vectors using CIE76 Delta E was implemented, focusing purely on chromaticity differences.
  • MLflow Integration: Analysis results can now be saved to CSV files and logged as artifacts to MLflow for better experiment tracking.
  • Whole-Slide Image Reconstruction: The WSIAssembler callback enables reassembly of predicted tiles into full pyramid TIFFs, efficiently handling tile overlap with running average blending and memory mapping.
Changelog
  • .gitignore
    • Added standard Python and project-specific ignore rules to prevent unnecessary files from being committed.
  • pyproject.toml
    • Configured project metadata, specified Python version, and listed core dependencies.
    • Defined development dependencies for static analysis and code formatting.
    • Added PDM scripts for common development tasks like mask generation, tiling, training, validation, testing, prediction, linting, and formatting.
  • stain_normalization/analysis/__init__.py
    • Initialized the analysis module by exposing the StainAnalyzer class.
  • stain_normalization/analysis/analyzer.py
    • Implemented the StainAnalyzer class to compare images using a configurable set of metrics (vectors, SSIM, PCC, NMI, LAB PSNR).
    • Provided methods to accumulate comparison results, generate summary statistics, and save results to CSV files.
  • stain_normalization/callbacks/__init__.py
    • Initialized the callbacks module and exposed the new AnalysisExport, TilesExport, and WSIAssembler callbacks, along with the base NormalizationCallback.
  • stain_normalization/callbacks/_base.py
    • Defined NormalizationCallback as a base class for other callbacks, providing utility methods for denormalizing tensors and converting them to uint8 HWC NumPy arrays.
  • stain_normalization/callbacks/analysis_export.py
    • Implemented the AnalysisExport callback to compute and accumulate image comparison metrics (modified vs. original, predicted vs. original) during testing.
    • Added functionality to save collected metrics as CSV files and log them as MLflow artifacts at the end of testing.
  • stain_normalization/callbacks/tiles_export.py
    • Implemented the TilesExport callback to save individual original, modified, and predicted image tiles as PNG files during test and predict phases.
  • stain_normalization/callbacks/wsi_assembler.py
    • Implemented the WSIAssembler callback to reconstruct whole-slide images from predicted tiles.
    • Managed memory-mapped buffers for efficient processing of large slides and applied running average blending for overlapping tiles.
    • Saved the assembled whole-slide images as pyramid TIFFs with embedded sRGB ICC profiles.
  • stain_normalization/metrics/__init__.py
    • Initialized the metrics module, exposing functions for image and vector metrics.
  • stain_normalization/metrics/image_metrics.py
    • Implemented compute_nmi for Normalized Median Intensity, measuring relative brightness.
    • Implemented compute_pcc for Pearson Correlation Coefficient, assessing pixel-level linear correlation.
    • Implemented compute_lab_brightness_psnr for Peak Signal-to-Noise Ratio on the L* channel in Lab color space.
  • stain_normalization/metrics/vector_metrics.py
    • Implemented _od_to_lab to convert optical density vectors to Lab color space.
    • Implemented delta_e76 to calculate CIE76 Delta E (chromaticity only) between two Lab colors.
    • Implemented compare_vectors to compare two sets of stain vectors, returning distances for hematoxylin and eosin, and indicating if vectors were swapped.
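The denormalization utility described for _base.py could be sketched as follows, using NumPy in place of torch tensors; the function name and the exact dtype handling are illustrative assumptions:

```python
import numpy as np

def denormalize_to_uint8_hwc(x: np.ndarray,
                             mean: np.ndarray,
                             std: np.ndarray) -> np.ndarray:
    """Undo (x - mean) / std channel normalization on a CHW float array
    and return a uint8 HWC array suitable for saving as an image."""
    img = x * std.reshape(3, 1, 1) + mean.reshape(3, 1, 1)  # invert normalization
    img = np.clip(img, 0.0, 1.0) * 255.0                    # back to [0, 255]
    return np.round(img).astype(np.uint8).transpose(1, 2, 0)  # CHW -> HWC
```

The clip before scaling guards against small numerical overshoot from the model output, which would otherwise wrap around when cast to uint8.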


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

🧹 Nitpick comments (2)
stain_normalization/callbacks/tiles_export.py (1)

50-67: Minor: Missing # type: ignore[override] comment for consistency.

on_test_batch_end has the # type: ignore[override] comment for narrowed Lightning types, but on_predict_batch_end doesn't. Consider adding it for consistency.

✨ Suggested fix
-    def on_predict_batch_end(
+    def on_predict_batch_end(  # type: ignore[override]  # narrowed Lightning STEP_OUTPUT
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stain_normalization/callbacks/tiles_export.py` around lines 50 - 67, Add the
missing type-ignore for narrowed Lightning types on the on_predict_batch_end
override: update the on_predict_batch_end function signature (the def
on_predict_batch_end(...) in tiles_export.py) to include the comment "# type:
ignore[override]" at the end of the signature line to match on_test_batch_end
and silence the mypy/typing override warning.
stain_normalization/callbacks/_base.py (1)

12-15: Consider defensive validation for normalization config.

The constructor assumes normalization_config.mean and normalization_config.std exist and are 3-element sequences. If misconfigured, the .view(3, 1, 1) call will fail with a cryptic error.

🛡️ Optional: Add validation for clearer error messages
 def __init__(self, normalization_config: DictConfig) -> None:
     super().__init__()
+    if len(normalization_config.mean) != 3 or len(normalization_config.std) != 3:
+        raise ValueError("normalization_config.mean and .std must have exactly 3 elements")
     self.mean = torch.tensor(normalization_config.mean).view(3, 1, 1)
     self.std = torch.tensor(normalization_config.std).view(3, 1, 1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stain_normalization/callbacks/_base.py` around lines 12 - 15, The __init__
currently assumes normalization_config.mean and .std exist and are length-3
sequences causing cryptic failures when invalid; add defensive validation in
__init__ to (1) verify normalization_config has attributes 'mean' and 'std', (2)
check each is a sequence of exactly 3 numeric values (or torch-compatible), and
(3) raise a clear ValueError describing which key is missing or malformed (e.g.,
"normalization_config.mean must be a 3-element numeric sequence") before calling
torch.tensor(...). Use the existing attribute names (self.mean, self.std,
normalization_config) to locate and apply this validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pyproject.toml`:
- Around line 15-20: The git-based dependencies rationai-mlkit, rationai-masks,
rationai-tiling, and rationai-staining are pinned to moving heads; update each
dependency string in pyproject.toml to include an immutable ref by appending
@<tag-or-commit> (prefer a release tag or full commit SHA) to each URL so they
become e.g. "rationai-mlkit @ git+https://.../mlkit.git@<tag-or-commit>" and
similarly for rationai-masks, rationai-tiling, and rationai-staining to ensure
reproducible installs.
- Around line 8-21: The project is missing direct runtime deps for modules
imported in the codebase: add pandas (used in
stain_normalization/analysis/analyzer.py), omegaconf (used in the various
callback files), and pyvips (used in
stain_normalization/callbacks/wsi_assembler.py) to the dependencies list in
pyproject.toml with appropriate version constraints (e.g., pandas>=1.x,
omegaconf>=2.x, pyvips>=2.x) so a clean install will satisfy imports at runtime.
- Around line 34-37: The mypy task currently runs "mypy ." which lets mypy
discover modules via multiple paths for the namespace package and causes
duplicate-module errors; update the mypy_check entry (symbol: mypy_check) in
pyproject.toml to explicitly target the package directory (e.g. run "mypy
stain_normalization" instead of "mypy .") so mypy only checks the intended
package and avoids namespace package resolution collisions.

In `@stain_normalization/analysis/analyzer.py`:
- Around line 91-97: The bug is that paired metrics are determined from the raw
parameter reference instead of the resolved ref_img, so fixed-instance
comparisons drop paired metrics; in the compare flow change the boolean check to
use the resolved ref_img (set is_paired = ref_img is not None) instead of
reference is not None, and make the identical change in the second block
mentioned (the code around the similar logic at lines ~125-146) so both places
use self._ref_img/final ref_img to decide paired metrics.
- Around line 41-46: The constructor currently treats any falsy metrics value
(e.g., an empty list) as "all metrics"; change the assignment in __init__ so
only None maps to AVAILABLE_METRICS and an explicit empty list stays empty —
i.e., set self.metrics = self.AVAILABLE_METRICS if metrics is None else metrics;
update references to the metrics property if they rely on truthiness (use
explicit emptiness checks) to avoid triggering expensive work when metrics == []
in Analyzer.
- Around line 110-123: The code currently asserts ref_vectors and calls
estimate_stain_vectors/compare_vectors without handling failures, which can
raise exceptions; replace the assert and the direct calls in the block guarded
by "vectors" in self.metrics with explicit checks: if ref_vectors is None,
populate the result keys ref_vec_0...ref_vec_N and img_vec_0...img_vec_N (or at
least the expected number of vector slots used elsewhere) with NaN and set any
vector-related flags (e.g., "was_swapped") to False/NaN, then return/skip vector
comparison; likewise, catch failures from estimate_stain_vectors (image vectors)
and if it returns None or raises, set img_vec_* to NaN and skip calling
compare_vectors; only call compare_vectors when both ref_vectors and img_vectors
are valid, and when vec_result exists, use vec_result to fill flags and reorder
img_vectors into img_vectors_paired before writing img_vec_* into result.

In `@stain_normalization/callbacks/wsi_assembler.py`:
- Around line 174-195: The code fails for out-of-bounds tiles when x >= ex or y
>= ey because ex - x or ey - y become negative and slicing produces empty
destination slices; fix by clamping/validating the destination size before
slicing: compute h = max(0, min(tile.shape[0], ey - y)) and w = max(0,
min(tile.shape[1], ex - x)) (or early-return/continue if h == 0 or w == 0), then
use these positive h,w for tile = tile[:h, :w], region/count slices and only
perform the blending and sb.result_buffer/sb.count_buffer updates when h > 0 and
w > 0 (refer to symbols tile, x, y, ex, ey, h, w, sb.result_buffer,
sb.count_buffer in wsi_assembler.py).
- Around line 130-137: The current try/except in the callback swallows errors
from _save_slide(self._active_name, self._active); change it to catch Exception
as e, perform the necessary cleanup in the finally block (delete
self._active.result_buffer and self._active.count_buffer and cleanup temp_dir as
already done), and then re-raise the exception (raise) so failed slide writes
propagate. Keep the call to _save_slide and the finally cleanup intact, but
replace the bare except with "except Exception as e:" followed by logging if
desired and then "raise" to ensure failures aren't silently ignored.

In `@stain_normalization/metrics/image_metrics.py`:
- Around line 58-61: The code currently always divides img1/img2 by 255.0 before
calling rgb2lab (variables lab1, lab2 in the function using rgb2lab and
peak_signal_noise_ratio), which corrupts already-normalized float inputs; change
to conditionally scale only when needed by checking dtype or value range (e.g.,
if img.dtype is uint8 or img.max() > 1.0 then divide by 255.0), apply the same
check to both img1 and img2 before calling rgb2lab, and then compute PSNR on the
L channel as before using peak_signal_noise_ratio.
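The suggested guard could be factored into a small helper along these lines; the to_unit_rgb name and the max()-based heuristic are illustrative, not the reviewed code:

```python
import numpy as np

def to_unit_rgb(img: np.ndarray) -> np.ndarray:
    """Scale an RGB image into [0, 1] only when the input looks like 0-255
    data, leaving already-normalized float images untouched."""
    if img.dtype == np.uint8 or float(img.max()) > 1.0:
        return img.astype(np.float64) / 255.0
    return img.astype(np.float64)
```

Both images would then be passed through this helper before rgb2lab, so float inputs already in [0, 1] are no longer divided by 255 a second time.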


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3c20e5c9-352b-442a-8967-e76e6bd801fb

📥 Commits

Reviewing files that changed from the base of the PR and between 0ce77d2 and f4b25c2.

⛔ Files ignored due to path filters (2)
  • pdm.lock is excluded by !**/*.lock
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (12)
  • .gitignore
  • pyproject.toml
  • stain_normalization/analysis/__init__.py
  • stain_normalization/analysis/analyzer.py
  • stain_normalization/callbacks/__init__.py
  • stain_normalization/callbacks/_base.py
  • stain_normalization/callbacks/analysis_export.py
  • stain_normalization/callbacks/tiles_export.py
  • stain_normalization/callbacks/wsi_assembler.py
  • stain_normalization/metrics/__init__.py
  • stain_normalization/metrics/image_metrics.py
  • stain_normalization/metrics/vector_metrics.py

Comment on lines +8 to +21
dependencies = [
"lightning>=2.0.0",
"albumentations>=1.4.14",
"torchmetrics>=1.4.14",
"torchvision>=0.17.2",
"torch>=2.2.2",
"tqdm>=4.66.5",
"rationai-mlkit @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/mlkit.git",
"rationai-masks @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/masks.git",
"rationai-tiling @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/tiling.git",
"scikit-image>=0.25.2",
"openslide-bin>=4.0.0.6",
"rationai-staining @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/staining.git",
]

⚠️ Potential issue | 🟠 Major


Add missing direct runtime dependencies.

The codebase imports pandas (in stain_normalization/analysis/analyzer.py), omegaconf (in multiple callback files), and pyvips (in stain_normalization/callbacks/wsi_assembler.py), but none are declared in dependencies. A clean install will fail at import time.

Proposed fix
 dependencies = [
     "lightning>=2.0.0",
     "albumentations>=1.4.14",
     "torchmetrics>=1.4.14",
     "torchvision>=0.17.2",
     "torch>=2.2.2",
     "tqdm>=4.66.5",
+    "pandas>=2.2.0",
+    "omegaconf>=2.3.0",
+    "pyvips>=2.2.3",
     "rationai-mlkit @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/mlkit.git",
     "rationai-masks @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/masks.git",
     "rationai-tiling @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/tiling.git",
     "scikit-image>=0.25.2",
     "openslide-bin>=4.0.0.6",
     "rationai-staining @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/staining.git",
 ]

Comment on lines +15 to +20
"rationai-mlkit @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/mlkit.git",
"rationai-masks @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/masks.git",
"rationai-tiling @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/tiling.git",
"scikit-image>=0.25.2",
"openslide-bin>=4.0.0.6",
"rationai-staining @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/staining.git",

⚠️ Potential issue | 🟠 Major



Pin the Git dependencies to immutable refs.

Lines 15-17 and 20 contain Git dependencies that resolve to moving repository heads. Without pinned commits or tags, the same project version can install different code over time, breaking reproducibility and weakening supply-chain traceability—critical concerns for training and evaluation pipelines.

Add @<tag-or-commit> to each URL:

  • rationai-mlkit @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/mlkit.git@<tag-or-commit>
  • rationai-masks @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/masks.git@<tag-or-commit>
  • rationai-tiling @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/tiling.git@<tag-or-commit>
  • rationai-staining @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/staining.git@<tag-or-commit>

Comment on lines +34 to +37
l = { composite = ["lint", "format", "mypy_check"] }
lint = "ruff check --fix"
format = "ruff format"
mypy_check = "mypy ."

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed against RationAI/stain-normalization (collapsed outputs omitted):

# Check repository structure
git ls-files | head -20

# Look at pyproject.toml to understand current configuration
cat pyproject.toml

# Search for the analysis module and analyzer.py in all locations
fd "analysis" -t d
fd "analyzer" -t f
find . -type d -name "analysis" 2>/dev/null
find . -type f -name "analyzer.py" 2>/dev/null

# Check which __init__.py files exist and what they define
find . -type f -name "__init__.py" 2>/dev/null | grep -v __pycache__
cat stain_normalization/__init__.py
cat stain_normalization/analysis/__init__.py
ls -la stain_normalization/
ls -la stain_normalization/ | grep __init__
ls -la *.py 2>/dev/null || echo "No Python files at root"

# Check the preprocessing directory and the overall tree
ls -la preprocessing/ 2>/dev/null || echo "No preprocessing directory"
tree -L 2 -I '__pycache__|*.egg-info' || find . -maxdepth 2 -type d | grep -v __pycache__

# Check for setup files and the mypy/ruff/CI configuration
ls -la setup.py setup.cfg 2>/dev/null || echo "No setup files found"
cat .github/workflows/python-lint.yml
find . -type f \( -name "mypy.ini" -o -name ".mypy.ini" -o -name "setup.cfg" \) 2>/dev/null
grep -A 10 "\[tool.mypy\]" pyproject.toml || echo "No mypy config in pyproject.toml"
cat ruff.toml

# Check the actual imports and usage of the analyzer
cat stain_normalization/analysis/analyzer.py | head -30
grep -r "from analysis" . --include="*.py" 2>/dev/null || echo "No direct 'from analysis' imports found"
grep -r "import.*analyzer\|from.*analyzer" . --include="*.py" 2>/dev/null

The checks confirmed that stain_normalization/__init__.py does not exist while stain_normalization/analysis/__init__.py does — a namespace-package structure — and running "mypy ." on namespace packages can cause module discovery issues.


Fix the mypy configuration for namespace package compatibility.

The package structure is a namespace package (missing stain_normalization/__init__.py), which causes mypy to discover the analysis.analyzer module through multiple resolution paths when running mypy .. This leads to module duplication errors. Update the mypy command to explicitly target the package:

Recommended fix
 [tool.pdm.scripts]
 ...
-mypy_check = "mypy ."
+mypy_check = "mypy --explicit-package-bases stain_normalization"
+
+[tool.mypy]
+explicit_package_bases = true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 34 - 37, The mypy task currently runs "mypy ."
which lets mypy discover modules via multiple paths for the namespace package
and causes duplicate-module errors; update the mypy_check entry (symbol:
mypy_check) in pyproject.toml to explicitly target the package directory (e.g.
run "mypy stain_normalization" instead of "mypy .") so mypy only checks the
intended package and avoids namespace package resolution collisions.

Comment on lines +41 to +46
def __init__(
    self,
    reference: np.ndarray[Any, Any] | None = None,
    metrics: list[str] | None = None,
) -> None:
    self.metrics = metrics or self.AVAILABLE_METRICS

⚠️ Potential issue | 🟡 Minor

Don't turn metrics=[] into “all metrics”.

The docstring says None = all, but Line 46 also treats an empty list as “all”. That makes it impossible to disable all metrics explicitly and can trigger expensive work the caller did not ask for.

Proposed fix
-        self.metrics = metrics or self.AVAILABLE_METRICS
+        self.metrics = list(self.AVAILABLE_METRICS if metrics is None else metrics)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stain_normalization/analysis/analyzer.py` around lines 41 - 46, The
constructor currently treats any falsy metrics value (e.g., an empty list) as
"all metrics"; change the assignment in __init__ so only None maps to
AVAILABLE_METRICS and an explicit empty list stays empty — i.e., set
self.metrics = self.AVAILABLE_METRICS if metrics is None else metrics; update
references to the metrics property if they rely on truthiness (use explicit
emptiness checks) to avoid triggering expensive work when metrics == [] in
Analyzer.
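The falsy-default pitfall is easy to demonstrate in isolation. A minimal sketch with a stand-in AVAILABLE_METRICS list (the real one lives on the analyzer class):

```python
AVAILABLE_METRICS = ["nmi", "ssim", "pcc", "lab_psnr", "vectors"]

def resolve_buggy(metrics):
    # `or` falls back on ANY falsy value, so an explicit [] becomes "all metrics"
    return metrics or AVAILABLE_METRICS

def resolve_fixed(metrics):
    # only None selects the default; an explicit empty list stays empty
    return list(AVAILABLE_METRICS) if metrics is None else list(metrics)

print(resolve_buggy([]) == AVAILABLE_METRICS)    # True — the surprising behavior
print(resolve_fixed([]))                         # []
print(resolve_fixed(None) == AVAILABLE_METRICS)  # True
```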

Comment on lines +91 to +97
ref_img = reference if reference is not None else self._ref_img
if ref_img is None:
    raise ValueError(
        "No reference image. Pass one to __init__ or to compare()."
    )

is_paired = reference is not None

⚠️ Potential issue | 🟠 Major

Use the resolved reference for paired metrics too.

With StainAnalyzer(reference=ref, metrics=["ssim", "pcc", "lab_psnr"]), compare(image) currently returns none of those metrics because is_paired is tied only to the per-call reference. That makes the fixed-reference API silently drop three metrics.

Proposed fix
-        is_paired = reference is not None
+        is_paired = ref_img is not None

Also applies to: 125-146

🧰 Tools
🪛 Ruff (0.15.5)

[warning] 93-95: Avoid specifying long messages outside the exception class

(TRY003)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stain_normalization/analysis/analyzer.py` around lines 91 - 97, The bug is
that paired metrics are determined from the raw parameter reference instead of
the resolved ref_img, so fixed-instance comparisons drop paired metrics; in the
compare flow change the boolean check to use the resolved ref_img (set is_paired
= ref_img is not None) instead of reference is not None, and make the identical
change in the second block mentioned (the code around the similar logic at lines
~125-146) so both places use self._ref_img/final ref_img to decide paired
metrics.
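Reduced to the boolean logic alone (the function names below are illustrative, not the analyzer's API), the difference is:

```python
def resolve_reference(reference, instance_ref):
    # per-call reference wins; otherwise fall back to the instance-level one
    return reference if reference is not None else instance_ref

def paired_buggy(reference, instance_ref):
    # mirrors `is_paired = reference is not None`: ignores the fallback
    return reference is not None

def paired_fixed(reference, instance_ref):
    # mirrors the proposed `is_paired = ref_img is not None`
    return resolve_reference(reference, instance_ref) is not None

# Fixed-reference style: reference supplied at construction time only
print(paired_buggy(None, "ref"))  # False — paired metrics silently skipped
print(paired_fixed(None, "ref"))  # True
```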

Comment on lines +110 to +123
if "vectors" in self.metrics:
    assert (
        ref_vectors is not None
    )  # refrence image deosnt have stain vectors (too much background)
    img_vectors = estimate_stain_vectors(image)
    vec_result = compare_vectors(ref_vectors, img_vectors)
    result.update(vec_result)
    img_vectors_paired = (
        img_vectors[[1, 0]] if vec_result["was_swapped"] else img_vectors
    )
    for j, val in enumerate(ref_vectors.flatten()):
        result[f"ref_vec_{j}"] = float(val)
    for j, val in enumerate(img_vectors_paired.flatten()):
        result[f"img_vec_{j}"] = float(val)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Scripts executed against RationAI/stain-normalization (collapsed outputs omitted):

# Inspect the vector-analysis section of analyzer.py
head -150 stain_normalization/analysis/analyzer.py | tail -50
sed -n '95,130p' stain_normalization/analysis/analyzer.py
cat -n stain_normalization/analysis/analyzer.py | grep -A 5 -B 5 "estimate_stain_vectors"

# Locate estimate_stain_vectors and compare_vectors
fd -t f '*.py' | xargs grep -l "def estimate_stain_vectors"
fd -t f '*.py' | xargs grep -l "def compare_vectors"
rg "def compare_vectors" --type py
rg "estimate_stain_vectors" -A 3 -B 3 --type py

# Look for existing error handling around those calls
rg "except|try|raise" stain_normalization/analysis/analyzer.py -n

# Read compare_vectors and check whether it handles None inputs
cat -n stain_normalization/metrics/vector_metrics.py | grep -A 30 "def compare_vectors"
rg "compare_vectors" -A 20 stain_normalization/metrics/vector_metrics.py
rg "was_swapped|flatten" stain_normalization/metrics/vector_metrics.py -B 3 -A 3
cat -n stain_normalization/metrics/vector_metrics.py | head -80

# Check tests and the exact error path when img_vectors is None
fd -type f -name "*test*.py" | xargs rg "estimate_stain_vectors" -B 2 -A 5
rg "img_vectors\[\[" stain_normalization/analysis/analyzer.py -B 5 -A 5


Handle stain-vector extraction failure explicitly.

Vector estimation can fail on background-heavy tiles, which currently aborts the whole analysis with an AssertionError or TypeError instead of recording NaN metrics. The assert ref_vectors is not None at line 111 fails when vector estimation on the reference image fails; if estimation fails for the analyzed image at line 114, the resulting None passed to compare_vectors (or indexed at line 118) raises a TypeError.

Proposed fix
         if "vectors" in self.metrics:
-            assert (
-                ref_vectors is not None
-            )  # refrence image deosnt have stain vectors (too much background)
-            img_vectors = estimate_stain_vectors(image)
-            vec_result = compare_vectors(ref_vectors, img_vectors)
-            result.update(vec_result)
-            img_vectors_paired = (
-                img_vectors[[1, 0]] if vec_result["was_swapped"] else img_vectors
-            )
-            for j, val in enumerate(ref_vectors.flatten()):
-                result[f"ref_vec_{j}"] = float(val)
-            for j, val in enumerate(img_vectors_paired.flatten()):
-                result[f"img_vec_{j}"] = float(val)
+            img_vectors = estimate_stain_vectors(image)
+            if ref_vectors is None or img_vectors is None:
+                result.update(
+                    {
+                        "d_hematoxylin": float("nan"),
+                        "d_eosin": float("nan"),
+                        "was_swapped": False,
+                    }
+                )
+            else:
+                vec_result = compare_vectors(ref_vectors, img_vectors)
+                result.update(vec_result)
+                img_vectors_paired = (
+                    img_vectors[[1, 0]] if vec_result["was_swapped"] else img_vectors
+                )
+                for j, val in enumerate(ref_vectors.flatten()):
+                    result[f"ref_vec_{j}"] = float(val)
+                for j, val in enumerate(img_vectors_paired.flatten()):
+                    result[f"img_vec_{j}"] = float(val)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stain_normalization/analysis/analyzer.py` around lines 110 - 123, The code
currently asserts ref_vectors and calls estimate_stain_vectors/compare_vectors
without handling failures, which can raise exceptions; replace the assert and
the direct calls in the block guarded by "vectors" in self.metrics with explicit
checks: if ref_vectors is None, populate the result keys ref_vec_0...ref_vec_N
and img_vec_0...img_vec_N (or at least the expected number of vector slots used
elsewhere) with NaN and set any vector-related flags (e.g., "was_swapped") to
False/NaN, then return/skip vector comparison; likewise, catch failures from
estimate_stain_vectors (image vectors) and if it returns None or raises, set
img_vec_* to NaN and skip calling compare_vectors; only call compare_vectors
when both ref_vectors and img_vectors are valid, and when vec_result exists, use
vec_result to fill flags and reorder img_vectors into img_vectors_paired before
writing img_vec_* into result.

Comment on lines +130 to +137
try:
    self._save_slide(self._active_name, self._active)
except Exception:
    traceback.print_exc()
finally:
    del self._active.result_buffer
    del self._active.count_buffer
    self._active.temp_dir.cleanup()

⚠️ Potential issue | 🟠 Major

Don't swallow failed slide writes.

If _save_slide fails, the callback just prints a traceback and continues. That can silently drop whole-slide outputs while the predict run still looks successful.

Proposed fix
         assert self._active_name is not None
+        error: Exception | None = None
         try:
             self._save_slide(self._active_name, self._active)
-        except Exception:
-            traceback.print_exc()
+        except Exception as exc:
+            error = exc
         finally:
             del self._active.result_buffer
             del self._active.count_buffer
             self._active.temp_dir.cleanup()
             self._active = None
             self._active_name = None
+        if error is not None:
+            raise error
🧰 Tools
🪛 Ruff (0.15.5)

[warning] 132-132: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stain_normalization/callbacks/wsi_assembler.py` around lines 130 - 137, The
current try/except in the callback swallows errors from
_save_slide(self._active_name, self._active); change it to catch Exception as e,
perform the necessary cleanup in the finally block (delete
self._active.result_buffer and self._active.count_buffer and cleanup temp_dir as
already done), and then re-raise the exception (raise) so failed slide writes
propagate. Keep the call to _save_slide and the finally cleanup intact, but
replace the bare except with "except Exception as e:" followed by logging if
desired and then "raise" to ensure failures aren't silently ignored.
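The capture-and-reraise shape from the proposed fix can be sketched in isolation — save and cleanup below are stand-in callables, not the callback's actual methods:

```python
def flush(save, cleanup):
    """Run `save`, always run `cleanup`, and still propagate save failures."""
    error = None
    try:
        save()
    except Exception as exc:  # deliberately broad: any write failure is fatal
        error = exc
    finally:
        cleanup()  # buffers/temp dirs are released even when saving failed
    if error is not None:
        raise error  # the run now fails loudly instead of looking successful

events = []

def bad_save():
    raise OSError("disk full")

try:
    flush(bad_save, lambda: events.append("cleaned"))
except OSError as exc:
    events.append(f"raised: {exc}")

print(events)  # ['cleaned', 'raised: disk full']
```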

Comment on lines +174 to +195
h, w = min(tile.shape[0], ey - y), min(tile.shape[1], ex - x)
tile = tile[:h, :w]

region = sb.result_buffer[y : y + h, x : x + w]
count = sb.count_buffer[y : y + h, x : x + w]

# Running average: avg = (old * n + new) / (n + 1)
overlap = count > 0
if overlap.any():
    n = count[:, :, np.newaxis].astype(np.float32)
    blended = np.where(
        overlap[:, :, np.newaxis],
        (region.astype(np.float32) * n + tile) / (n + 1),
        tile,
    )
    sb.result_buffer[y : y + h, x : x + w] = np.clip(blended, 0, 255).astype(
        np.uint8
    )
else:
    sb.result_buffer[y : y + h, x : x + w] = tile

sb.count_buffer[y : y + h, x : x + w] = count + 1

⚠️ Potential issue | 🟠 Major

Clamp or reject out-of-bounds tiles before slicing.

When x >= extent_x or y >= extent_y, ex - x / ey - y go negative. tile[:h, :w] then keeps most of the source tile while the destination slice is empty, which will eventually mis-shape the assignment or corrupt edge handling.

Proposed fix
         sb = self._active
         ex, ey = sb.meta.extent_x, sb.meta.extent_y

-        h, w = min(tile.shape[0], ey - y), min(tile.shape[1], ex - x)
-        tile = tile[:h, :w]
+        if x >= ex or y >= ey or x + tile.shape[1] <= 0 or y + tile.shape[0] <= 0:
+            return
+
+        dst_x = max(x, 0)
+        dst_y = max(y, 0)
+        src_x = max(-x, 0)
+        src_y = max(-y, 0)
+        w = min(tile.shape[1] - src_x, ex - dst_x)
+        h = min(tile.shape[0] - src_y, ey - dst_y)
+        tile = tile[src_y : src_y + h, src_x : src_x + w]

-        region = sb.result_buffer[y : y + h, x : x + w]
-        count = sb.count_buffer[y : y + h, x : x + w]
+        region = sb.result_buffer[dst_y : dst_y + h, dst_x : dst_x + w]
+        count = sb.count_buffer[dst_y : dst_y + h, dst_x : dst_x + w]
...
-            sb.result_buffer[y : y + h, x : x + w] = np.clip(blended, 0, 255).astype(
+            sb.result_buffer[dst_y : dst_y + h, dst_x : dst_x + w] = np.clip(blended, 0, 255).astype(
                 np.uint8
             )
         else:
-            sb.result_buffer[y : y + h, x : x + w] = tile
+            sb.result_buffer[dst_y : dst_y + h, dst_x : dst_x + w] = tile

-        sb.count_buffer[y : y + h, x : x + w] = count + 1
+        sb.count_buffer[dst_y : dst_y + h, dst_x : dst_x + w] = count + 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stain_normalization/callbacks/wsi_assembler.py` around lines 174 - 195, The
code fails for out-of-bounds tiles when x >= ex or y >= ey because ex - x or ey
- y become negative and slicing produces empty destination slices; fix by
clamping/validating the destination size before slicing: compute h = max(0,
min(tile.shape[0], ey - y)) and w = max(0, min(tile.shape[1], ex - x)) (or
early-return/continue if h == 0 or w == 0), then use these positive h,w for tile
= tile[:h, :w], region/count slices and only perform the blending and
sb.result_buffer/sb.count_buffer updates when h > 0 and w > 0 (refer to symbols
tile, x, y, ex, ey, h, w, sb.result_buffer, sb.count_buffer in
wsi_assembler.py).
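A minimal NumPy sketch of the clamp-before-slice idea (assuming non-negative x, y; paste_clamped and the 2-D canvas are illustrative simplifications of the assembler's buffers):

```python
import numpy as np

def paste_clamped(canvas, tile, x, y):
    """Paste `tile` at (x, y), clipping to canvas bounds; skip fully
    out-of-bounds tiles instead of slicing with negative extents."""
    ey, ex = canvas.shape[:2]
    h = max(0, min(tile.shape[0], ey - y))
    w = max(0, min(tile.shape[1], ex - x))
    if h == 0 or w == 0:
        return  # tile lies entirely outside the canvas
    canvas[y : y + h, x : x + w] = tile[:h, :w]

canvas = np.zeros((4, 4), dtype=np.uint8)
tile = np.ones((3, 3), dtype=np.uint8)
paste_clamped(canvas, tile, 3, 3)  # only one pixel of the tile fits
paste_clamped(canvas, tile, 5, 5)  # fully outside: no-op, no mis-shaped slice
print(int(canvas.sum()))  # 1
```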

Comment on lines +58 to +61
lab1 = rgb2lab(img1.astype(np.float32) / 255.0)
lab2 = rgb2lab(img2.astype(np.float32) / 255.0)
return float(
    peak_signal_noise_ratio(lab1[:, :, 0], lab2[:, :, 0], data_range=100.0)

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Scripts executed against RationAI/stain-normalization (collapsed outputs omitted):

# Examine compute_lab_brightness_psnr and its surrounding context
cat -n stain_normalization/metrics/image_metrics.py | sed -n '50,70p'
cat -n stain_normalization/metrics/image_metrics.py | sed -n '45,80p'
cat -n stain_normalization/metrics/image_metrics.py | sed -n '25,45p'
cat stain_normalization/metrics/image_metrics.py
head -30 stain_normalization/metrics/image_metrics.py

# Find callers and tests of the function
rg "compute_lab_brightness_psnr" --type py -B 2 -A 5
find . -name "*.py" -type f | xargs grep -l "compute_lab_brightness_psnr" | head -5
find . -name "*.py" -type f | xargs grep -l "compute_lab_brightness_psnr" 2>/dev/null | grep -i test

# See how ref_img and image reach the metric in analyzer.py
cat -n stain_normalization/analysis/analyzer.py | sed -n '1,150p'
rg "lab_brightness_psnr" -B 10 -A 2 stain_normalization/analysis/analyzer.py
find . -name "*.py" -type f -exec grep -l "StainAnalyzer" {} \;


🌐 Web query:

scikit-image rgb2lab input range expectations float uint8

💡 Result:

skimage.color.rgb2lab follows scikit-image’s dtype-based “expected range” conventions:

  • If your RGB image is uint8 (typical from io.imread): values are expected in [0, 255], and when the conversion needs floats it will rescale to float in [0, 1] internally. [3]
  • If your RGB image is floating-point (float32/float64): values are expected to already be in [0, 1]. scikit-image generally treats float images as being in [0, 1], and float inputs are not automatically rescaled. [2]
    • In particular, skimage.util.img_as_float() does not rescale floating-point inputs (so a float image in 0–255 stays 0–255). [1]

Practical rule:

  • Passing uint8 0–255 to rgb2lab is fine.
  • Passing float 0–255 is not fine—normalize first (e.g., rgb_float = rgb_uint8 / 255.0). [1][2][3]

Sources: [1] https://scikit-image.org/docs/0.22.x/user_guide/data_types.html, [2] https://scikit-image.org/docs/0.20.x/user_guide/transforming_image_data.html, [3] https://scikit-image.org/docs/stable/skips/4-transition-to-v2.html


🏁 Scripts executed (collapsed outputs omitted):

# Check how images are typically loaded in the codebase
find . -name "*.py" -type f -exec grep -l "imread\|cv2\|PIL\|Image.open" {} \;
grep -n "imread\|cv2\|PIL" stain_normalization/callbacks/tiles_export.py stain_normalization/callbacks/wsi_assembler.py | head -20
cat stain_normalization/callbacks/analysis_export.py | head -80
grep -A 10 "tensor_to_image" stain_normalization/callbacks/tiles_export.py

# Check docs and the scikit-image dependency pin
find . -name "README*" -o -name "*.md" | head -5
cat setup.py pyproject.toml requirements.txt 2>/dev/null | grep -i scikit-image

Python scenario check of the current scaling (img.astype(np.float32) / 255.0):
uint8 in [0, 255] → / 255.0 → float in [0, 1], which is what rgb2lab expects for floats: CORRECT.
float already in [0, 1] → / 255.0 → float in [0, 0.0039], rendered almost black: WRONG.


Make input scaling conditional to handle both uint8 and normalized float images.

The current code unconditionally divides by 255 before rgb2lab. For typical uint8 images from loaders like PIL or OpenCV, this is correct. However, scikit-image's rgb2lab expects float inputs in the range [0, 1], so passing already-normalized floats (where max ≤ 1.0) will be incorrectly scaled and produce nearly black Lab values. Since this is a public API, add defensive scaling that checks the input range.

Proposed fix
+def _to_unit_rgb(img: np.ndarray[Any, Any]) -> np.ndarray[Any, Any]:
+    img = img.astype(np.float32)
+    return img if np.max(img) <= 1.0 else img / 255.0
+
 def compute_lab_brightness_psnr(
     img1: np.ndarray[Any, Any], img2: np.ndarray[Any, Any]
 ) -> float:
 ...
-    lab1 = rgb2lab(img1.astype(np.float32) / 255.0)
-    lab2 = rgb2lab(img2.astype(np.float32) / 255.0)
+    lab1 = rgb2lab(_to_unit_rgb(img1))
+    lab2 = rgb2lab(_to_unit_rgb(img2))
     return float(
         peak_signal_noise_ratio(lab1[:, :, 0], lab2[:, :, 0], data_range=100.0)
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@stain_normalization/metrics/image_metrics.py` around lines 58-61: the code
currently always divides img1/img2 by 255.0 before calling rgb2lab (variables
lab1, lab2 in the function using rgb2lab and peak_signal_noise_ratio), which
corrupts already-normalized float inputs. Change it to scale conditionally by
checking dtype or value range (e.g., if img.dtype is uint8 or img.max() > 1.0,
divide by 255.0), apply the same check to both img1 and img2 before calling
rgb2lab, and then compute PSNR on the L channel as before using
peak_signal_noise_ratio.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a comprehensive set of features for model evaluation, including several new callbacks and a suite of image and stain vector metrics, with WSIAssembler using memory-mapped files for efficient image reconstruction. However, a potential path traversal vulnerability was identified in the TilesExport callback due to unsanitized user-controlled metadata in file paths. Additionally, high-severity issues such as non-robust error handling in StainAnalyzer and a potential integer overflow in WSIAssembler's tile blending logic were found, which could lead to crashes or silent data corruption. Addressing these security and reliability concerns will significantly improve the overall robustness of the new evaluation pipeline.

Comment on lines +111 to +123
assert (
    ref_vectors is not None
)  # refrence image deosnt have stain vectors (too much background)
img_vectors = estimate_stain_vectors(image)
vec_result = compare_vectors(ref_vectors, img_vectors)
result.update(vec_result)
img_vectors_paired = (
    img_vectors[[1, 0]] if vec_result["was_swapped"] else img_vectors
)
for j, val in enumerate(ref_vectors.flatten()):
    result[f"ref_vec_{j}"] = float(val)
for j, val in enumerate(img_vectors_paired.flatten()):
    result[f"img_vec_{j}"] = float(val)


high

Using assert is not robust for this check. Assertions can be disabled in production environments (with the -O flag), and they cause the program to crash if the condition is false. This is not ideal for a long-running analysis task. A similar crash will occur if estimate_stain_vectors(image) returns None. It would be more robust to handle these cases gracefully by checking for None and skipping the vector-based metrics for the problematic image pair. This prevents the entire analysis from failing on a single image. Additionally, the comment on line 113 contains typos.

Suggested change
assert (
    ref_vectors is not None
)  # refrence image deosnt have stain vectors (too much background)
img_vectors = estimate_stain_vectors(image)
vec_result = compare_vectors(ref_vectors, img_vectors)
result.update(vec_result)
img_vectors_paired = (
    img_vectors[[1, 0]] if vec_result["was_swapped"] else img_vectors
)
for j, val in enumerate(ref_vectors.flatten()):
    result[f"ref_vec_{j}"] = float(val)
for j, val in enumerate(img_vectors_paired.flatten()):
    result[f"img_vec_{j}"] = float(val)
# Stain vectors can't be estimated if the image has too much background.
if ref_vectors is not None and (img_vectors := estimate_stain_vectors(image)) is not None:
    vec_result = compare_vectors(ref_vectors, img_vectors)
    result.update(vec_result)
    img_vectors_paired = (
        img_vectors[[1, 0]] if vec_result["was_swapped"] else img_vectors
    )
    for j, val in enumerate(ref_vectors.flatten()):
        result[f"ref_vec_{j}"] = float(val)
    for j, val in enumerate(img_vectors_paired.flatten()):
        result[f"img_vec_{j}"] = float(val)

)
count_buf = np.memmap(
    Path(tmp.name) / "count.raw",
    dtype=np.uint8,


high

The count_buffer is created with dtype=np.uint8, which limits the overlap count per pixel to 255. If more than 255 tiles overlap on a single pixel, the count will wrap around to 0, leading to incorrect blending of tiles. While high overlap might be uncommon, it's possible depending on the tiling strategy. To make this more robust and prevent potential silent data corruption, consider using a larger integer type like np.uint16 for the count buffer.

Suggested change
dtype=np.uint8,
dtype=np.uint16,
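The wraparound the comment warns about is easy to reproduce in isolation: NumPy unsigned-integer arithmetic wraps silently, so a uint8 overlap counter that is incremented 256 times reads 0, while a uint16 counter keeps the true count (a minimal sketch, independent of the `WSIAssembler` code):

```python
import numpy as np

count_u8 = np.zeros(1, dtype=np.uint8)
count_u16 = np.zeros(1, dtype=np.uint16)

for _ in range(256):  # 256 overlapping tiles touching the same pixel
    count_u8 += 1     # wraps silently past 255
    count_u16 += 1

# count_u8 is now 0: a running-average blend dividing by this count
# would be corrupted (or divide by zero) with no error raised.
```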

Comment on lines +34 to +40
slide_name = data[b]["slide_name"]
xy = data[b]["xy"]

slide_dir = self.output_dir / slide_name
slide_dir.mkdir(parents=True, exist_ok=True)

self.tensor_to_image(outputs[b]).save(slide_dir / f"{xy}_predicted.png")


security-medium medium

The TilesExport callback constructs file paths using slide_name and xy directly from the batch metadata without any sanitization or validation. If an attacker can control these values (e.g., through a malicious dataset or metadata file), they could perform a path traversal attack to write files outside the intended output_dir. For example, a slide_name of ../../ or an xy value containing ../ could allow writing to arbitrary locations on the filesystem where the process has permissions.

To remediate this, sanitize slide_name and xy before using them in path construction. A good approach is to use Path(value).name to ensure only the filename component is used, or to validate that the values match an expected pattern (e.g., alphanumeric for slide_name and integer_integer for xy). Note that WSIAssembler in this same PR already implements similar precautions by using .stem and a lookup table.

Comment on lines +61 to +67
slide_name = data[b]["slide_name"]
xy = data[b]["xy"]

slide_dir = self.output_dir / slide_name
slide_dir.mkdir(parents=True, exist_ok=True)

self.tensor_to_image(outputs[b]).save(slide_dir / f"{xy}.png")


security-medium medium

Similar to the issue in on_test_batch_end, the on_predict_batch_end method also uses slide_name and xy directly from the batch metadata to construct file paths, leading to a potential path traversal vulnerability.

@LAdam-ix LAdam-ix closed this Mar 11, 2026