
feat: add demo and misc #6

Closed
LAdam-ix wants to merge 1 commit into main from feature/ml-rest

Conversation

@LAdam-ix
Collaborator

@LAdam-ix LAdam-ix commented Mar 11, 2026

The rest includes a demo plus a README for BC, and an analyze_dataset script for easier dataset analysis, to find out whether the optimization is actually needed and how much the model helps.

Summary by CodeRabbit

  • New Features

    • Added a dataset analysis tool supporting reference-based and paired comparison workflows for stain analysis across slides with optional tile subsampling.
    • Added a demo utility for loading, normalizing, and saving images with GPU/CPU support.
  • Chores

    • Configured code quality tooling: linting, formatting, type checking, and pre-commit hooks.
    • Updated project configuration and dependencies.
  • Documentation

    • Updated README title formatting.

@LAdam-ix LAdam-ix requested a review from matejpekar March 11, 2026 07:26
@LAdam-ix LAdam-ix self-assigned this Mar 11, 2026
@LAdam-ix LAdam-ix requested review from a team and JakubPekar March 11, 2026 07:27
@coderabbitai

coderabbitai bot commented Mar 11, 2026

📝 Walkthrough


The pull request introduces project infrastructure and tooling configurations, including linting and formatting via Ruff, type checking via mypy, pre-commit hooks, and a project manifest via pyproject.toml. Two new Python scripts are added: analyze_dataset.py for stain analysis workflows and demo.py for running pre-trained stain normalization models. The README is updated with corrected capitalization.

Changes

  • Configuration & Tool Setup — .flake8, .mypy.ini, .ruff.toml, .pre-commit-config.yaml, pyproject.toml: introduces linting, formatting, type-checking, and pre-commit configurations. Defines project metadata, dependencies, development scripts, and tool configurations for code quality with Ruff, mypy, and commitizen.
  • Version Control & Documentation — .gitignore, README.md: adds comprehensive Python project ignore patterns and updates the README title capitalization from "stain-normalization" to "Stain Normalization".
  • Analysis Utility — analyze_dataset.py: new script for stain analysis workflows supporting reference mode (comparing tiles against a reference image) and paired mode (comparing corresponding tiles between two datasets). Includes tile loading, StainAnalyzer integration, progress tracking, and CSV output.
  • Demo Application — demo.py: new utility class StainNormalizerDemo that loads pre-trained stain normalization models, performs image normalization/denormalization, and provides a CLI for single-file or batch directory processing with configurable CPU/GPU device selection.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop along with configs in place,
Linting rules set at a swift pace,
Two scripts now run with analytical grace,
Stain normalization joins the race!
From tiles to tensors, the journey's refined,
Infrastructure complete, a well-structured mind!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 25.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive: the PR title 'feat: add demo and misc' is vague and does not clearly convey the actual changes. While it mentions 'demo', 'misc' is a non-descriptive term that obscures the real scope of work, including configuration files, the analyze_dataset script, and README updates. Consider a more specific title such as 'feat: add demo, analysis script, and project configuration' or 'feat: add StainNormalizerDemo and dataset analysis tools'.
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/ml-rest

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's development experience and usability by introducing a user-friendly demonstration for the stain normalization model. It also provides a powerful tool for analyzing and comparing dataset stain characteristics, which is crucial for assessing the model's performance and identifying areas for optimization. Furthermore, the integration of modern code quality tools ensures a consistent and maintainable codebase.

Highlights

  • Demo Script Added: A new demo.py script has been introduced, allowing users to easily run the stain normalization model on individual images or entire folders, with options for CPU/GPU usage.
  • Dataset Analysis Tool: A new analyze_dataset.py script provides functionality to compare stain properties of a dataset against a reference image or between two datasets, aiding in model evaluation.
  • Code Quality and Development Tools: The project now includes configurations for flake8, mypy, pre-commit, and ruff, establishing a robust development workflow for linting, formatting, and type checking.
  • Updated Documentation: The README.md has been significantly expanded to include a detailed project description, instructions for running the demo, and explanations of available arguments.
Changelog
  • .flake8
    • Added configuration for flake8 linter, setting max-line-length and ignoring specific errors.
  • .gitignore
    • Added a comprehensive .gitignore file to exclude common Python-related build artifacts, environments, and IDE files, along with project-specific data and demo outputs.
  • .mypy.ini
    • Added MyPy configuration to enable strict type checking, ignore missing imports, and disable specific error codes.
  • .pre-commit-config.yaml
    • Added pre-commit hooks for YAML syntax validation, commit message format enforcement, and integration of Ruff for linting and formatting.
  • .ruff.toml
    • Added Ruff configuration for code formatting and linting, including specific rule selections (e.g., flake8-bugbear, isort, pydocstyle) and ignored rules (e.g., missing docstrings, commented-out code).
  • README.md
    • Updated the project's README with a detailed description in Slovak, instructions for running the demo, explanations of attached data, and available command-line arguments.
  • analyze_dataset.py
    • Added a new script to compare stain properties of a dataset against a reference image or between two datasets, supporting MLflow URIs and tile subsampling.
  • demo.py
    • Added a new script to demonstrate the stain normalization model, including image loading, normalization, denormalization, and saving, with support for single images or directories.
  • pyproject.toml
    • Updated project metadata, added new dependencies for rationai-mlkit, rationai-masks, rationai-tiling, rationai-staining, scikit-image, and openslide-bin. Also defined PDM scripts for linting, formatting, and MyPy checks.
Activity
  • LAdam-ix created this pull request to add a demo, dataset analysis script, and development tooling.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review — /gemini review: performs a code review for the current pull request in its current state.
  • Pull Request Summary — /gemini summary: provides a summary of the current pull request in its current state.
  • Comment — @gemini-code-assist: responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help — /gemini help: displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback on your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a demo, an analysis script, and various configuration files. My comments focus on fixing invalid versions in pre-commit-config.yaml, aligning the .ruff.toml configuration with the project, and a few minor improvements to README.md and the Python scripts for better robustness and performance.

Comment on lines +4 to +17
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-yaml
args: [--unsafe]

- repo: https://github.com/commitizen-tools/commitizen
rev: v3.30.1
hooks:
- id: commitizen

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.7.3
hooks:


critical

Several of the pre-commit hook revisions appear to be incorrect or point to nonexistent/pre-release versions. This will likely cause pre-commit to fail or behave unexpectedly.

  • pre-commit-hooks (line 5): v5.0.0 is not a valid tag. The latest stable version is v4.6.0.
  • commitizen (line 11): v3.30.1 is not a valid tag. The latest stable version is v3.28.0.
  • ruff-pre-commit (line 16): v0.7.3 is a pre-release version of Ruff. It is generally recommended to use stable versions (e.g. v0.5.0) to avoid unexpected problems.

Please update these revisions to valid, stable tags.

@@ -0,0 +1,51 @@
fix = true
line-length = 88
target-version = "py311"


medium

target-version is set to py311, but pyproject.toml specifies Python 3.12.5. To ensure that Ruff's fixes and checks are compatible with your project's Python version, this should be updated.

Suggested change
target-version = "py311"
target-version = "py312"

"D", # pydocstyle
]
extend-ignore = [
"ERA001", # commented out code


medium

In extend-select you selected ERA (the commented-out-code check), but immediately afterwards you ignored its only rule, ERA001. This effectively disables the check. Leaving commented-out code in is generally discouraged, as it clutters the codebase and can become stale; it is better to rely on version control history. Consider removing this line to enable the check.

pdm run python demo.py --input ./demo_data/modified
```

## Dostupné arguemnty:


medium

There is a typo in the word arguemnty; it should be argumenty.

Suggested change
## Dostupné arguemnty:
## Dostupné argumenty:

## Dostupné arguemnty:
- **input**: cesta k obrázku alebo priečinku s obrázkami na normalizáciu (default ./demo_data/modified)
- **output**: priečinok, kam sa uložia normalizované obrázky (default ./demo_data)
- **use_cpu**: defaultne nadstavené na použitie GPU ak je dostupná, avšak ak by nastali problémy odporúčam použivať iba CPU No newline at end of file


medium

The word použivať is missing a diacritic; the correct form is používať.

Suggested change
- **use_cpu**: defaultne nadstavené na použitie GPU ak je dostupná, avšak ak by nastali problémy odporúčam použivať iba CPU
- **use_cpu**: defaultne nadstavené na použitie GPU ak je dostupná, avšak ak by nastali problémy odporúčam používať iba CPU

comp_iter = iterate_tiles(comp_slides, comp_tiles)

for (_, orig_tile, image_id), (_, comp_tile, _) in tqdm(
zip(orig_iter, comp_iter, strict=False), total=len(orig_tiles)


medium

Using strict=True with zip would be safer here. The comment on line 8 mentions the assumption that the datasets are paired; strict=True raises a ValueError if the iterables have different lengths, which helps enforce that assumption and catch potential data mismatches early.

Suggested change
zip(orig_iter, comp_iter, strict=False), total=len(orig_tiles)
zip(orig_iter, comp_iter, strict=True), total=len(orig_tiles)

Comment on lines +142 to +143
if analyzer is None:
return


medium

This if analyzer is None: check appears to be unreachable. The logic above ensures that either run_reference_mode or run_paired_mode is called (which initializes analyzer), or the program exits via parser.error(). This block can safely be removed.

Comment on lines +44 to +46
mean = torch.tensor(self.MEAN).view(3, 1, 1).to(tensor.device)
std = torch.tensor(self.STD).view(3, 1, 1).to(tensor.device)
return tensor * std + mean


medium

To improve performance, avoid creating new tensors for mean and std on every call to denormalize. They can be created once in the __init__ method, moved to the correct device, and stored as instance attributes (e.g. self.mean_tensor, self.std_tensor).

In __init__, add:

self.mean_tensor = torch.tensor(self.MEAN).view(3, 1, 1).to(self.device)
self.std_tensor = torch.tensor(self.STD).view(3, 1, 1).to(self.device)

denormalize can then be simplified.

Suggested change
mean = torch.tensor(self.MEAN).view(3, 1, 1).to(tensor.device)
std = torch.tensor(self.STD).view(3, 1, 1).to(tensor.device)
return tensor * std + mean
return tensor * self.std_tensor + self.mean_tensor

name = "stain-normalization"
version = "0.1.0"
authors = [{name = "Adam Lopatka"}]
requires-python = "==3.12.5"


medium

Pinning the Python version to an exact patch release (==3.12.5) can be overly restrictive and may cause problems for contributors using a slightly different patch version. It is generally better to specify a compatible range, for example requires-python = ">=3.12".

Suggested change
requires-python = "==3.12.5"
requires-python = ">=3.12"
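As a quick illustration of why the range matters, the two specifiers can be compared with the packaging library (assuming packaging is installed; this is a standalone sketch, not project code):

```python
from packaging.specifiers import SpecifierSet

exact = SpecifierSet("==3.12.5")
ranged = SpecifierSet(">=3.12")

# A contributor on 3.12.6 is rejected by the exact pin but accepted by the range.
assert "3.12.6" not in exact
assert "3.12.6" in ranged
# The range still excludes older minor versions.
assert "3.11.9" not in ranged
```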


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (1)
demo.py (1)

28-32: torch.load() already defaults to weights_only=True in PyTorch 2.6+.

In PyTorch 2.6+, torch.load() defaults to weights_only=True, making the explicit parameter redundant for modern versions. However, if the code targets PyTorch 1.13–2.5 compatibility, explicitly setting weights_only=True provides consistent security across versions and helps avoid the arbitrary code execution risk from unpickling untrusted checkpoints.

Optional for compatibility
-        checkpoint = torch.load(self.CHECKPOINT_PATH, map_location=self.device)
+        checkpoint = torch.load(
+            self.CHECKPOINT_PATH, map_location=self.device, weights_only=True
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo.py` around lines 28 - 32, The torch.load call that reads
self.CHECKPOINT_PATH should explicitly pass weights_only=True to ensure
consistent behavior and avoid unpickling arbitrary objects on older PyTorch
versions; update the torch.load(self.CHECKPOINT_PATH, map_location=self.device)
call to include weights_only=True (keeping map_location=self.device) and leave
the subsequent checkpoint handling logic (the "state_dict" branch and
self.model.load_state_dict calls) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitignore:
- Around line 181-183: The .gitignore entry '/demo_*' incorrectly excludes the
demo_data directory referenced in README.md; update the .gitignore by removing
or narrowing the '/demo_*' pattern and add an explicit allow rule for demo_data
(e.g., add '!/demo_data/' and '!/demo_data/**') so the './demo_data/...' demo
assets can be committed; target the '/demo_*' line in .gitignore and the
README.md mention of './demo_data' to verify consistency.

In @.pre-commit-config.yaml:
- Around line 19-23: The pre-commit hooks replaced Ruff's default entrypoints
with `pdm lint` and `pdm format`, which breaks hooks because `pdm` is not
available in pre-commit's isolated environment; change the `id: ruff` and `id:
ruff-format` hook entries back to Ruff's default executables (`ruff` and
`ruff-format`) and move `--force-exclude` into the hook `args` array so the
flags are passed to Ruff without depending on `pdm` in the hook environment.

In `@analyze_dataset.py`:
- Around line 16-28: Reorder the imports in analyze_dataset.py into
standard-library, third-party, and local/project groups (each group
alphabetized) and ensure single blank lines between groups so Ruff/I001 passes;
specifically place argparse, collections.abc.Generator, pathlib.Path, and
typing.Any first, then third-party imports numpy, pandas, PIL.Image (from PIL
import Image), tqdm, and
rationai.mlkit.data.datasets.MetaTiledSlides/OpenSlideTilesDataset, and finally
local package imports stain_normalization.analysis.StainAnalyzer and
stain_normalization.analysis.report.REPORT_METRICS; run isort/ruff to verify
formatting and remove any unused imports if flagged.
- Around line 90-102: The code silently pairs tiles by iteration order which can
misalign data; instead ensure explicit matching by a unique key and validate
equality before comparing: reindex comp_tiles to orig_tiles using comp_tiles =
comp_tiles.reindex(orig_tiles.index) (or perform a merge/join on the tile
identifier column) and check that no NaNs were introduced and that
len(comp_tiles) == len(orig_tiles) (raise an error or exit if there are
missing/mismatched rows); then create iterators via iterate_tiles(orig_slides,
orig_tiles) and iterate_tiles(comp_slides, comp_tiles) and remove zip(...,
strict=False) so you iterate only aligned rows when calling
analyzer.compare(comp_tile, image_id=image_id, reference=orig_tile).

In `@demo.py`:
- Around line 1-11: The imports in demo.py are unsorted (ruff I001); reorder
them into the configured import groups and alphabetic order (standard library,
third-party, local) so ruff lint passes — i.e., ensure argparse and pathlib.Path
come first, then third-party imports like albumentations (A),
albumentations.pytorch.ToTensorV2, numpy (np), PIL.Image, torch, and finally the
local StainNormalizationModel import; you can run `ruff check --fix demo.py` or
`ruff format demo.py` to apply the correct ordering automatically.

In `@pyproject.toml`:
- Around line 30-33: The package entrypoints 'train', 'validate', 'test', and
'predict' currently run "python -m stain_normalization" but this repo doesn't
contain a stain_normalization module; either add that package/module or update
those scripts in pyproject.toml to point to the real CLI module that exists in
this checkout (replace "stain_normalization" with the correct module name),
ensuring the module exposes the expected mode argument handlers used by the
project.
- Line 11: The pyproject.toml dependency line for "torchmetrics" uses an invalid
constraint "torchmetrics>=1.4.14"; update the constraint to a valid PyPI version
(e.g., "torchmetrics>=1.4.3" or a newer official release) so dependency
resolution succeeds—edit the "torchmetrics" entry in pyproject.toml to replace
">=1.4.14" with a valid version specifier.

In `@README.md`:
- Around line 31-34: Fix the typos in the CLI flags section header and
description: change the header "Dostupné arguemnty:" to "Dostupné argumenty:"
and correct "nadstavené" to "nastavené" in the line describing "use_cpu"; keep
the existing flag names (input, output, use_cpu) and their default values intact
so only spelling is updated.

---

Nitpick comments:
In `@demo.py`:
- Around line 28-32: The torch.load call that reads self.CHECKPOINT_PATH should
explicitly pass weights_only=True to ensure consistent behavior and avoid
unpickling arbitrary objects on older PyTorch versions; update the
torch.load(self.CHECKPOINT_PATH, map_location=self.device) call to include
weights_only=True (keeping map_location=self.device) and leave the subsequent
checkpoint handling logic (the "state_dict" branch and
self.model.load_state_dict calls) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4eca22a1-4ca7-4ee5-a739-3f7bfb33de67

📥 Commits

Reviewing files that changed from the base of the PR and between 0ce77d2 and 9f814c4.

⛔ Files ignored due to path filters (2)
  • pdm.lock is excluded by !**/*.lock
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • .flake8
  • .gitignore
  • .mypy.ini
  • .pre-commit-config.yaml
  • .ruff.toml
  • README.md
  • analyze_dataset.py
  • demo.py
  • pyproject.toml

Comment on lines +181 to +183
# Project
/data
/demo_* No newline at end of file

⚠️ Potential issue | 🟠 Major

/demo_* conflicts with the documented demo assets.

README.md tells users to run the demo from ./demo_data/..., but this pattern ignores that directory. That prevents the sample inputs/checkpoint from being added or updated in git, which undercuts the demo flow this PR introduces.

Suggested fix
 # Project
 /data
 /demo_*
+!/demo_data/
+!/demo_data/**
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 181 - 183, The .gitignore entry '/demo_*'
incorrectly excludes the demo_data directory referenced in README.md; update the
.gitignore by removing or narrowing the '/demo_*' pattern and add an explicit
allow rule for demo_data (e.g., add '!/demo_data/' and '!/demo_data/**') so the
'./demo_data/...' demo assets can be committed; target the '/demo_*' line in
.gitignore and the README.md mention of './demo_data' to verify consistency.

Comment on lines +19 to +23
- id: ruff
entry: pdm lint --force-exclude
# Run the formatter.
- id: ruff-format
entry: pdm format --force-exclude

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

curl -fsSL https://raw.githubusercontent.com/astral-sh/ruff-pre-commit/v0.7.3/.pre-commit-hooks.yaml \
  | sed -n '/^- id: ruff/,+12p;/^- id: ruff-format/,+8p'

Repository: RationAI/stain-normalization

Length of output: 653


Keep the Ruff hooks on Ruff's default entrypoint.

Replacing the hook entry with pdm lint and pdm format makes the hook depend on pdm being available inside pre-commit's isolated hook environment, where it typically isn't available. This causes hook execution to fail before Ruff even runs. Use the default Ruff entrypoints and pass tool-specific flags via args instead.

Suggested fix
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.7.3
     hooks:
       # Run the linter.
       - id: ruff
-        entry: pdm lint --force-exclude
+        args: [--fix, --force-exclude]
       # Run the formatter.
       - id: ruff-format
-        entry: pdm format --force-exclude
+        args: [--force-exclude]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- id: ruff
entry: pdm lint --force-exclude
# Run the formatter.
- id: ruff-format
entry: pdm format --force-exclude
- id: ruff
args: [--fix, --force-exclude]
# Run the formatter.
- id: ruff-format
args: [--force-exclude]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pre-commit-config.yaml around lines 19 - 23, The pre-commit hooks replaced
Ruff's default entrypoints with `pdm lint` and `pdm format`, which breaks hooks
because `pdm` is not available in pre-commit's isolated environment; change the
`id: ruff` and `id: ruff-format` hook entries back to Ruff's default executables
(`ruff` and `ruff-format`) and move `--force-exclude` into the hook `args` array
so the flags are passed to Ruff without depending on `pdm` in the hook
environment.

Comment on lines +16 to +28
import argparse
from collections.abc import Generator
from pathlib import Path
from typing import Any

import numpy as np
import pandas as pd
from PIL import Image
from rationai.mlkit.data.datasets import MetaTiledSlides, OpenSlideTilesDataset
from tqdm import tqdm

from stain_normalization.analysis import StainAnalyzer
from stain_normalization.analysis.report import REPORT_METRICS

⚠️ Potential issue | 🟡 Minor

Sort this import block so lint passes.

The pipeline is already failing with Ruff I001 on this block, so this script cannot merge cleanly until the imports are re-ordered/formatted.

🧰 Tools
🪛 GitHub Actions: Python Lint (RationAI Standard)

[error] 16-16: I001 Import block is unsorted or un-formatted. Organize imports (ruff).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_dataset.py` around lines 16 - 28, Reorder the imports in
analyze_dataset.py into standard-library, third-party, and local/project groups
(each group alphabetized) and ensure single blank lines between groups so
Ruff/I001 passes; specifically place argparse, collections.abc.Generator,
pathlib.Path, and typing.Any first, then third-party imports numpy, pandas,
PIL.Image (from PIL import Image), tqdm, and
rationai.mlkit.data.datasets.MetaTiledSlides/OpenSlideTilesDataset, and finally
local package imports stain_normalization.analysis.StainAnalyzer and
stain_normalization.analysis.report.REPORT_METRICS; run isort/ruff to verify
formatting and remove any unused imports if flagged.

Comment on lines +90 to +102
if args.max_tiles and len(orig_tiles) > args.max_tiles:
orig_tiles = orig_tiles.sample(n=args.max_tiles, random_state=42)
comp_tiles = comp_tiles.loc[orig_tiles.index]
print(f"Subsampled to {args.max_tiles} tile pairs")

analyzer = StainAnalyzer()
orig_iter = iterate_tiles(orig_slides, orig_tiles)
comp_iter = iterate_tiles(comp_slides, comp_tiles)

for (_, orig_tile, image_id), (_, comp_tile, _) in tqdm(
zip(orig_iter, comp_iter, strict=False), total=len(orig_tiles)
):
analyzer.compare(comp_tile, image_id=image_id, reference=orig_tile)

⚠️ Potential issue | 🟠 Major

Don't silently pair tiles by iteration order.

comp_tiles = comp_tiles.loc[orig_tiles.index] and zip(..., strict=False) assume both datasets have identical indexes and ordering. Any reordering or missing row silently truncates or misaligns pairs, which corrupts the reported metrics.

Minimal guard
-    for (_, orig_tile, image_id), (_, comp_tile, _) in tqdm(
-        zip(orig_iter, comp_iter, strict=False), total=len(orig_tiles)
-    ):
+    for (orig_slide, orig_tile, image_id), (
+        comp_slide,
+        comp_tile,
+        comp_image_id,
+    ) in tqdm(zip(orig_iter, comp_iter, strict=True), total=len(orig_tiles)):
+        if (orig_slide, image_id) != (comp_slide, comp_image_id):
+            raise ValueError("Original and compared datasets are not aligned")
         analyzer.compare(comp_tile, image_id=image_id, reference=orig_tile)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_dataset.py` around lines 90 - 102, The code silently pairs tiles by
iteration order which can misalign data; instead ensure explicit matching by a
unique key and validate equality before comparing: reindex comp_tiles to
orig_tiles using comp_tiles = comp_tiles.reindex(orig_tiles.index) (or perform a
merge/join on the tile identifier column) and check that no NaNs were introduced
and that len(comp_tiles) == len(orig_tiles) (raise an error or exit if there are
missing/mismatched rows); then create iterators via iterate_tiles(orig_slides,
orig_tiles) and iterate_tiles(comp_slides, comp_tiles) and remove zip(...,
strict=False) so you iterate only aligned rows when calling
analyzer.compare(comp_tile, image_id=image_id, reference=orig_tile).
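The key-based pairing described above can be sketched with toy DataFrames (the column name and tile ids are illustrative, not taken from the script):

```python
import pandas as pd

orig_tiles = pd.DataFrame({"x": [0, 256, 512]}, index=["t1", "t2", "t3"])
comp_tiles = pd.DataFrame({"x": [0, 512, 256]}, index=["t1", "t3", "t2"])

# Align comp_tiles to orig_tiles by tile identifier, not iteration order.
comp_aligned = comp_tiles.reindex(orig_tiles.index)
assert list(comp_aligned["x"]) == [0, 256, 512]

# Validate the pairing before comparing: any tile absent from comp_tiles
# becomes NaN after reindexing and should abort the run.
if comp_aligned.isna().any().any():
    raise ValueError("Original and compared datasets are not fully paired")

# A deliberately incomplete comp_tiles is caught by the same check.
incomplete = comp_tiles.drop("t2").reindex(orig_tiles.index)
assert incomplete.isna().any().any()
```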

Comment on lines +1 to +11
import argparse
from pathlib import Path

import albumentations as A
import numpy as np
import torch
from albumentations.pytorch import ToTensorV2
from PIL import Image

from stain_normalization.stain_normalization_model import StainNormalizationModel


⚠️ Potential issue | 🟡 Minor

Fix import sorting to pass linting.

The pipeline is failing with ruff error I001 indicating imports are unsorted. Run ruff check --fix or ruff format to auto-fix.

Suggested fix
 import argparse
 from pathlib import Path
 
 import albumentations as A
-import numpy as np
-import torch
 from albumentations.pytorch import ToTensorV2
+import numpy as np
 from PIL import Image
+import torch
 
 from stain_normalization.stain_normalization_model import StainNormalizationModel

Note: The exact ordering depends on your ruff configuration. Running ruff check --fix demo.py will apply the correct order automatically.

🧰 Tools
🪛 GitHub Actions: Python Lint (RationAI Standard)

[error] 1-1: I001 Import block is unsorted or un-formatted. Organize imports (ruff).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo.py` around lines 1 - 11: the imports are unsorted (ruff I001).
Reorder them into the configured import groups in alphabetical order
(standard library, third-party, local) so the ruff lint passes: argparse and
pathlib.Path first, then third-party imports such as albumentations (A),
albumentations.pytorch.ToTensorV2, numpy (np), PIL.Image, and torch, and
finally the local StainNormalizationModel import. Running `ruff check --fix
demo.py` or `ruff format demo.py` applies the correct ordering automatically.

dependencies = [
"lightning>=2.0.0",
"albumentations>=1.4.14",
"torchmetrics>=1.4.14",
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python - <<'PY'
import json
import urllib.request

with urllib.request.urlopen("https://pypi.org/pypi/torchmetrics/json") as resp:
    data = json.load(resp)

releases = sorted(data["releases"])
print("1.4.14 exists:", "1.4.14" in data["releases"])
print("Recent 1.4.x releases:", [v for v in releases if v.startswith("1.4")][-10:])
PY

Repository: RationAI/stain-normalization

Length of output: 2653


🌐 Web query:

torchmetrics 1.4.14 PyPI release version

💡 Result:

torchmetrics==1.4.14 is not a version published on PyPI. The 1.4.x releases on PyPI are 1.4.0, 1.4.0.post0, 1.4.1, 1.4.2, and 1.4.3. The current latest release shown on PyPI is 1.9.0 (Mar 9, 2026). [1]

Source(s):

  • [1] PyPI “torchmetrics” release history (shows available versions and dates)

Fix the torchmetrics version constraint: 1.4.14 does not exist on PyPI.

The constraint torchmetrics>=1.4.14 is unsatisfiable. PyPI only has torchmetrics 1.4.0, 1.4.0.post0, 1.4.1, 1.4.2, and 1.4.3 in the 1.4.x series, so pdm install will fail during dependency resolution. Change the constraint to a valid version such as torchmetrics>=1.4.3, or update to a more recent release.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 11: the dependency line for "torchmetrics" uses
an invalid constraint "torchmetrics>=1.4.14". Update it to a valid PyPI
version (e.g., "torchmetrics>=1.4.3" or a newer official release) so
dependency resolution succeeds: edit the "torchmetrics" entry in
pyproject.toml and replace ">=1.4.14" with a valid version specifier.
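The corrected dependency block would then read roughly as follows (the version floors shown are illustrative; pick whatever the project actually needs):

```toml
dependencies = [
    "lightning>=2.0.0",
    "albumentations>=1.4.14",
    "torchmetrics>=1.4.3",  # 1.4.3 is the last published 1.4.x release on PyPI
]
```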

Comment on lines +30 to +33
train = "python -m stain_normalization mode=fit"
validate = "python -m stain_normalization mode=validate"
test = "python -m stain_normalization mode=test"
predict = "python -m stain_normalization mode=predict"
⚠️ Potential issue | 🔴 Critical

These scripts point at a module the checkout doesn't contain.

train / validate / test / predict all call python -m stain_normalization, but this PR doesn't add that package. In this state those entrypoints fail before any app code runs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 30 - 33: the script entries 'train',
'validate', 'test', and 'predict' all run "python -m stain_normalization", but
this repo does not contain a stain_normalization module. Either add that
package/module, or update the scripts in pyproject.toml to point at the CLI
module that actually exists in this checkout (replace "stain_normalization"
with the correct module name), ensuring the module exposes the expected mode
argument handlers used by the project.

Comment on lines +31 to +34
## Dostupné arguemnty:
- **input**: cesta k obrázku alebo priečinku s obrázkami na normalizáciu (default ./demo_data/modified)
- **output**: priečinok, kam sa uložia normalizované obrázky (default ./demo_data)
- **use_cpu**: defaultne nadstavené na použitie GPU ak je dostupná, avšak ak by nastali problémy odporúčam použivať iba CPU No newline at end of file
⚠️ Potential issue | 🟡 Minor

Fix the CLI section typos.

arguemnty and nadstavené are user-facing typos in the only section that explains the flags, so the README reads unfinished right where users are expected to follow it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 31 - 34: fix the typos in the CLI flags section.
Change the header "Dostupné arguemnty:" to "Dostupné argumenty:" and correct
"nadstavené" to "nastavené" in the line describing "use_cpu"; keep the
existing flag names (input, output, use_cpu) and their default values intact
so only the spelling changes.

@LAdam-ix LAdam-ix closed this Mar 11, 2026