
feat: add demo and misc #6

Closed
LAdam-ix wants to merge 1 commit into main from feature/ml-rest

Conversation

@LAdam-ix
Collaborator

@LAdam-ix LAdam-ix commented Mar 11, 2026

The rest includes a demo plus a README for BC, and an analyze_dataset script for easier dataset analysis, to find out whether the optimization is actually needed and how much the model helps.

Summary by CodeRabbit

  • New Features

    • Added a dataset analysis tool supporting reference-based and paired comparison workflows for stain analysis across slides with optional tile subsampling.
    • Added a demo utility for loading, normalizing, and saving images with GPU/CPU support.
  • Chores

    • Configured code quality tooling: linting, formatting, type checking, and pre-commit hooks.
    • Updated project configuration and dependencies.
  • Documentation

    • Updated README title formatting.

@LAdam-ix LAdam-ix requested a review from matejpekar March 11, 2026 07:26
@LAdam-ix LAdam-ix self-assigned this Mar 11, 2026
@LAdam-ix LAdam-ix requested review from a team and JakubPekar March 11, 2026 07:27
@coderabbitai

coderabbitai bot commented Mar 11, 2026

📝 Walkthrough


The pull request introduces project infrastructure and tooling configurations, including linting and formatting via Ruff, type checking via mypy, pre-commit hooks, and a project manifest via pyproject.toml. Two new Python scripts are added: analyze_dataset.py for stain analysis workflows and demo.py for running pre-trained stain normalization models. The README is updated with corrected capitalization.

Changes

  • Configuration & Tool Setup — .flake8, .mypy.ini, .ruff.toml, .pre-commit-config.yaml, pyproject.toml: introduces linting, formatting, type-checking, and pre-commit configurations. Defines project metadata, dependencies, development scripts, and tool configurations for code quality with Ruff, mypy, and commitizen.
  • Version Control & Documentation — .gitignore, README.md: adds comprehensive Python project ignore patterns and updates the README title capitalization from "stain-normalization" to "Stain Normalization".
  • Analysis Utility — analyze_dataset.py: new script for stain analysis workflows supporting reference mode (comparing tiles against a reference image) and paired mode (comparing corresponding tiles between two datasets). Includes tile loading, StainAnalyzer integration, progress tracking, and CSV output.
  • Demo Application — demo.py: new utility class StainNormalizerDemo that loads pre-trained stain normalization models, performs image normalization/denormalization, and provides a CLI for single-file or batch directory processing with configurable CPU/GPU device selection.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hop along with configs in place,
Linting rules set at a swift pace,
Two scripts now run with analytical grace,
Stain normalization joins the race!
From tiles to tensors, the journey's refined,
Infrastructure complete, a well-structured mind!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 25.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check — ❓ Inconclusive: the PR title 'feat: add demo and misc' is vague and does not clearly convey the actual changes. While it mentions 'demo', 'misc' is a non-descriptive term that obscures the real scope of work, including configuration files, the analyze_dataset script, and README updates. Consider a more specific title such as 'feat: add demo, analysis script, and project configuration' or 'feat: add StainNormalizerDemo and dataset analysis tools'.
✅ Passed checks (1 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/ml-rest

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the project's development experience and usability by introducing a user-friendly demonstration for the stain normalization model. It also provides a powerful tool for analyzing and comparing dataset stain characteristics, which is crucial for assessing the model's performance and identifying areas for optimization. Furthermore, the integration of modern code quality tools ensures a consistent and maintainable codebase.

Highlights

  • Demo Script Added: A new demo.py script has been introduced, allowing users to easily run the stain normalization model on individual images or entire folders, with options for CPU/GPU usage.
  • Dataset Analysis Tool: A new analyze_dataset.py script provides functionality to compare stain properties of a dataset against a reference image or between two datasets, aiding in model evaluation.
  • Code Quality and Development Tools: The project now includes configurations for flake8, mypy, pre-commit, and ruff, establishing a robust development workflow for linting, formatting, and type checking.
  • Updated Documentation: The README.md has been significantly expanded to include a detailed project description, instructions for running the demo, and explanations of available arguments.
Changelog
  • .flake8
    • Added configuration for flake8 linter, setting max-line-length and ignoring specific errors.
  • .gitignore
    • Added a comprehensive .gitignore file to exclude common Python-related build artifacts, environments, and IDE files, along with project-specific data and demo outputs.
  • .mypy.ini
    • Added MyPy configuration to enable strict type checking, ignore missing imports, and disable specific error codes.
  • .pre-commit-config.yaml
    • Added pre-commit hooks for YAML syntax validation, commit message format enforcement, and integration of Ruff for linting and formatting.
  • .ruff.toml
    • Added Ruff configuration for code formatting and linting, including specific rule selections (e.g., flake8-bugbear, isort, pydocstyle) and ignored rules (e.g., missing docstrings, commented-out code).
  • README.md
    • Updated the project's README with a detailed description in Slovak, instructions for running the demo, explanations of attached data, and available command-line arguments.
  • analyze_dataset.py
    • Added a new script to compare stain properties of a dataset against a reference image or between two datasets, supporting MLflow URIs and tile subsampling.
  • demo.py
    • Added a new script to demonstrate the stain normalization model, including image loading, normalization, denormalization, and saving, with support for single images or directories.
  • pyproject.toml
    • Updated project metadata, added new dependencies for rationai-mlkit, rationai-masks, rationai-tiling, rationai-staining, scikit-image, and openslide-bin. Also defined PDM scripts for linting, formatting, and MyPy checks.
Activity
  • LAdam-ix created this pull request to add a demo, dataset analysis script, and development tooling.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review — /gemini review: performs a code review for the current pull request in its current state.
  • Pull Request Summary — /gemini summary: provides a summary of the current pull request in its current state.
  • Comment — @gemini-code-assist: responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help — /gemini help: displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback on your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request adds a demo, an analysis script, and various configuration files. My comments focus on fixing invalid versions in pre-commit-config.yaml, aligning the .ruff.toml configuration with the project, and a few minor improvements to README.md and the Python scripts for better robustness and performance.

Comment on lines +4 to +17
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v5.0.0
hooks:
- id: check-yaml
args: [--unsafe]

- repo: https://github.com/commitizen-tools/commitizen
rev: v3.30.1
hooks:
- id: commitizen

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.7.3
hooks:


critical

Several of the pre-commit hook revisions appear to be incorrect or point to nonexistent/pre-release versions. This will likely cause pre-commit to fail or behave unexpectedly.

  • pre-commit-hooks (line 5): v5.0.0 is not a valid tag. The latest stable version is v4.6.0.
  • commitizen (line 11): v3.30.1 is not a valid tag. The latest stable version is v3.28.0.
  • ruff-pre-commit (line 16): v0.7.3 is a pre-release version of Ruff. It is generally recommended to use stable versions (e.g. v0.5.0) to avoid unexpected problems.

Please update these revisions to valid, stable tags.

@@ -0,0 +1,51 @@
fix = true
line-length = 88
target-version = "py311"


medium

target-version is set to py311, but pyproject.toml specifies Python 3.12.5. To ensure that Ruff's fixes and checks are compatible with your project's Python version, this should be updated.

Suggested change
target-version = "py311"
target-version = "py312"

"D", # pydocstyle
]
extend-ignore = [
"ERA001", # commented out code


medium

In extend-select you selected ERA (the commented-out-code check), but immediately afterwards you ignored its only rule, ERA001. This effectively disables the check. Leaving commented-out code in is generally discouraged, as it clutters the codebase and can become stale; it is better to rely on version control history. Consider removing this line to enable the check.

pdm run python demo.py --input ./demo_data/modified
```

## Dostupné arguemnty:


medium

There is a typo in the word arguemnty; it should be argumenty.

Suggested change
## Dostupné arguemnty:
## Dostupné argumenty:

## Dostupné arguemnty:
- **input**: cesta k obrázku alebo priečinku s obrázkami na normalizáciu (default ./demo_data/modified)
- **output**: priečinok, kam sa uložia normalizované obrázky (default ./demo_data)
- **use_cpu**: defaultne nadstavené na použitie GPU ak je dostupná, avšak ak by nastali problémy odporúčam použivať iba CPU No newline at end of file


medium

The word použivať is missing a diacritic; the correct form is používať.

Suggested change
- **use_cpu**: defaultne nadstavené na použitie GPU ak je dostupná, avšak ak by nastali problémy odporúčam použivať iba CPU
- **use_cpu**: defaultne nadstavené na použitie GPU ak je dostupná, avšak ak by nastali problémy odporúčam používať iba CPU

comp_iter = iterate_tiles(comp_slides, comp_tiles)

for (_, orig_tile, image_id), (_, comp_tile, _) in tqdm(
zip(orig_iter, comp_iter, strict=False), total=len(orig_tiles)


medium

Using strict=True with zip would be safer here. The comment on line 8 mentions the assumption that the datasets are paired; strict=True raises a ValueError if the iterables have different lengths, which helps enforce that assumption and catch potential data mismatches early.

Suggested change
zip(orig_iter, comp_iter, strict=False), total=len(orig_tiles)
zip(orig_iter, comp_iter, strict=True), total=len(orig_tiles)

Comment on lines +142 to +143
if analyzer is None:
return


medium

This if analyzer is None: check appears to be unreachable. The logic above ensures that either run_reference_mode or run_paired_mode is called (which initializes analyzer), or the program exits via parser.error(). This block can safely be removed.

Comment on lines +44 to +46
mean = torch.tensor(self.MEAN).view(3, 1, 1).to(tensor.device)
std = torch.tensor(self.STD).view(3, 1, 1).to(tensor.device)
return tensor * std + mean


medium

To improve performance, avoid creating new tensors for mean and std on every call to denormalize. They can be created once in the __init__ method, moved to the correct device, and stored as instance attributes (e.g. self.mean_tensor, self.std_tensor).

In __init__, add:

self.mean_tensor = torch.tensor(self.MEAN).view(3, 1, 1).to(self.device)
self.std_tensor = torch.tensor(self.STD).view(3, 1, 1).to(self.device)

denormalize can then be simplified.

Suggested change
mean = torch.tensor(self.MEAN).view(3, 1, 1).to(tensor.device)
std = torch.tensor(self.STD).view(3, 1, 1).to(tensor.device)
return tensor * std + mean
return tensor * self.std_tensor + self.mean_tensor

name = "stain-normalization"
version = "0.1.0"
authors = [{name = "Adam Lopatka"}]
requires-python = "==3.12.5"


medium

Pinning the Python version to an exact patch release (==3.12.5) can be overly restrictive and may cause problems for contributors using a slightly different patch version. It is generally better to specify a compatible range, for example requires-python = ">=3.12".

Suggested change
requires-python = "==3.12.5"
requires-python = ">=3.12"
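As a quick illustration of why the range matters, the two specifiers can be compared with the packaging library (assuming packaging is installed; this is a standalone sketch, not project code):

```python
from packaging.specifiers import SpecifierSet

exact = SpecifierSet("==3.12.5")
ranged = SpecifierSet(">=3.12")

# A contributor on 3.12.6 is rejected by the exact pin but accepted by the range.
assert "3.12.6" not in exact
assert "3.12.6" in ranged
# The range still excludes older minor versions.
assert "3.11.9" not in ranged
```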


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (1)
demo.py (1)

28-32: torch.load() already defaults to weights_only=True in PyTorch 2.6+.

In PyTorch 2.6+, torch.load() defaults to weights_only=True, making the explicit parameter redundant for modern versions. However, if the code targets PyTorch 1.13–2.5 compatibility, explicitly setting weights_only=True provides consistent security across versions and helps avoid the arbitrary code execution risk from unpickling untrusted checkpoints.

Optional for compatibility
-        checkpoint = torch.load(self.CHECKPOINT_PATH, map_location=self.device)
+        checkpoint = torch.load(
+            self.CHECKPOINT_PATH, map_location=self.device, weights_only=True
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo.py` around lines 28 - 32, The torch.load call that reads
self.CHECKPOINT_PATH should explicitly pass weights_only=True to ensure
consistent behavior and avoid unpickling arbitrary objects on older PyTorch
versions; update the torch.load(self.CHECKPOINT_PATH, map_location=self.device)
call to include weights_only=True (keeping map_location=self.device) and leave
the subsequent checkpoint handling logic (the "state_dict" branch and
self.model.load_state_dict calls) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.gitignore:
- Around line 181-183: The .gitignore entry '/demo_*' incorrectly excludes the
demo_data directory referenced in README.md; update the .gitignore by removing
or narrowing the '/demo_*' pattern and add an explicit allow rule for demo_data
(e.g., add '!/demo_data/' and '!/demo_data/**') so the './demo_data/...' demo
assets can be committed; target the '/demo_*' line in .gitignore and the
README.md mention of './demo_data' to verify consistency.

In @.pre-commit-config.yaml:
- Around line 19-23: The pre-commit hooks replaced Ruff's default entrypoints
with `pdm lint` and `pdm format`, which breaks hooks because `pdm` is not
available in pre-commit's isolated environment; change the `id: ruff` and `id:
ruff-format` hook entries back to Ruff's default executables (`ruff` and
`ruff-format`) and move `--force-exclude` into the hook `args` array so the
flags are passed to Ruff without depending on `pdm` in the hook environment.

In `@analyze_dataset.py`:
- Around line 16-28: Reorder the imports in analyze_dataset.py into
standard-library, third-party, and local/project groups (each group
alphabetized) and ensure single blank lines between groups so Ruff/I001 passes;
specifically place argparse, collections.abc.Generator, pathlib.Path, and
typing.Any first, then third-party imports numpy, pandas, PIL.Image (from PIL
import Image), tqdm, and
rationai.mlkit.data.datasets.MetaTiledSlides/OpenSlideTilesDataset, and finally
local package imports stain_normalization.analysis.StainAnalyzer and
stain_normalization.analysis.report.REPORT_METRICS; run isort/ruff to verify
formatting and remove any unused imports if flagged.
- Around line 90-102: The code silently pairs tiles by iteration order which can
misalign data; instead ensure explicit matching by a unique key and validate
equality before comparing: reindex comp_tiles to orig_tiles using comp_tiles =
comp_tiles.reindex(orig_tiles.index) (or perform a merge/join on the tile
identifier column) and check that no NaNs were introduced and that
len(comp_tiles) == len(orig_tiles) (raise an error or exit if there are
missing/mismatched rows); then create iterators via iterate_tiles(orig_slides,
orig_tiles) and iterate_tiles(comp_slides, comp_tiles) and remove zip(...,
strict=False) so you iterate only aligned rows when calling
analyzer.compare(comp_tile, image_id=image_id, reference=orig_tile).

In `@demo.py`:
- Around line 1-11: The imports in demo.py are unsorted (ruff I001); reorder
them into the configured import groups and alphabetic order (standard library,
third-party, local) so ruff lint passes — i.e., ensure argparse and pathlib.Path
come first, then third-party imports like albumentations (A),
albumentations.pytorch.ToTensorV2, numpy (np), PIL.Image, torch, and finally the
local StainNormalizationModel import; you can run `ruff check --fix demo.py` or
`ruff format demo.py` to apply the correct ordering automatically.

In `@pyproject.toml`:
- Around line 30-33: The package entrypoints 'train', 'validate', 'test', and
'predict' currently run "python -m stain_normalization" but this repo doesn't
contain a stain_normalization module; either add that package/module or update
those scripts in pyproject.toml to point to the real CLI module that exists in
this checkout (replace "stain_normalization" with the correct module name),
ensuring the module exposes the expected mode argument handlers used by the
project.
- Line 11: The pyproject.toml dependency line for "torchmetrics" uses an invalid
constraint "torchmetrics>=1.4.14"; update the constraint to a valid PyPI version
(e.g., "torchmetrics>=1.4.3" or a newer official release) so dependency
resolution succeeds—edit the "torchmetrics" entry in pyproject.toml to replace
">=1.4.14" with a valid version specifier.

In `@README.md`:
- Around line 31-34: Fix the typos in the CLI flags section header and
description: change the header "Dostupné arguemnty:" to "Dostupné argumenty:"
and correct "nadstavené" to "nastavené" in the line describing "use_cpu"; keep
the existing flag names (input, output, use_cpu) and their default values intact
so only spelling is updated.

---

Nitpick comments:
In `@demo.py`:
- Around line 28-32: The torch.load call that reads self.CHECKPOINT_PATH should
explicitly pass weights_only=True to ensure consistent behavior and avoid
unpickling arbitrary objects on older PyTorch versions; update the
torch.load(self.CHECKPOINT_PATH, map_location=self.device) call to include
weights_only=True (keeping map_location=self.device) and leave the subsequent
checkpoint handling logic (the "state_dict" branch and
self.model.load_state_dict calls) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4eca22a1-4ca7-4ee5-a739-3f7bfb33de67

📥 Commits

Reviewing files that changed from the base of the PR and between 0ce77d2 and 9f814c4.

⛔ Files ignored due to path filters (2)
  • pdm.lock is excluded by !**/*.lock
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • .flake8
  • .gitignore
  • .mypy.ini
  • .pre-commit-config.yaml
  • .ruff.toml
  • README.md
  • analyze_dataset.py
  • demo.py
  • pyproject.toml

Comment on lines +181 to +183
# Project
/data
/demo_* No newline at end of file

⚠️ Potential issue | 🟠 Major

/demo_* conflicts with the documented demo assets.

README.md tells users to run the demo from ./demo_data/..., but this pattern ignores that directory. That prevents the sample inputs/checkpoint from being added or updated in git, which undercuts the demo flow this PR introduces.

Suggested fix
 # Project
 /data
 /demo_*
+!/demo_data/
+!/demo_data/**
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 181 - 183, The .gitignore entry '/demo_*'
incorrectly excludes the demo_data directory referenced in README.md; update the
.gitignore by removing or narrowing the '/demo_*' pattern and add an explicit
allow rule for demo_data (e.g., add '!/demo_data/' and '!/demo_data/**') so the
'./demo_data/...' demo assets can be committed; target the '/demo_*' line in
.gitignore and the README.md mention of './demo_data' to verify consistency.

Comment on lines +19 to +23
- id: ruff
entry: pdm lint --force-exclude
# Run the formatter.
- id: ruff-format
entry: pdm format --force-exclude

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

curl -fsSL https://raw.githubusercontent.com/astral-sh/ruff-pre-commit/v0.7.3/.pre-commit-hooks.yaml \
  | sed -n '/^- id: ruff/,+12p;/^- id: ruff-format/,+8p'

Repository: RationAI/stain-normalization

Length of output: 653


Keep the Ruff hooks on Ruff's default entrypoint.

Replacing the hook entry with pdm lint and pdm format makes the hook depend on pdm being available inside pre-commit's isolated hook environment, where it typically isn't available. This causes hook execution to fail before Ruff even runs. Use the default Ruff entrypoints and pass tool-specific flags via args instead.

Suggested fix
   - repo: https://github.com/astral-sh/ruff-pre-commit
     rev: v0.7.3
     hooks:
       # Run the linter.
       - id: ruff
-        entry: pdm lint --force-exclude
+        args: [--fix, --force-exclude]
       # Run the formatter.
       - id: ruff-format
-        entry: pdm format --force-exclude
+        args: [--force-exclude]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- id: ruff
entry: pdm lint --force-exclude
# Run the formatter.
- id: ruff-format
entry: pdm format --force-exclude
- id: ruff
args: [--fix, --force-exclude]
# Run the formatter.
- id: ruff-format
args: [--force-exclude]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.pre-commit-config.yaml around lines 19 - 23, The pre-commit hooks replaced
Ruff's default entrypoints with `pdm lint` and `pdm format`, which breaks hooks
because `pdm` is not available in pre-commit's isolated environment; change the
`id: ruff` and `id: ruff-format` hook entries back to Ruff's default executables
(`ruff` and `ruff-format`) and move `--force-exclude` into the hook `args` array
so the flags are passed to Ruff without depending on `pdm` in the hook
environment.

Comment on lines +16 to +28
import argparse
from collections.abc import Generator
from pathlib import Path
from typing import Any

import numpy as np
import pandas as pd
from PIL import Image
from rationai.mlkit.data.datasets import MetaTiledSlides, OpenSlideTilesDataset
from tqdm import tqdm

from stain_normalization.analysis import StainAnalyzer
from stain_normalization.analysis.report import REPORT_METRICS

⚠️ Potential issue | 🟡 Minor

Sort this import block so lint passes.

The pipeline is already failing with Ruff I001 on this block, so this script cannot merge cleanly until the imports are re-ordered/formatted.

🧰 Tools
🪛 GitHub Actions: Python Lint (RationAI Standard)

[error] 16-16: I001 Import block is unsorted or un-formatted. Organize imports (ruff).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_dataset.py` around lines 16 - 28, Reorder the imports in
analyze_dataset.py into standard-library, third-party, and local/project groups
(each group alphabetized) and ensure single blank lines between groups so
Ruff/I001 passes; specifically place argparse, collections.abc.Generator,
pathlib.Path, and typing.Any first, then third-party imports numpy, pandas,
PIL.Image (from PIL import Image), tqdm, and
rationai.mlkit.data.datasets.MetaTiledSlides/OpenSlideTilesDataset, and finally
local package imports stain_normalization.analysis.StainAnalyzer and
stain_normalization.analysis.report.REPORT_METRICS; run isort/ruff to verify
formatting and remove any unused imports if flagged.

Comment on lines +90 to +102
if args.max_tiles and len(orig_tiles) > args.max_tiles:
orig_tiles = orig_tiles.sample(n=args.max_tiles, random_state=42)
comp_tiles = comp_tiles.loc[orig_tiles.index]
print(f"Subsampled to {args.max_tiles} tile pairs")

analyzer = StainAnalyzer()
orig_iter = iterate_tiles(orig_slides, orig_tiles)
comp_iter = iterate_tiles(comp_slides, comp_tiles)

for (_, orig_tile, image_id), (_, comp_tile, _) in tqdm(
zip(orig_iter, comp_iter, strict=False), total=len(orig_tiles)
):
analyzer.compare(comp_tile, image_id=image_id, reference=orig_tile)

⚠️ Potential issue | 🟠 Major

Don't silently pair tiles by iteration order.

comp_tiles = comp_tiles.loc[orig_tiles.index] and zip(..., strict=False) assume both datasets have identical indexes and ordering. Any reordering or missing row silently truncates or misaligns pairs, which corrupts the reported metrics.

Minimal guard
-    for (_, orig_tile, image_id), (_, comp_tile, _) in tqdm(
-        zip(orig_iter, comp_iter, strict=False), total=len(orig_tiles)
-    ):
+    for (orig_slide, orig_tile, image_id), (
+        comp_slide,
+        comp_tile,
+        comp_image_id,
+    ) in tqdm(zip(orig_iter, comp_iter, strict=True), total=len(orig_tiles)):
+        if (orig_slide, image_id) != (comp_slide, comp_image_id):
+            raise ValueError("Original and compared datasets are not aligned")
         analyzer.compare(comp_tile, image_id=image_id, reference=orig_tile)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@analyze_dataset.py` around lines 90 - 102, The code silently pairs tiles by
iteration order which can misalign data; instead ensure explicit matching by a
unique key and validate equality before comparing: reindex comp_tiles to
orig_tiles using comp_tiles = comp_tiles.reindex(orig_tiles.index) (or perform a
merge/join on the tile identifier column) and check that no NaNs were introduced
and that len(comp_tiles) == len(orig_tiles) (raise an error or exit if there are
missing/mismatched rows); then create iterators via iterate_tiles(orig_slides,
orig_tiles) and iterate_tiles(comp_slides, comp_tiles) and remove zip(...,
strict=False) so you iterate only aligned rows when calling
analyzer.compare(comp_tile, image_id=image_id, reference=orig_tile).
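The key-based pairing described above can be sketched with toy DataFrames (the column name and tile ids are illustrative, not taken from the script):

```python
import pandas as pd

orig_tiles = pd.DataFrame({"x": [0, 256, 512]}, index=["t1", "t2", "t3"])
comp_tiles = pd.DataFrame({"x": [0, 512, 256]}, index=["t1", "t3", "t2"])

# Align comp_tiles to orig_tiles by tile identifier, not iteration order.
comp_aligned = comp_tiles.reindex(orig_tiles.index)
assert list(comp_aligned["x"]) == [0, 256, 512]

# Validate the pairing before comparing: any tile absent from comp_tiles
# becomes NaN after reindexing and should abort the run.
if comp_aligned.isna().any().any():
    raise ValueError("Original and compared datasets are not fully paired")

# A deliberately incomplete comp_tiles is caught by the same check.
incomplete = comp_tiles.drop("t2").reindex(orig_tiles.index)
assert incomplete.isna().any().any()
```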

Comment on lines +1 to +11
import argparse
from pathlib import Path

import albumentations as A
import numpy as np
import torch
from albumentations.pytorch import ToTensorV2
from PIL import Image

from stain_normalization.stain_normalization_model import StainNormalizationModel


⚠️ Potential issue | 🟡 Minor

Fix import sorting to pass linting.

The pipeline is failing with ruff error I001 indicating imports are unsorted. Run ruff check --fix or ruff format to auto-fix.

Suggested fix
 import argparse
 from pathlib import Path
 
 import albumentations as A
-import numpy as np
-import torch
 from albumentations.pytorch import ToTensorV2
+import numpy as np
 from PIL import Image
+import torch
 
 from stain_normalization.stain_normalization_model import StainNormalizationModel

Note: The exact ordering depends on your ruff configuration. Running ruff check --fix demo.py will apply the correct order automatically.

🧰 Tools
🪛 GitHub Actions: Python Lint (RationAI Standard)

[error] 1-1: I001 Import block is unsorted or un-formatted. Organize imports (ruff).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@demo.py` around lines 1 - 11: the imports are unsorted (ruff I001).
Reorder them into the configured import groups in alphabetical order
(standard library, third-party, local) so the ruff lint passes: argparse and
pathlib.Path first, then third-party imports such as albumentations (A),
albumentations.pytorch.ToTensorV2, numpy (np), PIL.Image, and torch, and
finally the local StainNormalizationModel import. Running `ruff check --fix
demo.py` or `ruff format demo.py` applies the correct ordering automatically.

dependencies = [
"lightning>=2.0.0",
"albumentations>=1.4.14",
"torchmetrics>=1.4.14",
⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

python - <<'PY'
import json
import urllib.request

with urllib.request.urlopen("https://pypi.org/pypi/torchmetrics/json") as resp:
    data = json.load(resp)

releases = sorted(data["releases"])
print("1.4.14 exists:", "1.4.14" in data["releases"])
print("Recent 1.4.x releases:", [v for v in releases if v.startswith("1.4")][-10:])
PY

Repository: RationAI/stain-normalization

Length of output: 2653


🌐 Web query:

torchmetrics 1.4.14 PyPI release version

💡 Result:

torchmetrics==1.4.14 is not a version published on PyPI. The 1.4.x releases on PyPI are 1.4.0, 1.4.0.post0, 1.4.1, 1.4.2, and 1.4.3. The current latest release shown on PyPI is 1.9.0 (Mar 9, 2026). [1]

Source(s):

  • [1] PyPI “torchmetrics” release history (shows available versions and dates)

Fix the torchmetrics version constraint: 1.4.14 does not exist on PyPI.

The constraint torchmetrics>=1.4.14 is unsatisfiable. PyPI only has torchmetrics 1.4.0, 1.4.0.post0, 1.4.1, 1.4.2, and 1.4.3 in the 1.4.x series, so pdm install will fail during dependency resolution. Change the constraint to a valid version such as torchmetrics>=1.4.3, or update to a more recent release.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 11: the dependency line for "torchmetrics" uses
an invalid constraint "torchmetrics>=1.4.14". Update it to a valid PyPI
version (e.g., "torchmetrics>=1.4.3" or a newer official release) so
dependency resolution succeeds: edit the "torchmetrics" entry in
pyproject.toml and replace ">=1.4.14" with a valid version specifier.
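The corrected dependency block would then read roughly as follows (the version floors shown are illustrative; pick whatever the project actually needs):

```toml
dependencies = [
    "lightning>=2.0.0",
    "albumentations>=1.4.14",
    "torchmetrics>=1.4.3",  # 1.4.3 is the last published 1.4.x release on PyPI
]
```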

Comment on lines +30 to +33
train = "python -m stain_normalization mode=fit"
validate = "python -m stain_normalization mode=validate"
test = "python -m stain_normalization mode=test"
predict = "python -m stain_normalization mode=predict"
⚠️ Potential issue | 🔴 Critical

These scripts point at a module the checkout doesn't contain.

train / validate / test / predict all call python -m stain_normalization, but this PR doesn't add that package. In this state those entrypoints fail before any app code runs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 30 - 33: the script entries 'train',
'validate', 'test', and 'predict' all run "python -m stain_normalization", but
this repo does not contain a stain_normalization module. Either add that
package/module, or update the scripts in pyproject.toml to point at the CLI
module that actually exists in this checkout (replace "stain_normalization"
with the correct module name), ensuring the module exposes the expected mode
argument handlers used by the project.

Comment on lines +31 to +34
## Dostupné arguemnty:
- **input**: cesta k obrázku alebo priečinku s obrázkami na normalizáciu (default ./demo_data/modified)
- **output**: priečinok, kam sa uložia normalizované obrázky (default ./demo_data)
- **use_cpu**: defaultne nadstavené na použitie GPU ak je dostupná, avšak ak by nastali problémy odporúčam použivať iba CPU No newline at end of file
⚠️ Potential issue | 🟡 Minor

Fix the CLI section typos.

arguemnty and nadstavené are user-facing typos in the only section that explains the flags, so the README reads unfinished right where users are expected to follow it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 31 - 34: fix the typos in the CLI flags section.
Change the header "Dostupné arguemnty:" to "Dostupné argumenty:" and correct
"nadstavené" to "nastavené" in the line describing "use_cpu"; keep the
existing flag names (input, output, use_cpu) and their default values intact
so only the spelling changes.

@LAdam-ix LAdam-ix closed this Mar 11, 2026