Skip to content

feat: tiling#5

Merged
Adames4 merged 79 commits intomasterfrom
feature/tiling
Mar 13, 2026
Merged

feat: tiling#5
Adames4 merged 79 commits intomasterfrom
feature/tiling

Conversation

@Adames4
Copy link
Member

@Adames4 Adames4 commented Feb 12, 2026

Tiling script.

Closes IBD-19

Blocked by #3 , #4

[IGNORE QC FILES]

Dependency graph:

                         +--------------+
                  -------| tissue-masks |<------+           +------------+      +----------------------+
                 /       +--------------+       |       +---| tile-masks |<-----| preprocessing-report |
                /                               |       |   +------------+      +----------------------+
+---------+    /                            +--------+  |
| dataset | <-+                             | tiling |<-+
+---------+    \                            +--------+  |
                \                               |       |   +------------+
                 \       +-----------------+    |       +---| embeddings |
                  -------| quality-control |<---+           +------------+
                         +-----------------+

Summary by CodeRabbit

  • New Features

    • Added a configurable tiling preprocessing pipeline for slide image extraction, QC-aware filtering, mask integration, and dataset split handling.
    • Added new preprocessing configuration and mask/dataset URI mappings for multi-split workflows.
  • Chores

    • Updated dataset artifact configuration structure to a nested MLflow URI mapping.
    • Added a runtime dependency and a job submission script to run the tiling workflow.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a preprocessing tiling stage (plus QC masks generation) to the ulcerative-colitis pipeline, wiring it into the existing Hydra/MLflow/Ray-based preprocessing structure and updating project dependencies accordingly.

Changes:

  • Introduces preprocessing/tiling.py to build slide/tile datasets using tissue + QC masks and save results to MLflow.
  • Adds preprocessing/quality_control.py to generate QC masks/metrics via rationai.AsyncClient, organize outputs, and log artifacts to MLflow.
  • Extends configs (preprocessing + experiments + datasets-with-masks) and updates dependencies (pyproject.toml, uv.lock) for tiling/QC tooling.

Reviewed changes

Copilot reviewed 23 out of 25 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
uv.lock Locks new dependencies required by tiling/QC (incl. rationai-tiling, rationai-sdk, ratiopath, and geo/image stack).
pyproject.toml Adds tiling/QC-related runtime deps and Git sources for rationai-tiling + rationai-sdk.
preprocessing/tissue_masks.py Small variable rename (dfdataset) for clarity.
preprocessing/tiling.py New tiling pipeline: dataset split, slide reading via Ray, mask overlap computation, MLflow dataset export.
preprocessing/quality_control.py New QC pipeline: async slide checks, mask organization, metrics aggregation, MLflow artifact logging.
scripts/preprocessing/tiling.py New kube-jobs submission script for tiling.
scripts/preprocessing/quality_control.py New kube-jobs submission script for QC.
configs/preprocessing/tiling.yaml Adds global tiling config schema (mpp/tile_extent/stride/splits/metadata).
configs/preprocessing/quality_control.yaml Adds global QC config schema (output_dir, timeouts, qc_parameters, metadata).
configs/experiment/preprocessing/tiling/ftn_*.yaml Adds FTN tiling experiment presets for multiple mpp/tile sizes.
configs/experiment/preprocessing/tiling/ikem_*.yaml Adds IKEM tiling experiment presets for multiple mpp/tile sizes.
configs/experiment/preprocessing/tiling/knl_patos_*.yaml Adds KNL Patos tiling experiment presets for multiple mpp/tile sizes.
configs/dataset/processed_w_masks/ftn.yaml Adds dataset variant with tissue/QC mask MLflow URIs (currently marked TODO).
configs/dataset/processed_w_masks/ikem.yaml Adds dataset variant with tissue/QC mask MLflow URIs (currently marked TODO).
configs/dataset/processed_w_masks/knl_patos.yaml Adds dataset variant with tissue/QC mask MLflow URIs (currently marked TODO).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Adames4 Adames4 requested a review from vejtek February 12, 2026 20:23
This was referenced Feb 12, 2026
@Adames4 Adames4 removed the request for review from vejtek February 28, 2026 20:51
@Adames4 Adames4 changed the base branch from feature/tissue-masks to master March 8, 2026 19:56
@Adames4 Adames4 requested review from a team, JakubPekar, matejpekar and vejtek March 8, 2026 19:56
@Adames4 Adames4 requested a review from matejpekar March 10, 2026 18:28
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
pyproject.toml (1)

19-19: Consider pinning the Git source for rationai-tiling explicitly in pyproject.toml.

While uv.lock already locks this dependency to commit 475e7938e9a7230b378fca1c67106eda84f69c85, adding a rev to the source declaration makes the intent explicit and reduces reliance on the lockfile alone.

Proposed change
-rationai-tiling = { git = "https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/tiling.git" }
+rationai-tiling = { git = "https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/tiling.git", rev = "475e7938e9a7230b378fca1c67106eda84f69c85" }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` at line 19, The pyproject.toml currently references the Git
dependency "rationai-tiling" without an explicit rev; update its package source
entry to include the exact commit SHA (e.g.,
475e7938e9a7230b378fca1c67106eda84f69c85) via a rev field so the source
declaration is pinned (locate the rationai-tiling dependency block in
pyproject.toml and add rev = "<commit-sha>" to that source entry).
scripts/preprocessing/tiling.py (1)

4-17: Clarify or parameterize placeholder values.

The script contains Python Ellipsis (...) placeholders on lines 5, 6, and 15. While syntactically valid, these will cause runtime errors when submit_job expects actual string values. Consider either:

  1. Using descriptive placeholder strings (e.g., "<your-username>")
  2. Reading values from environment variables or config
  3. Adding a comment indicating this is a template file
♻️ Suggested improvement using environment variables
+import os
 from kube_jobs import storage, submit_job
 
 
 submit_job(
-    job_name="ulcerative-colitis-tiling-...",
-    username=...,
+    job_name=os.environ.get("JOB_NAME", "ulcerative-colitis-tiling"),
+    username=os.environ["KUBE_USERNAME"],
     public=False,
     cpu=64,
     memory="128Gi",
     shm="48Gi",
     script=[
         "git clone https://github.com/RationAI/ulcerative-colitis.git workdir",
         "cd workdir",
         "uv sync --frozen",
-        "uv run --active -m preprocessing.tiling +dataset=processed_w_masks/...",
+        f"uv run --active -m preprocessing.tiling +dataset=processed_w_masks/{os.environ['DATASET']}",
     ],
     storage=[storage.secure.DATA],
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/preprocessing/tiling.py` around lines 4 - 17, The submit_job call
contains Python Ellipsis (...) in the job_name, username and dataset script
argument which will raise runtime errors; update the submit_job invocation
(referencing submit_job, job_name, username and the script list entry with "uv
run ... +dataset=...") to use real values or parameterize them: replace ... with
descriptive placeholder strings like "<your-job-name>" and "<your-username>" or
read from environment/config (e.g., os.environ["JOB_NAME"],
os.environ["USERNAME"], os.environ["DATASET"]) and document the required env
vars at the top of the file so the script is runnable as a template.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@preprocessing/tiling.py`:
- Around line 102-110: The function extract_coverages is missing a type
annotation for the varargs parameter *cols; update its signature to annotate the
varargs as strings (e.g., *cols: str or *cols: tuple[str, ...]) so the linter no
longer flags no-untyped-def, keeping the existing return type dict[str, Any] and
behavior in extract_coverages.
- Around line 140-205: Remove the unused pyright-style ignore comments (e.g., "#
type: ignore[reportArgumentType]" and "# type: ignore[reportCallIssue]") in this
tiling pipeline: specifically on the map(add_nancy_index, ...), map(qc_agg,
...), slides.map(add_fold, ...), slides.map(add_mask_paths, ...), the
tile_overlay_overlap(...) with_column calls, map(extract_coverages, ...),
filter(filter_tissue, ...), and final map(select, ...) occurrences; either
delete the bracketed ignores or replace them with a generic "# type: ignore" if
your typechecker requires a suppression, ensuring no remaining
"[reportArgumentType]" or "[reportCallIssue]" annotations remain in functions
row_hash, add_nancy_index, qc_agg, add_fold, add_mask_paths,
tile_overlay_overlap, extract_coverages, filter_tissue, and select.

---

Nitpick comments:
In `@pyproject.toml`:
- Line 19: The pyproject.toml currently references the Git dependency
"rationai-tiling" without an explicit rev; update its package source entry to
include the exact commit SHA (e.g., 475e7938e9a7230b378fca1c67106eda84f69c85)
via a rev field so the source declaration is pinned (locate the rationai-tiling
dependency block in pyproject.toml and add rev = "<commit-sha>" to that source
entry).

In `@scripts/preprocessing/tiling.py`:
- Around line 4-17: The submit_job call contains Python Ellipsis (...) in the
job_name, username and dataset script argument which will raise runtime errors;
update the submit_job invocation (referencing submit_job, job_name, username and
the script list entry with "uv run ... +dataset=...") to use real values or
parameterize them: replace ... with descriptive placeholder strings like
"<your-job-name>" and "<your-username>" or read from environment/config (e.g.,
os.environ["JOB_NAME"], os.environ["USERNAME"], os.environ["DATASET"]) and
document the required env vars at the top of the file so the script is runnable
as a template.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f86d163b-66ee-4493-a5d1-4ec51d6cfcc4

📥 Commits

Reviewing files that changed from the base of the PR and between b2d547e and caaa6aa.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (13)
  • configs/dataset/processed/ftn.yaml
  • configs/dataset/processed/ikem.yaml
  • configs/dataset/processed/knl_patos.yaml
  • configs/dataset/processed_w_masks/ftn.yaml
  • configs/dataset/processed_w_masks/ikem.yaml
  • configs/dataset/processed_w_masks/knl_patos.yaml
  • configs/preprocessing/tiling.yaml
  • preprocessing/quality_control.py
  • preprocessing/split_dataset.py
  • preprocessing/tiling.py
  • preprocessing/tissue_masks.py
  • pyproject.toml
  • scripts/preprocessing/tiling.py

@matejpekar matejpekar removed the jules label Mar 10, 2026
@Adames4 Adames4 merged commit 235b919 into master Mar 13, 2026
3 checks passed
@Adames4 Adames4 deleted the feature/tiling branch March 13, 2026 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants