Conversation
There was a problem hiding this comment.
Pull request overview
Adds a preprocessing tiling stage (plus QC masks generation) to the ulcerative-colitis pipeline, wiring it into the existing Hydra/MLflow/Ray-based preprocessing structure and updating project dependencies accordingly.
Changes:
- Introduces
preprocessing/tiling.pyto build slide/tile datasets using tissue + QC masks and save results to MLflow. - Adds
preprocessing/quality_control.pyto generate QC masks/metrics viarationai.AsyncClient, organize outputs, and log artifacts to MLflow. - Extends configs (preprocessing + experiments + datasets-with-masks) and updates dependencies (
pyproject.toml,uv.lock) for tiling/QC tooling.
Reviewed changes
Copilot reviewed 23 out of 25 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| uv.lock | Locks new dependencies required by tiling/QC (incl. rationai-tiling, rationai-sdk, ratiopath, and geo/image stack). |
| pyproject.toml | Adds tiling/QC-related runtime deps and Git sources for rationai-tiling + rationai-sdk. |
| preprocessing/tissue_masks.py | Small variable rename (df → dataset) for clarity. |
| preprocessing/tiling.py | New tiling pipeline: dataset split, slide reading via Ray, mask overlap computation, MLflow dataset export. |
| preprocessing/quality_control.py | New QC pipeline: async slide checks, mask organization, metrics aggregation, MLflow artifact logging. |
| scripts/preprocessing/tiling.py | New kube-jobs submission script for tiling. |
| scripts/preprocessing/quality_control.py | New kube-jobs submission script for QC. |
| configs/preprocessing/tiling.yaml | Adds global tiling config schema (mpp/tile_extent/stride/splits/metadata). |
| configs/preprocessing/quality_control.yaml | Adds global QC config schema (output_dir, timeouts, qc_parameters, metadata). |
| configs/experiment/preprocessing/tiling/ftn_*.yaml | Adds FTN tiling experiment presets for multiple mpp/tile sizes. |
| configs/experiment/preprocessing/tiling/ikem_*.yaml | Adds IKEM tiling experiment presets for multiple mpp/tile sizes. |
| configs/experiment/preprocessing/tiling/knl_patos_*.yaml | Adds KNL Patos tiling experiment presets for multiple mpp/tile sizes. |
| configs/dataset/processed_w_masks/ftn.yaml | Adds dataset variant with tissue/QC mask MLflow URIs (currently marked TODO). |
| configs/dataset/processed_w_masks/ikem.yaml | Adds dataset variant with tissue/QC mask MLflow URIs (currently marked TODO). |
| configs/dataset/processed_w_masks/knl_patos.yaml | Adds dataset variant with tissue/QC mask MLflow URIs (currently marked TODO). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
pyproject.toml (1)
19-19: Consider pinning the Git source forrationai-tilingexplicitly inpyproject.toml.While
uv.lockalready locks this dependency to commit475e7938e9a7230b378fca1c67106eda84f69c85, adding arevto the source declaration makes the intent explicit and reduces reliance on the lockfile alone.Proposed change
-rationai-tiling = { git = "https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/tiling.git" } +rationai-tiling = { git = "https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/tiling.git", rev = "475e7938e9a7230b378fca1c67106eda84f69c85" }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` at line 19, The pyproject.toml currently references the Git dependency "rationai-tiling" without an explicit rev; update its package source entry to include the exact commit SHA (e.g., 475e7938e9a7230b378fca1c67106eda84f69c85) via a rev field so the source declaration is pinned (locate the rationai-tiling dependency block in pyproject.toml and add rev = "<commit-sha>" to that source entry).scripts/preprocessing/tiling.py (1)
4-17: Clarify or parameterize placeholder values.The script contains Python Ellipsis (
...) placeholders on lines 5, 6, and 15. While syntactically valid, these will cause runtime errors whensubmit_jobexpects actual string values. Consider either:
- Using descriptive placeholder strings (e.g.,
"<your-username>")- Reading values from environment variables or config
- Adding a comment indicating this is a template file
♻️ Suggested improvement using environment variables
+import os from kube_jobs import storage, submit_job submit_job( - job_name="ulcerative-colitis-tiling-...", - username=..., + job_name=os.environ.get("JOB_NAME", "ulcerative-colitis-tiling"), + username=os.environ["KUBE_USERNAME"], public=False, cpu=64, memory="128Gi", shm="48Gi", script=[ "git clone https://github.com/RationAI/ulcerative-colitis.git workdir", "cd workdir", "uv sync --frozen", - "uv run --active -m preprocessing.tiling +dataset=processed_w_masks/...", + f"uv run --active -m preprocessing.tiling +dataset=processed_w_masks/{os.environ['DATASET']}", ], storage=[storage.secure.DATA], )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/preprocessing/tiling.py` around lines 4 - 17, The submit_job call contains Python Ellipsis (...) in the job_name, username and dataset script argument which will raise runtime errors; update the submit_job invocation (referencing submit_job, job_name, username and the script list entry with "uv run ... +dataset=...") to use real values or parameterize them: replace ... with descriptive placeholder strings like "<your-job-name>" and "<your-username>" or read from environment/config (e.g., os.environ["JOB_NAME"], os.environ["USERNAME"], os.environ["DATASET"]) and document the required env vars at the top of the file so the script is runnable as a template.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@preprocessing/tiling.py`:
- Around line 102-110: The function extract_coverages is missing a type
annotation for the varargs parameter *cols; update its signature to annotate the
varargs as strings (e.g., *cols: str or *cols: tuple[str, ...]) so the linter no
longer flags no-untyped-def, keeping the existing return type dict[str, Any] and
behavior in extract_coverages.
- Around line 140-205: Remove the unused pyright-style ignore comments (e.g., "#
type: ignore[reportArgumentType]" and "# type: ignore[reportCallIssue]") in this
tiling pipeline: specifically on the map(add_nancy_index, ...), map(qc_agg,
...), slides.map(add_fold, ...), slides.map(add_mask_paths, ...), the
tile_overlay_overlap(...) with_column calls, map(extract_coverages, ...),
filter(filter_tissue, ...), and final map(select, ...) occurrences; either
delete the bracketed ignores or replace them with a generic "# type: ignore" if
your typechecker requires a suppression, ensuring no remaining
"[reportArgumentType]" or "[reportCallIssue]" annotations remain in functions
row_hash, add_nancy_index, qc_agg, add_fold, add_mask_paths,
tile_overlay_overlap, extract_coverages, filter_tissue, and select.
---
Nitpick comments:
In `@pyproject.toml`:
- Line 19: The pyproject.toml currently references the Git dependency
"rationai-tiling" without an explicit rev; update its package source entry to
include the exact commit SHA (e.g., 475e7938e9a7230b378fca1c67106eda84f69c85)
via a rev field so the source declaration is pinned (locate the rationai-tiling
dependency block in pyproject.toml and add rev = "<commit-sha>" to that source
entry).
In `@scripts/preprocessing/tiling.py`:
- Around line 4-17: The submit_job call contains Python Ellipsis (...) in the
job_name, username and dataset script argument which will raise runtime errors;
update the submit_job invocation (referencing submit_job, job_name, username and
the script list entry with "uv run ... +dataset=...") to use real values or
parameterize them: replace ... with descriptive placeholder strings like
"<your-job-name>" and "<your-username>" or read from environment/config (e.g.,
os.environ["JOB_NAME"], os.environ["USERNAME"], os.environ["DATASET"]) and
document the required env vars at the top of the file so the script is runnable
as a template.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f86d163b-66ee-4493-a5d1-4ec51d6cfcc4
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (13)
configs/dataset/processed/ftn.yamlconfigs/dataset/processed/ikem.yamlconfigs/dataset/processed/knl_patos.yamlconfigs/dataset/processed_w_masks/ftn.yamlconfigs/dataset/processed_w_masks/ikem.yamlconfigs/dataset/processed_w_masks/knl_patos.yamlconfigs/preprocessing/tiling.yamlpreprocessing/quality_control.pypreprocessing/split_dataset.pypreprocessing/tiling.pypreprocessing/tissue_masks.pypyproject.tomlscripts/preprocessing/tiling.py
Tiling script.
Closes IBD-19
Blocked by #3 , #4
[IGNORE QC FILES]
Dependency graph:
Summary by CodeRabbit
New Features
Chores