feat: tiling #5
Merged
79 commits
6e1d4b9 chore: replace pdm with uv (Adames4)
d7a1ef3 feat: configs (Adames4)
34abac4 feat: dataset creation (Adames4)
80eca85 fix: configs (Adames4)
97335ef fix: configs (Adames4)
727c729 feat: add scripts (Adames4)
fa9f581 fix: invalid job name (Adames4)
fd978b3 fix: set slide_id as index and ensure nancy is an integer (Adames4)
2869916 chore: dependencies (Adames4)
725e17d feat: processed data (Adames4)
f7026b3 feat: tissue masks (Adames4)
ed49c42 feat: script (Adames4)
87113e1 feat: script (Adames4)
de98fef fix: job name (Adames4)
929d298 fix: exclude .git folder from ray env (Adames4)
81d4df3 fix: exclude .venv folder from ray env (Adames4)
3f0321f fix: typo (Adames4)
2ecde13 feat: job script (Adames4)
6cae5d0 feat: config (Adames4)
616ed48 feat: job script (Adames4)
700f916 chore: dependencies (Adames4)
65e86db feat: quality control (Adames4)
59965a4 feat: add dataset configuration files for ftn, ikem, and knl_patos (Adames4)
234d611 fix: output dir (Adames4)
4930434 fix: typo (Adames4)
180afe7 chore: Merge branch 'feature/tissue-masks' into feature/tiling (Adames4)
456d4ae chore: add tiling libs (Adames4)
e88bfd8 fix: naming (Adames4)
b91d36d feat: confs (Adames4)
413db4f feat: tiling (Adames4)
f5b4045 fix: confs (Adames4)
2a578d1 fix: typo (Adames4)
9262f3e fix: typo (Adames4)
b852be9 fix: typo (Adames4)
bd8ca9f fix: dataset index (Adames4)
08d3320 feat: update tiling to latest ratiopath (Adames4)
3ab6d5a fix: use tile overlay overlap as udfexpr (Adames4)
06e957e chore: ratiopath from github (Adames4)
de0fd75 fix: WIP (Adames4)
e56216b fix: None in overlap (Adames4)
95cc81c feat: confs (Adames4)
754698f chore: dependencies (Adames4)
178f294 feat: tile = stride (Adames4)
e4d2499 fix: typo (Adames4)
acfbf19 fix: glob over changing dir (Adames4)
659f505 fix: finish the run (Adames4)
45ac3b3 fix: rever last commit (Adames4)
3c7a5fc feat: conf (Adames4)
774f24a fix: splits (Adames4)
076ba93 feat: tweaking resources (Adames4)
2788656 feat: confs (Adames4)
25bec6e chore: dependecies (Adames4)
5faa224 fix: add paths (Adames4)
eb96f56 fix: conf (Adames4)
89a4582 fix: typo (Adames4)
18c556d chore: dependencies (Adames4)
35cfb5c fix: group splitting (Adames4)
bd0c6ac fix: PR comments (Adames4)
9ffea96 chore: dependencies (Adames4)
0db7d08 chore: Merge branch 'feature/dataset' into feature/tissue-masks (Adames4)
b833059 feat: refactor configs (Adames4)
3300341 fix: PR comments (Adames4)
c98608e chore: Merge branch 'feature/dataset' into feature/quality-control (Adames4)
bc71668 feat: configs (Adames4)
6647d0d chore: Merge branch 'feature/tissue-masks' into feature/tiling (Adames4)
2dfb32b chore: Merge branch 'feature/quality-control' into feature/tiling (Adames4)
30d4494 feat: configs (Adames4)
88eb5dc fix: conf (Adames4)
4269a3a fix: PR (Adames4)
a9615ed fix: repo (Adames4)
3b7f95c chore: Merge branch 'feature/quality-control' into feature/tiling (Adames4)
659910e fix: repo (Adames4)
cd09f80 chore: Merge branch 'master' into feature/tiling (Adames4)
2007e83 feat: update tiling (Adames4)
6c5ec71 fix: typo (Adames4)
4c01ac2 fix: typo (Adames4)
31e78c2 fix: imports (Adames4)
caaa6aa fix: add ray to with (Adames4)
486ca49 fix: mypy (Adames4)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| defaults: | ||
|   - /dataset/processed/ftn@_here_ | ||
|   - _self_ | ||
| | ||
| mlflow_uris: | ||
|   tissue_mask: mlflow-artifacts:/86/17149d1de7014112aba3a252a76d10bc/artifacts/tissue_masks | ||
|   qc_mask: mlflow-artifacts:/86/d2301fc279c94682a639583731c2fded/artifacts | ||
|   splits: | ||
|     train: mlflow-artifacts:/86/a0fa337bf26146dab42062237285737f/artifacts/train.csv | ||
|     test_preliminary: mlflow-artifacts:/86/a0fa337bf26146dab42062237285737f/artifacts/test_preliminary.csv | ||
|     test_final: mlflow-artifacts:/86/a0fa337bf26146dab42062237285737f/artifacts/test_final.csv | ||
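The `mlflow_uris` values in this config appear to follow the layout `mlflow-artifacts:/<experiment_id>/<run_id>/artifacts/<path>`, with a 32-character hexadecimal run ID. The sketch below is a hypothetical validation helper (`parse_artifact_uri` and `ArtifactUri` are illustrative names, not part of this PR or of MLflow) that could catch a mistyped run ID before a download is attempted. The layout itself is an assumption drawn from the URIs shown here.

```python
import re
from typing import NamedTuple


class ArtifactUri(NamedTuple):
    """Parts of an artifact URI (hypothetical helper type, not an MLflow class)."""

    experiment_id: str
    run_id: str
    artifact_path: str


# Assumed layout: mlflow-artifacts:/<experiment_id>/<run_id>/artifacts[/<path>]
_URI_PATTERN = re.compile(
    r"^mlflow-artifacts:/(?P<experiment_id>\d+)"
    r"/(?P<run_id>[0-9a-f]{32})"
    r"/artifacts(?:/(?P<artifact_path>.+))?$"
)


def parse_artifact_uri(uri: str) -> ArtifactUri:
    """Split a URI into its parts, raising ValueError if it does not match."""
    match = _URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"Unexpected artifact URI layout: {uri!r}")
    return ArtifactUri(
        experiment_id=match["experiment_id"],
        run_id=match["run_id"],
        # The path is optional, e.g. for qc_mask, which points at the artifact root.
        artifact_path=match["artifact_path"] or "",
    )


tissue = parse_artifact_uri(
    "mlflow-artifacts:/86/17149d1de7014112aba3a252a76d10bc/artifacts/tissue_masks"
)
qc = parse_artifact_uri(
    "mlflow-artifacts:/86/d2301fc279c94682a639583731c2fded/artifacts"
)
```

A check like this could run over every entry of `mlflow_uris` when the config is loaded, so a truncated run ID fails fast rather than at download time.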
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| defaults: | ||
|   - /dataset/processed/ikem@_here_ | ||
|   - _self_ | ||
| | ||
| mlflow_uris: | ||
|   tissue_mask: mlflow-artifacts:/86/94481ac59246471fb874bfb4dccb5e67/artifacts/tissue_masks | ||
|   qc_mask: mlflow-artifacts:/86/65e794b652ab4369aad2e7dbe60eddca/artifacts | ||
|   splits: | ||
|     train: mlflow-artifacts:/86/9a2d8f975bc24dceb5271c5699560a8f/artifacts/train.csv | ||
|     test_preliminary: mlflow-artifacts:/86/9a2d8f975bc24dceb5271c5699560a8f/artifacts/test_preliminary.csv | ||
|     test_final: mlflow-artifacts:/86/9a2d8f975bc24dceb5271c5699560a8f/artifacts/test_final.csv | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| defaults: | ||
|   - /dataset/processed/knl_patos@_here_ | ||
|   - _self_ | ||
| | ||
| mlflow_uris: | ||
|   tissue_mask: mlflow-artifacts:/86/6cdfa5ce4f7242a9b2bb394bfd5ed705/artifacts/tissue_masks | ||
|   qc_mask: mlflow-artifacts:/86/7cc5586efbca4ecd8f7ac2847b4ee199/artifacts | ||
|   splits: | ||
|     test_preliminary: mlflow-artifacts:/86/4d517fd564c741ec8e14679a340ebcb0/artifacts/test_preliminary.csv | ||
|     test_final: mlflow-artifacts:/86/4d517fd564c741ec8e14679a340ebcb0/artifacts/test_final.csv | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| # @package _global_ | ||
| | ||
| mpp: 1.55 # level 2 | ||
| tile_extent: 224 | ||
| stride: 112 | ||
| tissue_threshold: 0.5 | ||
| | ||
| metadata: | ||
|   run_name: "🧱 Tiling: ${dataset.institution}" | ||
|   description: Tile extraction for ${dataset.institution} institution with tile extent ${tile_extent} | ||
|   hyperparams: | ||
|     mpp: ${mpp} | ||
|     tile_extent: ${tile_extent} | ||
|     stride: ${stride} | ||
|     tissue_threshold: ${tissue_threshold} | ||
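With `tile_extent: 224` and `stride: 112`, consecutive tiles overlap by 112 pixels along each axis, which is half a tile. The sketch below illustrates that grid arithmetic with two hypothetical helpers, `axis_origins` and `grid_origins`. It is not the `grid_tiles` function from `ratiopath`: in particular, it assumes that only tiles lying fully inside the slide are kept, and the library's handling of the slide border may differ.

```python
from collections.abc import Iterator
from itertools import product


def axis_origins(extent: int, tile_extent: int, stride: int) -> range:
    """Origins of tiles that fit entirely inside [0, extent) along one axis."""
    # Assumption: partial tiles at the border are dropped.
    return range(0, extent - tile_extent + 1, stride)


def grid_origins(
    slide_extent: tuple[int, int], tile_extent: int, stride: int
) -> Iterator[tuple[int, int]]:
    """Yield (x, y) top-left corners of square tiles in row-major order."""
    xs = axis_origins(slide_extent[0], tile_extent, stride)
    ys = axis_origins(slide_extent[1], tile_extent, stride)
    for y, x in product(ys, xs):
        yield x, y


# A made-up 448 x 336 slide at the target resolution, tiled with the config values.
origins = list(grid_origins((448, 336), tile_extent=224, stride=112))
```

For this example slide the grid has three columns and two rows, so `origins` holds six tile corners.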
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,242 @@ | ||
| from pathlib import Path | ||
| from typing import Any, TypedDict, cast | ||
| | ||
| import hydra | ||
| import mlflow.artifacts | ||
| import pandas as pd | ||
| import ray | ||
| from omegaconf import DictConfig | ||
| from rationai.mlkit import autolog, with_cli_args | ||
| from rationai.mlkit.lightning.loggers import MLFlowLogger | ||
| from rationai.tiling.writers import save_mlflow_dataset | ||
| from ratiopath.ray import read_slides | ||
| from ratiopath.tiling import grid_tiles, tile_overlay_overlap | ||
| from ratiopath.tiling.utils import row_hash | ||
| from ray.data.expressions import col | ||
| from shapely import Polygon | ||
| from shapely.geometry import box | ||
| | ||
| | ||
| QC_BLUR_MEAN_COLUMN = "mean_coverage(Piqe)" | ||
| QC_ARTIFACTS_MEAN_COLUMN = "mean_coverage(ResidualArtifactsAndCoverage)" | ||
| QC_SUBFOLDERS = {"blur": "blur_per_pixel", "artifacts": "artifacts_per_pixel"} | ||
| | ||
| | ||
| class _RayCpuResources(TypedDict): | ||
|     num_cpus: float | ||
| | ||
| | ||
| class _RayMemResources(TypedDict): | ||
|     memory: int | ||
| | ||
| | ||
| LO_CPU: _RayCpuResources = {"num_cpus": 0.1} | ||
| HI_CPU: _RayCpuResources = {"num_cpus": 0.2} | ||
| LO_MEM: _RayMemResources = {"memory": 128 * 1024**2} | ||
| HI_MEM: _RayMemResources = {"memory": 256 * 1024**2} | ||
| | ||
| | ||
| def add_nancy_index(row: dict[str, Any], df: pd.DataFrame) -> dict[str, Any]: | ||
|     row["nancy_index"] = df.loc[Path(row["path"]).stem, "nancy"] | ||
|     return row | ||
| | ||
| | ||
| def qc_agg(row: dict[str, Any], df: pd.DataFrame) -> dict[str, Any]: | ||
|     qc_df = cast("pd.Series", df.loc[Path(row["path"]).stem]) | ||
| | ||
|     row["blur_mean"] = qc_df[QC_BLUR_MEAN_COLUMN] | ||
|     row["artifacts_mean"] = qc_df[QC_ARTIFACTS_MEAN_COLUMN] | ||
| | ||
|     return row | ||
| | ||
| | ||
| def add_fold(row: dict[str, Any], df: pd.DataFrame) -> dict[str, Any]: | ||
|     row["fold"] = df.loc[Path(row["path"]).stem, "fold"] | ||
|     return row | ||
| | ||
| | ||
| def add_mask_paths( | ||
|     row: dict[str, Any], qc_folder: Path, tissue_folder: Path | ||
| ) -> dict[str, Any]: | ||
|     stem = Path(row["path"]).stem | ||
|     row["tissue_mask_path"] = str(tissue_folder / f"{stem}.tiff") | ||
|     for key, subfolder in QC_SUBFOLDERS.items(): | ||
|         row[f"{key}_mask_path"] = str(qc_folder / subfolder / f"{stem}.tiff") | ||
|     return row | ||
| | ||
| | ||
| def create_tissue_roi(tile_extent: int) -> Polygon: | ||
|     offset = tile_extent // 4 | ||
|     size = tile_extent // 2 | ||
|     return box(offset, offset, offset + size, offset + size) | ||
| | ||
| | ||
| def create_qc_roi(tile_extent: int) -> Polygon: | ||
|     return box(0, 0, tile_extent, tile_extent) | ||
| | ||
| | ||
| def tile(row: dict[str, Any]) -> list[dict[str, Any]]: | ||
|     return [ | ||
|         { | ||
|             "tile_x": x, | ||
|             "tile_y": y, | ||
|             "path": row["path"], | ||
|             "slide_id": row["id"], | ||
|             "level": row["level"], | ||
|             "tile_extent_x": row["tile_extent_x"], | ||
|             "tile_extent_y": row["tile_extent_y"], | ||
|             "mpp_x": row["mpp_x"], | ||
|             "mpp_y": row["mpp_y"], | ||
|             "tissue_mask_path": row["tissue_mask_path"], | ||
|             "blur_mask_path": row["blur_mask_path"], | ||
|             "artifacts_mask_path": row["artifacts_mask_path"], | ||
|         } | ||
|         for x, y in grid_tiles( | ||
|             slide_extent=(row["extent_x"], row["extent_y"]), | ||
|             tile_extent=(row["tile_extent_x"], row["tile_extent_y"]), | ||
|             stride=(row["stride_x"], row["stride_y"]), | ||
|         ) | ||
|     ] | ||
| | ||
| | ||
| def extract_coverages(row: dict[str, Any], *cols: str) -> dict[str, Any]: | ||
|     for c in cols: | ||
|         overlap = row[f"{c}_overlap"] | ||
|         zero_overlap = overlap.get("0", 0) | ||
|         if zero_overlap is None: | ||
|             row[c] = 1.0 | ||
|         else: | ||
|             row[c] = 1.0 - zero_overlap | ||
|     return row | ||
| | ||
| | ||
| def filter_tissue(row: dict[str, Any], threshold: float) -> bool: | ||
|     return row["tissue"] >= threshold | ||
| | ||
| | ||
| def select(row: dict[str, Any]) -> dict[str, Any]: | ||
|     return { | ||
|         "slide_id": row["slide_id"], | ||
|         "x": row["tile_x"], | ||
|         "y": row["tile_y"], | ||
|         "tissue": row["tissue"], | ||
|         "blur": row["blur"], | ||
|         "artifacts": row["artifacts"], | ||
|     } | ||
| | ||
| | ||
| def tiling( | ||
|     df: pd.DataFrame, | ||
|     qc_folder: Path, | ||
|     tissue_folder: Path, | ||
|     tile_extent: int, | ||
|     stride: int, | ||
|     mpp: float, | ||
|     tissue_threshold: float, | ||
| ) -> tuple[pd.DataFrame, pd.DataFrame]: | ||
|     qc_df = pd.read_csv(qc_folder / "qc_metrics.csv", index_col="slide_name") | ||
|     paths = df["path"].tolist() | ||
| | ||
|     slides = ( | ||
|         read_slides(paths, tile_extent=tile_extent, stride=stride, mpp=mpp) | ||
|         .map(row_hash, **LO_CPU, **LO_MEM) | ||
|         .map(add_nancy_index, fn_args=(df,), **LO_CPU, **LO_MEM) # pyright: ignore[reportArgumentType] | ||
|         .map(qc_agg, fn_args=(qc_df,), **HI_CPU, **LO_MEM) # pyright: ignore[reportArgumentType] | ||
|     ) | ||
| | ||
|     if "fold" in df.columns: | ||
|         slides = slides.map(add_fold, fn_args=(df,), **LO_CPU, **LO_MEM) # pyright: ignore[reportArgumentType] | ||
| | ||
|     tissue_roi = create_tissue_roi(tile_extent) | ||
|     qc_roi = create_qc_roi(tile_extent) | ||
| | ||
|     tiles = ( | ||
|         slides.map( | ||
|             add_mask_paths, # pyright: ignore[reportArgumentType] | ||
|             fn_args=(qc_folder, tissue_folder), | ||
|             **LO_CPU, | ||
|             **LO_MEM, | ||
|         ) | ||
|         .flat_map(tile, **HI_CPU, **LO_MEM) | ||
|         .repartition(target_num_rows_per_block=4096) | ||
|         .with_column( | ||
|             "tissue_overlap", | ||
|             tile_overlay_overlap( | ||
|                 tissue_roi, | ||
|                 col("tissue_mask_path"), | ||
|                 col("tile_x"), | ||
|                 col("tile_y"), | ||
|                 col("mpp_x"), | ||
|                 col("mpp_y"), | ||
|             ), # pyright: ignore[reportCallIssue] | ||
|             **HI_CPU, | ||
|             **HI_MEM, | ||
|         ) | ||
|         .map(extract_coverages, fn_args=("tissue",), **LO_CPU, **LO_MEM) # pyright: ignore[reportArgumentType] | ||
|         .filter(filter_tissue, fn_args=(tissue_threshold,), **LO_CPU, **LO_MEM) # pyright: ignore[reportArgumentType] | ||
|         .with_column( | ||
|             "blur_overlap", | ||
|             tile_overlay_overlap( | ||
|                 qc_roi, | ||
|                 col("blur_mask_path"), | ||
|                 col("tile_x"), | ||
|                 col("tile_y"), | ||
|                 col("mpp_x"), | ||
|                 col("mpp_y"), | ||
|             ), # pyright: ignore[reportCallIssue] | ||
|             **HI_CPU, | ||
|             **HI_MEM, | ||
|         ) | ||
|         .with_column( | ||
|             "artifacts_overlap", | ||
|             tile_overlay_overlap( | ||
|                 qc_roi, | ||
|                 col("artifacts_mask_path"), | ||
|                 col("tile_x"), | ||
|                 col("tile_y"), | ||
|                 col("mpp_x"), | ||
|                 col("mpp_y"), | ||
|             ), # pyright: ignore[reportCallIssue] | ||
|             **HI_CPU, | ||
|             **HI_MEM, | ||
|         ) | ||
|         .map(extract_coverages, fn_args=("blur", "artifacts"), **LO_CPU, **LO_MEM) # pyright: ignore[reportArgumentType] | ||
|         .map(select, **LO_CPU, **LO_MEM) | ||
|     ) | ||
| | ||
|     return slides.to_pandas(), tiles.to_pandas() | ||
| | ||
| | ||
| @with_cli_args(["+preprocessing=tiling"]) | ||
| @hydra.main(config_path="../configs", config_name="preprocessing", version_base=None) | ||
| @autolog | ||
| def main(config: DictConfig, logger: MLFlowLogger) -> None: | ||
|     qc_folder = Path( | ||
|         mlflow.artifacts.download_artifacts(config.dataset.mlflow_uris.qc_mask) | ||
|     ) | ||
|     tissue_folder = Path( | ||
|         mlflow.artifacts.download_artifacts(config.dataset.mlflow_uris.tissue_mask) | ||
|     ) | ||
| | ||
|     for name, split_uri in config.dataset.mlflow_uris.splits.items(): | ||
|         split = pd.read_csv( | ||
|             mlflow.artifacts.download_artifacts(split_uri), index_col="slide_id" | ||
|         ) | ||
| | ||
|         df_slides, df_tiles = tiling( | ||
|             split, | ||
|             qc_folder=qc_folder, | ||
|             tissue_folder=tissue_folder, | ||
|             tile_extent=config.tile_extent, | ||
|             stride=config.stride, | ||
|             mpp=config.mpp, | ||
|             tissue_threshold=config.tissue_threshold, | ||
|         ) | ||
|         save_mlflow_dataset( | ||
|             df_slides, df_tiles, f"{name} - {config.dataset.institution}" | ||
|         ) | ||
| | ||
| | ||
| if __name__ == "__main__": | ||
|     with ray.init(runtime_env={"excludes": [".git", ".venv"]}): | ||
|         main() | ||
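The tissue filter in the script above combines two small pieces of arithmetic: `create_tissue_roi` selects the central half of the tile along each axis, and `extract_coverages` turns an overlap record into a coverage fraction by subtracting the share of pixels whose mask value is `0`. The following dependency-free sketch mirrors that logic. `tissue_roi_bounds` and `coverage` are illustrative names, and the shape of the overlap record (string pixel values mapped to fractions of the ROI) is an assumption inferred from the `overlap.get("0", 0)` lookup, not a documented `ratiopath` contract.

```python
from typing import Optional


def tissue_roi_bounds(tile_extent: int) -> tuple[int, int, int, int]:
    """Bounds (min_x, min_y, max_x, max_y) of the central ROI, as in create_tissue_roi."""
    offset = tile_extent // 4
    size = tile_extent // 2
    return (offset, offset, offset + size, offset + size)


def coverage(overlap: dict[str, Optional[float]]) -> float:
    """Fraction of the ROI that is not background (mask value 0)."""
    zero_overlap = overlap.get("0", 0)
    if zero_overlap is None:
        # Same fallback as the script: a None entry is treated as full coverage.
        return 1.0
    return 1.0 - zero_overlap


bounds = tissue_roi_bounds(224)

# Hypothetical overlap record: 25 % background, 75 % tissue inside the ROI.
tissue = coverage({"0": 0.25, "255": 0.75})
keep_tile = tissue >= 0.5  # tissue_threshold from the tiling config
```

For a 224-pixel tile the ROI is a 112 x 112 square, one quarter of the tile area, so the `tissue_threshold` of 0.5 applies to the centre of the tile rather than to the whole tile.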