ecs-analysis

Comparing extracellular space (ECS) geometry between two tissue-preparation methods — Chemical fixation vs Rapid High-Pressure Freezing (HPF) — across CellMap groundtruth crops in four mouse tissues (Kidney, Heart, Liver, Cortex). 41 active crops, ~1,500 cells.

Quick handoff: the file handoff.zip at the repo root contains a self-contained snapshot of the code, current results, and figures (excludes archive/ and paper/). Either clone the repo or download just that zip — both produce a runnable copy.

Status (as of handoff)

The pipeline is built and validated. A native-resolution pass on all 41 crops is mostly complete; matched-resolution and degradation experiments are partial. See Status of computational work below for what remains.

Repository layout

ecs/
  config.py           single source of truth: crops, datasets, anatomy,
                      thresholds. Reads ECS_DATA_BASE env var for the
                      zarr root (default: /Volumes/cellmap/data).
  io.py               zarr loading, voxel-size extraction, downsampling.
                      Unions `ecs` and `bm` (basement membrane) into one
                      effective ECS mask for distance-transform analyses.
  geometry.py         distance transforms, 6-conn boundary face counting,
                      marching-cubes meshing, cotangent-Laplacian mean
                      curvature (sign-calibrated on synthetic spheres),
                      multi-scale roughness, local-extremum feature
                      detection.
  metrics/
    volume_fraction.py  ECS%, cell% (metadata fast-path, includes bm)
    ecs_width.py        distance-transform percentiles, narrow-channel
                        (<200nm) filter
    voronoi_gap.py      cell-Voronoi gap distribution + contact fractions
    sa_v.py             crop-level ECS-facing surface / cell volume
    topology.py         pooled curvature, multi-scale roughness,
                        protrusion/indentation density per um^2

scripts/
  run_native.py         native-resolution pass on all crops
  run_matched.py        downsample to TARGET_VOXEL_NM (8nm) then run all
  run_degradation.py    Chemical crops at multiple resolutions
                        (default 2/4/8/16 nm)
  summarize.py          aggregate CSVs into tissue x prep tables and
                        anatomy-matched comparisons
  stats.py              Mann-Whitney U + Cliff's delta + bootstrap CIs
  make_figures.py       per-metric strip plots and anatomy-matched panels
  cluster_submit.sh     LSF (Janelia) per-crop job submission

results/
  native_*.csv          one row per crop per metric (incremental writes)
  matched_*.csv         8nm-downsampled equivalents
  degradation_*.csv     one row per (crop, resolution) for Chemical crops
  summary_*.csv         aggregated tables produced by summarize.py
  stats_native.csv      Mann-Whitney + Cliff's delta + bootstrap CIs

figures/                rendered PNGs from make_figures.py
archive/                old code and CSVs from a prior pipeline (kept
                        for reference; not part of the new flow)

crop_annotations.csv    anatomy labels per crop. Currently covers 31 of
                        41 crops; expert is still annotating the 10
                        cortex crops (1116, 1033-1037, 1045, 1046, 1139,
                        1141). Once added, anatomy-matched comparisons
                        auto-update — no code change needed.
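As a rough illustration of the `ecs` + `bm` union described for io.py and the distance-transform percentiles described for ecs_width.py, here is a minimal sketch. The function names and the toy volume are assumptions made for this example, not the actual code in ecs/io.py or ecs/metrics/ecs_width.py.

```python
import numpy as np
from scipy import ndimage


def effective_ecs_mask(ecs, bm=None):
    """Union the ecs and bm label volumes into one boolean ECS mask."""
    mask = np.asarray(ecs) > 0
    if bm is not None:
        mask = mask | (np.asarray(bm) > 0)
    return mask


def ecs_distance_percentiles(mask, voxel_size_nm, percentiles=(5, 50, 95)):
    """Percentiles (nm) of the distance from ECS voxels to the nearest non-ECS voxel."""
    # `sampling` scales each axis by the physical voxel size, so distances are in nm.
    dist_nm = ndimage.distance_transform_edt(mask, sampling=voxel_size_nm)
    return np.percentile(dist_nm[mask], percentiles)


# Toy volume (made up): a 4-voxel-thick ecs slab with a 2-voxel bm layer on one side.
ecs = np.zeros((10, 20, 10), dtype=np.uint8)
bm = np.zeros_like(ecs)
ecs[:, 8:12, :] = 1
bm[:, 12:14, :] = 1

voxel_nm = (4.0, 4.0, 4.0)
without_bm = ecs_distance_percentiles(effective_ecs_mask(ecs), voxel_nm, (100,))
with_bm = ecs_distance_percentiles(effective_ecs_mask(ecs, bm), voxel_nm, (100,))
```

In this toy case the maximum distance grows once bm is included, which is the direction of the bm correction noted for the Kidney-Chemical crops.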

Key design decisions worth knowing

bm (basement membrane) is treated as ECS. Per the imaging team, bm is structurally a sub-part of the extracellular compartment. Only 4 Kidney-Chemical crops have any bm voxels, but for those the correction adds 2.7-10.4 percentage points to ECS%.
Sign convention for curvature is calibrated on synthetic spheres. Convex surfaces -> positive H, concave -> negative. The previous pipeline (in archive/) had this inverted. See ecs.geometry.sphere_sign_check.
Cell inclusion uses physical volume (MIN_CELL_VOL_NM3), not voxel count. Earlier pipelines used a fixed voxel-count threshold, which let many more small fragments through at finer resolutions.
Per-cell metrics are pooled to the crop level for statistics. Each crop contributes one number per metric, which avoids pseudo-replication from non-independent cells within the same crop.
All crop-level metrics are computed per-crop incrementally. Mid-run failures (SMB drops, OOM, etc.) preserve completed work.
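To make the physical-volume inclusion rule concrete, the sketch below filters labelled cells by volume in nm^3 rather than by voxel count. The threshold value and function names are hypothetical; the real constant is MIN_CELL_VOL_NM3 in ecs/config.py, whose value is not reproduced here.

```python
import numpy as np

# Hypothetical threshold for illustration only; see MIN_CELL_VOL_NM3 in ecs/config.py.
EXAMPLE_MIN_CELL_VOL_NM3 = 1.0e6


def cells_above_volume(labels, voxel_size_nm, min_vol_nm3=EXAMPLE_MIN_CELL_VOL_NM3):
    """Return label ids whose physical volume meets the threshold (0 = background)."""
    voxel_vol_nm3 = float(np.prod(voxel_size_nm))
    ids, counts = np.unique(np.asarray(labels), return_counts=True)
    keep = (ids != 0) & (counts * voxel_vol_nm3 >= min_vol_nm3)
    return [int(i) for i in ids[keep]]


def min_voxels_for(voxel_size_nm, min_vol_nm3=EXAMPLE_MIN_CELL_VOL_NM3):
    """Voxel count that corresponds to the physical threshold at a given voxel size."""
    voxel_vol_nm3 = float(np.prod(voxel_size_nm))
    return int(np.ceil(min_vol_nm3 / voxel_vol_nm3))


# Toy label volume (made up): one 1000-voxel object and one 8-voxel fragment.
labels = np.zeros((20, 20, 20), dtype=np.uint64)
labels[0:10, 0:10, 0:10] = 1
labels[10:12, 10:12, 10:12] = 2
```

With a fixed physical threshold, the equivalent voxel count scales with voxel volume: `min_voxels_for((2, 2, 2))` is 64 times larger than at 8nm, which is why a fixed voxel-count cutoff admits far more small fragments at fine resolution.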

Environment

pip install -r requirements.txt

Tested with Python 3.9. The dependencies are deliberately stable: numpy, scipy, zarr, scikit-image, trimesh, matplotlib.

Running

Local (slow, single-process)

# Set the zarr root
export ECS_DATA_BASE=/Volumes/cellmap/data    # or /nrs/cellmap/data on cluster

# Native-resolution metrics on all crops (writes results/native_*.csv)
python -m scripts.run_native

# Matched-resolution (downsample everything to 8nm)
python -m scripts.run_matched

# Degradation: Chemical crops at 2/4/8/16 nm
python -m scripts.run_degradation

# Summaries and figures from whatever CSVs exist
python -m scripts.summarize --prefix native
python -m scripts.stats
python -m scripts.make_figures --prefix native

To resume an interrupted run, just re-run — the incremental-write logic will skip rows that are already present (per crop). You can also specify a subset of crops with --only crop1026,crop1027,... and a subset of metrics with --metrics ecs_width,voronoi_gap.
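The resume behaviour described above can be sketched as follows. This is only an illustration of the skip-if-present idea; the column names (`crop`, `value`), the helper names, and the file name are assumptions, and the actual scripts may differ.

```python
import csv
import os
import tempfile


def completed_crops(csv_path, key="crop"):
    """Crops that already have a row in the results CSV."""
    if not os.path.exists(csv_path):
        return set()
    with open(csv_path, newline="") as f:
        return {row[key] for row in csv.DictReader(f)}


def append_row(csv_path, row):
    """Append one row, writing the header only when the file is new or empty."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(row))
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def run_metric(crops, csv_path, compute):
    """Compute a metric per crop, skipping crops already present in the CSV."""
    done = completed_crops(csv_path)
    for crop in crops:
        if crop in done:
            continue
        append_row(csv_path, {"crop": crop, "value": compute(crop)})


# Demo with a stand-in metric: the second call only computes the new crop.
calls = []


def fake_metric(crop):
    calls.append(crop)
    return 1.0


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "native_example.csv")
    run_metric(["crop1026", "crop1027"], path, fake_metric)
    run_metric(["crop1026", "crop1027", "crop1033"], path, fake_metric)
    finished = sorted(completed_crops(path))
```

Because each row is written as soon as its crop finishes, a failure partway through loses at most the crop that was in progress.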

Cluster (LSF, Janelia)

# Required: where the zarr data lives
export ECS_DATA_BASE=/nrs/cellmap/data

# Required: a python with the requirements.txt deps installed
export ECS_PYTHON=/path/to/python

# Optional: where to write results. Default is `results/` in the repo,
# which means after the run you can `git add results/ && git commit && git push`
# to send everything back. If you'd rather write to shared lab space:
export ECS_RESULTS_DIR=/nrs/cellmap/people/<you>/ecs-results
export ECS_FIGURES_DIR=/nrs/cellmap/people/<you>/ecs-figures

# Optional: LSF queue + resources
export ECS_QUEUE=local

bash scripts/cluster_submit.sh native        # phase 2 (all metrics)
bash scripts/cluster_submit.sh matched       # phase 4 (downsampled to 8nm)
bash scripts/cluster_submit.sh degradation   # phase 5 (Chemical scan)

Each invocation submits one bsub job per crop. LSF logs land in cluster_logs/<phase>/<crop>.{out,err} in the repo. The per-crop incremental-write logic means each finishing job appends its row to <RESULTS_DIR>/<phase>_<metric>.csv — no separate aggregation step.

Once all jobs are done, run the post-processing locally (these are fast and don't need to go through bsub):

python -m scripts.summarize --prefix native
python -m scripts.summarize --prefix matched
python -m scripts.stats
python -m scripts.make_figures --prefix native
python -m scripts.make_figures --prefix matched

To send results back: either git add results/ && git commit && git push to share via the GitHub repo, or tarball the results directory and send the file directly.

Wall-time hints (per crop):

Topology at 2nm Cortex Chemical: 15-30 min (largest meshes)
Topology at 4nm Chemical (Kidney/Liver): 5-10 min
Topology at 8nm anything: 1-2 min
Fast metrics (no topology): under 1 min

Status of computational work

The results/ and figures/ directories were cleared on handoff so the cluster run is the canonical source. Phases to run on the cluster:

| Phase | Command | Approx. cluster wall-time at 40 parallel jobs |
| --- | --- | --- |
| 2 - Native metrics | `bash scripts/cluster_submit.sh native` | 30-60 min |
| 4 - Matched at 8nm | `bash scripts/cluster_submit.sh matched` | 15-30 min |
| 5 - Degradation (Chemical only) | `bash scripts/cluster_submit.sh degradation` | 30-90 min |

After cluster jobs finish, run locally:

python -m scripts.summarize --prefix native
python -m scripts.stats
python -m scripts.make_figures --prefix native

Phase 6 (anatomy-matched filtering) auto-activates from crop_annotations.csv. The 10 cortex crops are still pending expert annotation. Once they land, re-run the summary and figure scripts to pick them up — no recompute of the per-crop CSVs is needed.
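One way to picture why new annotations need no recompute: the anatomy labels are joined onto existing per-crop rows at summary time. The sketch below assumes hypothetical column names (`crop`, `anatomy`, `prep`, `value`) and made-up rows; it is not the logic in scripts/summarize.py.

```python
from collections import defaultdict
from statistics import median


def anatomy_matched(metric_rows, annotation_rows):
    """Group per-crop values by (anatomy, prep); unannotated crops are left out."""
    anatomy_of = {r["crop"]: r["anatomy"] for r in annotation_rows if r.get("anatomy")}
    groups = defaultdict(list)
    for row in metric_rows:
        anatomy = anatomy_of.get(row["crop"])
        if anatomy is None:
            continue  # no label yet, so the crop cannot join a matched group
        groups[(anatomy, row["prep"])].append(float(row["value"]))
    return {key: (len(vals), median(vals)) for key, vals in groups.items()}


# Made-up rows for illustration only.
metrics = [
    {"crop": "cropA", "prep": "Chemical", "value": "0.20"},
    {"crop": "cropB", "prep": "HPF", "value": "0.05"},
    {"crop": "cropC", "prep": "HPF", "value": "0.07"},
]
annotations = [
    {"crop": "cropA", "anatomy": "bile canaliculus"},
    {"crop": "cropB", "anatomy": "bile canaliculus"},
]

before = anatomy_matched(metrics, annotations)
# A new annotation row is enough to bring cropC into its group.
after = anatomy_matched(metrics, annotations + [{"crop": "cropC", "anatomy": "bile canaliculus"}])
```

The per-crop values never change in this scheme; only the grouping does.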

Headline scientific finding so far

In Liver (n=7 Chemical, n=10 HPF), Chemical fixation shows:

~5x higher ECS volume fraction (Cliff's delta = +0.83, p = 0.003)
~3x higher SA:V ratio (Cliff's delta = +0.86, p = 0.002)
~2x larger ECS channel widths
3x higher cell density at the per-crop scale

The Liver Chemical crops are at 4nm and the HPF crops at 8nm. The matched-resolution (8nm) re-run shows SA:V essentially unchanged (ratio ~1.00), ECS width changes of <2nm, and Voronoi gap shifts upward by 5-19nm. So the Liver Chemical-vs-HPF difference is not explained by the 4nm-vs-8nm resolution gap.

The same direction holds in 3 of 4 anatomy-matched groups (bile canaliculus, DCT base, hepatocyte lateral). The glomerular group flips direction (HPF > Chem) but the HPF n is 1.
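For readers unfamiliar with the effect size quoted above, here is a minimal pure-Python sketch of Cliff's delta, its relation to the Mann-Whitney U statistic, and a percentile bootstrap interval. The function names and bootstrap settings are assumptions for illustration; this is not the code in scripts/stats.py, and it does not reproduce the reported numbers.

```python
import random


def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all cross-group pairs."""
    greater = sum(1 for x in a for y in b if x > y)
    less = sum(1 for x in a for y in b if x < y)
    return (greater - less) / (len(a) * len(b))


def mann_whitney_u(a, b):
    """U statistic for group a, via the identity U = n1 * n2 * (delta + 1) / 2."""
    return len(a) * len(b) * (cliffs_delta(a, b) + 1) / 2


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for Cliff's delta, resampling crops within each group."""
    rng = random.Random(seed)
    deltas = sorted(
        cliffs_delta(rng.choices(a, k=len(a)), rng.choices(b, k=len(b)))
        for _ in range(n_boot)
    )
    lower = deltas[int((alpha / 2) * n_boot)]
    upper = deltas[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper
```

Each input list holds one value per crop, matching the crop-level pooling used for statistics. A delta of +1 means every value in the first group exceeds every value in the second; 0 means no systematic ordering.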

Known caveats

Voxel-size confound: every Chemical crop is 2nm or 4nm; almost every HPF crop is 8nm. The matched and degradation runs are designed to address this. See ecs/config.py constants for tunable smoothing/threshold parameters.
bm annotation is inconsistent across preps: 4 Kidney-Chemical crops have bm labeled separately, while HPF kidneys have bm=0 even though the basement membrane is physically present. We assume the HPF annotations rolled bm into ecs. Worth confirming with the annotators.
Cortex Chemical is at 2nm, the finest resolution in the dataset, but its crops are small (often <500nm cells, so the statistics are dominated by cell fragments).
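As a sketch of how the matched-resolution run can address the voxel-size confound, the snippet below brings a label volume to a coarser isotropic voxel size by integer-factor subsampling. The strided nearest-neighbour approach and the function names are assumptions for illustration; the downsampling in ecs/io.py may work differently.

```python
import numpy as np


def downsample_factor(native_nm, target_nm=8.0):
    """Integer factor taking native voxels to the target size; rejects other ratios."""
    ratio = target_nm / native_nm
    factor = int(round(ratio))
    if factor < 1 or abs(ratio - factor) > 1e-9:
        raise ValueError(f"cannot reach {target_nm}nm from {native_nm}nm by an integer factor")
    return factor


def downsample_labels(labels, factor):
    """Keep every `factor`-th voxel along each axis, so label ids are never blended."""
    return np.asarray(labels)[::factor, ::factor, ::factor]


# Toy 2nm label volume (made up) reduced to 8nm.
native = np.zeros((16, 16, 16), dtype=np.uint64)
native[:8] = 7
coarse = downsample_labels(native, downsample_factor(2.0))
```

Subsampling rather than averaging keeps the output a valid label volume, at the cost of dropping structures thinner than the target voxel.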
