DTW DynaCLR Monorepo #398

Open
edyoshikun wants to merge 146 commits into modular-viscy-staging from dynadtw

Conversation

@edyoshikun
Member

No description provided.

edyoshikun and others added 30 commits March 31, 2026 13:43
Add normalization columns (norm_mean/std/median/iqr/max/min),
z_focus_mean, and TCZYX shape columns to the cell index schema.
preprocess_cell_index reads per-FOV zattrs and writes stats as
parquet columns for fast per-row normalization at training time.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
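The per-row normalization this enables can be sketched as follows (column names and patch shape are illustrative assumptions, not the exact schema):

```python
import numpy as np
import pandas as pd

# Illustrative cell index with precomputed per-FOV normalization stats
# stored as plain parquet-style columns (names assumed for the sketch).
cell_index = pd.DataFrame({
    "norm_mean": [120.0, 80.0],
    "norm_std": [15.0, 10.0],
})

def normalize_patch(patch: np.ndarray, row: pd.Series) -> np.ndarray:
    """Z-score a patch with stats read straight off its index row --
    no zattrs lookup needed at training time."""
    return (patch - row["norm_mean"]) / row["norm_std"]

patch = np.full((4, 4), 150.0)
out = normalize_patch(patch, cell_index.iloc[0])  # (150 - 120) / 15 == 2.0
```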
- ExperimentRegistry.from_cell_index: build registry directly from
  preprocessed parquet + zarr metadata (no collection YAML needed)
- datamodule: cell_index_path as primary entry point, _train_final_crop
  changed from BatchedRandSpatialCropd to BatchedCenterSpatialCropd
  (random crop for Z/XY translation is now a user-configured augmentation)
- dataset: read norm stats from parquet columns, build_norm_meta fallback
- index: _align_parquet_columns, _resolve_dims from parquet Y/X_shape

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DynaCLR-3D-BagOfChannels-v2: z_window=32, yx_patch=256,
  RandSpatialCrop(40,228,228) after affine for Z focus invariance
  + XY translation, CenterCrop(32,160,160) auto-appended.
  batch_size=256, 2 GPUs, 2-day wall time.
- Add dataloader_demo.py: Jupyter-style visualization of raw vs
  augmented anchor/positive batches with per-sample metadata
- Update demo configs and inspection scripts for new pipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
np.nanmin/nanmax fail on scipy sparse arrays. Convert to dense
before computing range stats so the command works on Seurat-exported
anndata zarr stores.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
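A minimal sketch of the fix: densify the sparse matrix before computing range stats, since `np.nanmin`/`np.nanmax` are not defined for scipy sparse matrices.

```python
import numpy as np
from scipy import sparse

# Sparse X as exported to an anndata zarr store; densify before
# computing range stats (sketch of the fix described above).
X = sparse.csr_matrix(np.array([[0.0, 1.5], [2.0, 0.5]]))
dense = X.toarray()
lo, hi = float(np.nanmin(dense)), float(np.nanmax(dense))
```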
- CLI for running evals
- DAG for evals
- yaml files for evals
… 3 base callbacks

   - model/contrastive_encoder_convnext_tiny.yml: ConvNeXt-Tiny class_paths
   - model/dinov3_frozen_mlp.yml: frozen DINOv3 + MLP projection block
   - augmentations/ops_2d_mild.yml: OPS-specific mild augmentation pipeline
   - data/ops_gene_reporter.yml: OPS data defaults (patch sizes, sampling)
- train_linear_classifier() now returns a third value: raw val outputs
  (y_val, y_val_proba, classes) for downstream ROC curve plotting
- orchestrated run-linear-classifiers generates metrics_summary.pdf
  alongside the CSV: bar chart of AUROC/accuracy/F1 + per-task ROC curves
- Delete evaluate_dataset.py (argparse-based, not in CLI, superseded by
  orchestrator) and its example config
- Strip generate_comparison_report and its helpers from report.py;
  file is now CV-only
- Remove dead _detect_n_features() from cross_validation.py
- Update all callers of train_linear_classifier() to unpack 3-tuple
- Update DAG doc and linear classifiers README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- FOVRecord.channel_markers: dict[str, str] maps zarr channel name to
  marker for a specific well (populated from Airtable channel_N_marker fields)
- ChannelEntry.wells: list[str] restricts a channel to a subset of wells;
  empty means valid in all wells
- build_collection auto-populates wells by comparing which wells have a
  non-None marker for each channel across all FOVRecords
- _build_experiment_tracks skips channel rows where ch.wells is non-empty
  and the current well is not in that set, preventing noise rows from
  mixed-plate experiments (e.g. viral sensor only in B/3, C/2)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The glob */*/* on zarr v3 stores yields zarr.json files (e.g. A/2/zarr.json)
in addition to position directories. The previous check only stripped names
starting with "." (.zattrs, .zgroup) but missed zarr.json.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
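The corrected filter can be sketched like this (function name and layout are illustrative, not the actual implementation):

```python
import tempfile
from pathlib import Path

def iter_positions(plate_dir):
    """Yield position entries under a plate, skipping both zarr v2
    metadata (.zattrs, .zgroup) and zarr v3's zarr.json -- a sketch of
    the filter described above."""
    for p in sorted(Path(plate_dir).glob("*/*/*")):
        if p.name.startswith(".") or p.name == "zarr.json":
            continue
        yield p

# demo: a zarr v3-style layout where the old filter would leak zarr.json
root = Path(tempfile.mkdtemp())
(root / "A" / "2" / "0").mkdir(parents=True)
(root / "A" / "2" / "zarr.json").touch()
names = [p.name for p in iter_positions(root)]  # only the position "0"
```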
…ollection

- DynaCLR-2D-MIP-BagOfChannels: add viral_sensor + Phase3D for
  2025_01_28, 2024_10_09, 2024_10_16; fix dragonfly tracks_path
  to point to inner zarr store (tracking.zarr/2024_08_14_...zarr)
- DynaCLR-3D-BagOfChannels-v2: add viral_sensor + Phase3D for
  2025_01_28, 2024_10_09, 2024_10_16
- DynaCLR-3D-BagOfChannels-v3: new collection copied from v2 with
  dragonfly tracks_path fix; v2 left intact for running training job
- DynaCLR-BoC-lc-evaluation-v1: add viral_sensor for all datasets;
  add Phase3D for 2025_01_28

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Wire load_config to delegate to load_composed_config so eval configs
  support base: recipe inheritance (same mechanism as training configs)
- Extract shared eval settings into 4 recipes: predict.yml, reduce.yml,
  plot_infectomics.yml, linear_classifiers_infectomics.yml
- Slim down DynaCLR-2D-BagOfChannels-v3, DynaCLR-2D-MIP-BagOfChannels-v1,
  DINOv3-temporal-MLP-2D-BagOfChannels-v1, and test_evaluation configs
  to use base: references — eliminating copy-pasted 14-experiment
  annotation blocks and shared step configs
- Fix ONNX inference to use GPU (CUDAExecutionProvider) and suppress
  pthread_setaffinity_np noise with intra/inter_op_num_threads=1
- Switch CTC tracking SLURM script to gpu partition

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix \bbf[\b_] -> \bbf(\b|_): inside a character class, \b is a
  backspace character, not a word boundary
- Add \bphc\b to detect phase-contrast (PhC) as label-free

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
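The pitfall in miniature — inside a character class, `\b` is the backspace character (0x08), not a word boundary:

```python
import re

bad = re.compile(r"\bbf[\b_]")    # [\b_] matches backspace or underscore
good = re.compile(r"\bbf(\b|_)")  # word boundary or literal underscore

assert good.search("the bf image") is not None  # boundary after "bf"
assert good.search("bf_channel") is not None    # underscore branch
assert bad.search("the bf image") is None       # space is neither \x08 nor _
```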
pandas 3+ uses Arrow-backed strings by default, which breaks anndata's
zarr writer. Apply the same fix in two code paths:
- embedding_writer.py: replace select_dtypes("string") with per-column
  isinstance checks for pd.StringDtype and Arrow-backed Categoricals
- zarr_utils.py: convert ArrowStringArray columns and index to object
  dtype before calling append_to_anndata_zarr

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
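A sketch of the per-column check (the real fix also handles Arrow-backed Categoricals; this shows only the StringDtype path):

```python
import pandas as pd

# Under pandas 3+ this column is Arrow/string-backed by default, which
# the anndata zarr writer cannot serialize. Cast StringDtype columns
# to plain object dtype first.
df = pd.DataFrame({"fov_name": pd.array(["A/1", "B/2"], dtype="string")})

for col in df.columns:
    if isinstance(df[col].dtype, pd.StringDtype):
        df[col] = df[col].astype(object)
```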
- PHATE: default n_jobs from -1 (all cores) to 1 to prevent hogging
  shared SLURM nodes; exposed in PHATEConfig and compute_phate()
- Annotation: support (fov_name, t, track_id) join as fallback when
  both sides lack an 'id' column; normalize fov_name by stripping
  leading/trailing slashes to prevent join mismatches

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
For multiclass problems, compute one-vs-rest AUROC per class and report
as val_{class_name}_auroc columns in the results DataFrame.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
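A sketch of the per-class computation, keyed like the `val_{class_name}_auroc` columns above (class names and probabilities are made up for illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

classes = np.array(["dying", "infected", "mock"])
y_val = np.array(["infected", "mock", "dying", "mock", "infected"])
y_val_proba = np.array([
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
    [0.7, 0.1, 0.2],
    [0.1, 0.3, 0.6],
    [0.1, 0.6, 0.3],
])

# one-vs-rest: binarize each class against the rest and score its
# predicted-probability column
ovr_auroc = {
    f"val_{cls}_auroc": roc_auc_score(y_val == cls, y_val_proba[:, i])
    for i, cls in enumerate(classes)
}
```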
- viscy-utils: add onnx, onnxscript to core deps; copairs to eval extras
- dynaclr: add tracking optional group (gurobipy, onnxruntime-gpu,
  py-ctcmetrics, tabulate, tracksdata) for CTC tracking benchmark
- Regenerate uv.lock

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- index.py: replace O(N*tau) Python loop in _compute_valid_anchors with
  vectorized pd.MultiIndex.isin(); add fit=False predict-mode fast path
  that skips anchor computation; add precomputed_valid_anchors to
  clone_with_subset() to avoid redundant recomputation; accept
  cell_index_df to avoid double-reading parquet
- dataset.py: replace per-row loops in _build_match_lookup with
  groupby().indices; skip lookup build in predict mode; add organelle,
  well, microscope to exported metadata columns
- datamodule.py: tune defaults (num_workers=4, cache_pool=500MB,
  pin_memory=True, buffer_size=4); use vectorized MultiIndex.isin for
  FOV split; reuse pre-loaded cell_index_df from ExperimentRegistry
- experiment.py: from_cell_index returns (registry, dataframe) tuple
  so callers can reuse the DataFrame without re-reading from disk

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
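The vectorized anchor check can be sketched like this (column names and the tau semantics are assumptions based on the description above):

```python
import pandas as pd

# An anchor at (fov, track, t) is valid only if the same track also
# has a row at t + tau -- checked for all rows at once with
# MultiIndex.isin instead of an O(N * tau) Python loop.
tau = 1
df = pd.DataFrame({
    "fov": ["A/1", "A/1", "A/1", "B/2"],
    "track_id": [1, 1, 2, 3],
    "t": [0, 1, 0, 5],
})

idx = pd.MultiIndex.from_frame(df[["fov", "track_id", "t"]])
future = pd.MultiIndex.from_arrays([df["fov"], df["track_id"], df["t"] + tau])
valid_anchor = future.isin(idx)  # boolean array, one entry per row
```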
Use .get() with None default for transcriptome_anndata and skip the
barcode join when it is absent, allowing embeddings on datasets that
lack paired scRNA-seq.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Centralize cell_index_path to shared /hpc/projects/.../collections/
  dir across all training configs
- MIP model: extend z_extraction_window 11->20, z_focus_offset 0.5->0.3,
  yx_patch_size 192->256, add BatchedRandSpatialCropd for Z-invariance
- 3D BoC: num_workers 2->4; SLURM time limit 2d->4d
- Collection: mark DynaCLR-2D-BagOfChannels-v3 as [LEGACY]; fix well
  assignments in BoC-lc-evaluation-v1 (add A/1 for 07_24, remove
  incorrect B/1 and B/2 from 01_28)
- Add new collections: annotated MIP subset, test subset, alfi-eval
  (ALFI mitosis, 3 cell lines), microglia-eval (5 perturbations),
  benchmark_2exp (dataloader profiling)
- predict.yml: add TQDMProgressBar callback (refresh_rate=10)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- evaluate.py: remove all SLURM script generation (_generate_*_sh,
  _slurm_header, _run_local*); replace with prepare_configs() that
  generates YAML configs and prints a JSON manifest to stdout; rename
  CLI command evaluate -> prepare-eval-configs; add MMD config generators
- evaluate_config.py: remove SlurmConfig; add MMDStepConfig and
  ComparisonSpec imports; split PlotStepConfig.color_by into per-exp
  and combined_color_by; update TaskSpec.marker_filters docstring for
  auto-expand behaviour
- cli.py: add prepare-eval-configs, check-evals, append-annotations,
  append-predictions, split-embeddings, compute-mmd, plot-mmd-heatmap,
  evaluate-tracking-accuracy commands
- split_embeddings.py: new CLI to split combined embeddings.zarr by
  experiment, replacing inline SLURM script logic
- check_evals.py: new CLI to print eval completion status from registry
- eval_registry.yaml: declarative registry of models to evaluate
- Delete 4 stale SLURM-era eval configs (SlurmConfig schema removed)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three modes for measuring embedding-space distribution shifts:
- Per-experiment (explicit comparison pairs, faceted by marker)
- Combined (pairwise cross-experiment with batch centering)
- Pooled (concatenates all experiments, BH FDR correction)

Core implementation:
- viscy_utils/evaluation/mmd.py: kernel MMD with median heuristic,
  Gaussian RBF kernel, unbiased estimator, and vectorized permutation
  test (avoids Python loops via binary label matrix multiplication)
- viscy_utils/evaluation/embedding_map.py: mAP via copairs for
  phenotypic profiling (optional dependency)
- evaluation/mmd/config.py: Pydantic config hierarchy for all three
  modes; temporal binning, shared bandwidth, balance_samples
- evaluation/mmd/compute_mmd.py: orchestrates the three analysis modes;
  computes activity_zscore = (mmd2 - null_mean) / null_std for
  cross-marker comparability; outputs per-marker CSV files
- evaluation/mmd/plotting.py: kinetics lines, heatmaps, activity
  z-score heatmaps, combined cross-experiment heatmaps, multi-panel
  grids, paired heatmaps with shared colorbar
- configs/evaluation/recipes/mmd_defaults.yml: shared algorithm defaults
  (1000 permutations, max 2000 cells, seed 42) for YAML inheritance
- tests/test_mmd.py: unit tests for MMD implementation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
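The core estimator can be sketched in a few lines — a self-contained version of unbiased MMD² with a Gaussian RBF kernel and median-heuristic bandwidth, not the `viscy_utils` implementation itself:

```python
import numpy as np

def rbf_mmd2(X, Y, bandwidth=None):
    """Unbiased MMD^2 between samples X (m, d) and Y (n, d)."""
    Z = np.vstack([X, Y])
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    if bandwidth is None:
        # median heuristic over nonzero pooled pairwise distances
        bandwidth = np.sqrt(np.median(d2[d2 > 0]) / 2)
    K = np.exp(-d2 / (2 * bandwidth**2))
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = K[:m, :m], K[m:, m:], K[:m, m:]
    # unbiased estimator: exclude the diagonal of the within-set blocks
    return (
        (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
        + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
        - 2.0 * Kxy.mean()
    )

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
same = rbf_mmd2(X, rng.normal(size=(64, 2)))            # near zero
shifted = rbf_mmd2(X, rng.normal(size=(64, 2)) + 3.0)   # large
```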
…ver-time

- orchestrated.py: when marker_filters is None, auto-discover all unique
  obs["marker"] values and run one classifier per marker; save trained
  pipelines as {task}_{marker}.joblib with manifest.json; add
  _plot_f1_over_time for per-class F1 at each timepoint; output one
  {task}_summary.pdf per task (was a single merged PDF)
- orchestrated_test.py: update fixtures to expect 2 rows per task with
  auto-expansion; add test for sparse-marker skipping and F1-over-time
  plot generation
- append_annotations.py: new CLI to persist ground-truth annotation
  columns directly into per-experiment zarr obs
- append_predictions.py: new CLI to apply saved classifier pipelines to
  all cells in per-experiment zarrs, writing predicted_{task} to obs and
  predicted_{task}_proba to obsm

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When group_by is set (default "marker"), evaluate_smoothness iterates
over unique group values, computes smoothness per group, saves per-group
CSV, generates per-group plots, then aggregates via mean/std. Output
filenames now include experiment_name for disambiguation.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Evaluates whether DynaCLR embeddings improve cell tracking on Cell
Tracking Challenge datasets vs an IoU baseline.

- tracking_accuracy/config.py: Pydantic models for ONNX model entries,
  CTC dataset entries, ILP solver weights, and full benchmark config
- tracking_accuracy/utils.py: seg_dir layout helper, pad_to_shape,
  normalize_crop (z-score using whole-frame statistics)
- tracking_accuracy/evaluate_tracking.py: main benchmark driver
- ctc_tracking_2d_mip_boc.yaml: DynaCLR-2D-MIP vs IoU on DIC-C2DL-HeLa
- ctc_tracking_2d_mip_boc_all.yaml: all CTC sequences variant
- export_onnx_2d_mip_boc.yml: config for exporting the MIP model to ONNX

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Pairplot: change diag_kind kde -> hist; rasterize scatter points to
  prevent PDF bloat; improve legend (alpha=1.0, larger marker sizes)
- Scatter 2D: improve legend (markerscale=6, fontsize=10, framealpha=1.0,
  edgecolor="black")

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
edyoshikun and others added 30 commits April 29, 2026 16:54
…cohorts

Mock, bystander, abortive, and unannotated_productive cohorts have
t_zero=NaN and therefore t_rel_minutes=NaN. The original
per_cell_baseline_distance required pre-window frames in t_rel_minutes,
which dropped the entire mock cohort and made fov_stratified_threshold
crash with "Mock cohort has no finite signal values".

Per discussion §3.7, mocks contribute as a per-frame null distribution
and don't need a t_zero. Fix: when t_rel_minutes is entirely NaN for a
track, use the whole-track mean as the baseline. Productive lineages
(and their sibling daughters) still use the pre-window mean.

End-to-end validation on zikv_productive_07_24:
- Path A-anno + G3BP1 readout produced 28 productive cells with
  oscillation metrics + 12-FOV mock thresholds.
- Path A-anno + phase readout produced 18,589-row signal parquet.
- SEC61 readouts still fail because the cohort's mock cells are all in
  C/1 (G3BP1 mock well); the SEC61 zarr does not contain SEC61
  fluorescence for those wells. This is a cohort-construction issue
  (A/1 SEC61 mock not included in the candidate set), not a code bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a DINOv3-frozen evaluation row that uses raw HuggingFace
DINOv3 convnext-tiny weights with no contrastive fine-tuning and
no projection head. Tests whether DynaCLR's contrastive training
adds value over pre-trained features alone.

Required orchestrator changes for HF-loaded models (no Lightning
checkpoint to restore from):
- evaluate_config.py: ckpt_path is now Optional[str] = None.
- evaluate.py: _generate_predict_yaml omits ckpt_path from the
  generated predict YAML when null, so Lightning skips checkpoint
  restoration cleanly.

New files under configs/evaluation/DINOv3-frozen/:
- infectomics-annotated.yaml leaf with ckpt_path: null.
- run_infectomics_annotated.sh SLURM submitter.
- training_config_dinov3_frozen.yaml — synthetic Lightning config
  mirroring DINOv3-temporal-MLP-v1 byte-for-byte except
  encoder.init_args.projection: null. Same backbone, same data
  pipeline, only difference is the projection head — apples-to-apples
  comparison isolating the contribution of contrastive training.

eval_registry.yaml: refactored from check-evals format to
compare_evals format (output_dir + per-entry eval_dir), with all
7 active rows listed for cross-model overlay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cross-model comparison was silently producing empty smoothness
plots and crashing on linear classifier plots because of two stale
column/filename references:

1. _load_smoothness globbed *_smoothness_stats.csv but the
   smoothness step writes *_per_marker_smoothness.csv (one file per
   experiment-marker pair). Switched the glob and changed the
   aggregation to concat all per-marker CSVs and compute mean ± std
   across (experiment, marker) rows when plotting.

2. _plot_linear_classifiers indexed columns auroc and marker but
   the LC writer emits val_auroc / train_auroc and marker_filter.
   Switched to val_auroc (held-out generalization, the headline
   number) and marker_filter; updated the y-axis label.

Smoothness comparison now shows a bar per model with mean ± std
error bars instead of a single value from a missing CSV.
LC comparison renders the per-marker val_auroc grid as intended.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the leaf YAML + SLURM run script pairs for the 4 BoC-family
sibling variants and the 2 other-architecture rows that complete
the infectomics-annotated column of the eval matrix:

- DynaCLR-2D-MIP-BagOfChannels-single-marker-fix-shuffler (192->160 zext11, ckpt jbrwhzr3)
- DynaCLR-2D-MIP-BagOfChannels-mixed-markers-fix-shuffler (192->160 zext11, ckpt dlzt3s65)
- DynaCLR-2D-MIP-BagOfChannels-single-marker-192 (384->192 zext16, ckpt p6vlebcu) — uses
  predict.batch_size=128 to fit the larger 384px patches on gpu_2d (default 400 OOMs)
- DynaCLR-2D-BagOfChannels-v3/run_infectomics_annotated.sh
- DINOv3-temporal-MLP-2D-BagOfChannels-v1/run_infectomics_annotated.sh

Each leaf publishes LCs to its own per-model registry under
linear_classifiers/{model}/infectomics/ so siblings don't clobber
each other's vN/ bundles.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes for the mock cohort selection on real data:

1. Per-dataset control well: replace global mock_well_patterns iteration
   with each dataset's own control_fov_pattern from datasets.yaml.
   Previously every (dataset_id, mock_well) pair was emitted, double-
   tagging C/1 cells under both 2025_07_24_SEC61 and 2025_07_24_G3BP1.
   Now each dataset claims only its own control well (A/1 for SEC61,
   C/1 for G3BP1). Fallback to cohort-level mock_well_patterns kept
   for configs without per-dataset control_fov_pattern.

2. Zarr fallback: new _select_mock_from_zarr() reads (fov, track_id,
   parent_track_id, t) directly from the channel embedding zarr's .obs
   when the source annotation CSV has no rows for the control well.
   Synthesizes infection_state="uninfected" by well design. Fixes the
   case where A/1 was tracked + LC-predicted but never manually
   annotated.

Validated on 2025_07_24: SEC61 mock cohort now sources from A/1 cells
in the SEC61 zarr (6053 cells available); G3BP1 mock cohort still
sources from C/1 cells via the annotation CSV.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Prints a markdown table at the start of each PHATE run showing
resolved values and whether each came from the YAML or the default,
so logs alone are enough to reproduce a run. Also relaxes n_pca to
Optional[int] so configs can disable PHATE's internal pre-PCA via null.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Stage 0's mock-from-zarr fallback was hardcoded to the sensor (NS3)
zarr pattern. SEC61's A/1 control well is absent from the sensor zarr
but present in the SEC61 organelle zarr, so the fallback returned
empty and Stage 3 SEC61 readouts crashed with "Mock cohort empty".

Pass the full embeddings dict from datasets.yaml into
_build_dataset_cohorts and pick the per-dataset organelle pattern
based on the dataset suffix (e.g. SEC61 -> organelle_sec61).

Verified: SEC61 A/1 now contributes 123 mock tracks; total mock
cohort grew from 184 to 206 lineages on zikv_productive_07_24.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Previously assign_lineage_ids assigned a positional integer counter to
each lineage. The counter incremented in iteration order through tracks,
so a track's lineage_id changed whenever Stage 0 cohort composition
shifted (e.g., adding mock cells from a new control well). Downstream
parquets keyed on lineage_id then desynced from each other unless every
stage was re-run together — silently producing zero-row merges in
Stage 4 readouts.

Replace the integer counter with a deterministic string id derived from
the branch endpoints:

  "{dataset_id}|{fov_name}|root{root_track_id}|leaf{leaf_track_id}"

Stable across re-runs as long as the underlying tracks haven't changed.
Orphan sentinel becomes "" instead of -1.

Update consumers to use string equality / emptiness checks:
- select_candidates.py / manual_candidates.py: t_zero + divides lookups
- align_anno.py / align_lc.py: per-lineage anchor dicts
- align_embedding.py: fillna("") instead of fillna(-1).astype(int)
- readout_common.py: per-cell metric rows
- compare_phase_to_fluor.py: per-cell onset rows

Verified end-to-end: Stage 0 -> 4 produces identical Spearman ρ values
for the A-anno track (g3bp1 ρ=0.943, sec61 ρ=-0.200) on the
zikv_productive_07_24 cohort.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
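The id construction itself, following the format given above:

```python
def lineage_id(dataset_id: str, fov_name: str,
               root_track_id: int, leaf_track_id: int) -> str:
    """Deterministic lineage id derived from branch endpoints; stable
    across re-runs as long as the underlying tracks are unchanged."""
    return f"{dataset_id}|{fov_name}|root{root_track_id}|leaf{leaf_track_id}"

lid = lineage_id("2025_07_24", "A/2/0", 12, 45)
```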
Three small changes that together let us inspect and pool comparisons:

1. fov_stratified_threshold gains a well-row fallback. Previously the
   per-FOV check fell back straight to global because mock cells live
   in sibling wells (A/1) that never share a FOV with productive cells
   (A/2). The new ladder is same_fov -> same_row -> global, recording
   which level supplied the threshold via n_mock_source. Numbers don't
   change for 07_24 because SEC61 mock only exists in A-row anyway.

2. plot_paired_traces.py renders per-cell phase + fluor cosine-distance
   traces side-by-side for every paired cell that contributes to the
   compare_phase_to_fluor Spearman ρ. Used to see *why* g3bp1 ρ is high
   and sec61 ρ is unstable.

3. zikv_productive_pooled candidate set adds 07_22_G3BP1 + 08_26_SEC61
   alongside 07_24. Stage 0 now handles empty productive_df cleanly
   (was crashing on KeyError: 'fov_name'). Note: under the current
   productive_filter the additional datasets contribute 0 productive
   lineages — 07_22's 10-min frame interval makes the 600-min track
   length unreachable (1/73 candidates pass), and 08_26's annotations
   have no tracks spanning both infected and uninfected states. The
   pooled cohort therefore equals the 07_24-only cohort until per-
   dataset productive_filter overrides are introduced or the
   annotations are extended.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
LC tasks are heavily imbalanced (cell_division_state ~99/1, organelle_state
~91/9), so val_auroc alone is misleading — it can rank the minority class
well while saying nothing about whether the classifier is usable at the
decision threshold.

- linear_classifier.py: persist train_<class>_support and val_<class>_support
  in metrics_summary.csv so downstream plots can show per-class N.
- compare_evals.py: new linear_classifiers_per_class.pdf with grouped bars
  for (class × {precision, recall, f1}) per (task, marker_filter), plus
  a panel-title annotation "val N=… | minority <cls>=N (%)" when support
  columns are present. Falls back gracefully on older metrics_summary.csv
  files without support columns.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Curated registry listing every model with an
evaluations/infectomics-annotated/linear_classifiers/metrics_summary.csv
on the cluster. Drives compare_evals.py — produces smoothness and
linear_classifiers comparison PDFs in one place rather than ad hoc
per-pair invocations.

Six models compared on commit: DINOv3-frozen, DINOv3-temporal-MLP-v1,
and four DynaCLR-2D-MIP-BagOfChannels variants (zext11 baseline plus
single-marker-fix, mixed-markers-fix, and 384to192/zext16 ablations).
Add new entries here as more models are run.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Earlier search missed this one (deeper path:
SEC61_TOMM20_G3BP1_Sensor/time_interval/dynaclr_gfp_rfp_Ph/v3/). It uses
the same standard layout and metrics_summary.csv schema, so the
comparison now spans all 7 models with infectomics-annotated runs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two structural changes that together let us pool 07_22 and 08_26 into
the productive cohort:

1. cohort_rules.min_non_productive_minutes (default 300) decouples the
   bystander/abortive/mock track-length floor from the productive
   filter's pre+post window. Previously _select_well_tracks was gated
   by min_pre + min_post (600 min for 07_24), which meant short tracks
   from comparison cohorts were silently dropped along with productives
   that genuinely don't qualify. The decoupled gate stops over-filtering
   the null distributions.

2. dataset_cfg.productive_source ("annotation_csv" default, or "lc_zarr")
   selects whether productive cells come from the manual annotation CSV
   or from predicted_infection_state in the embedding zarr. Used for
   datasets like 08_26 whose annotation CSV is per-frame rather than
   track-linked, making A-anno impossible but A-LC viable.

   _select_productive_from_zarr applies the same pre/post window logic
   as _select_productive_tracks, but uses the first sustained run of
   min_run consecutive predicted-positive frames as the anchor (same
   convention as align_lc.py).

Also: skip cells with NaN t_rel_minutes in compare_phase_to_fluor and
plot_paired_traces — when 08_26 LC-zarr productive cells flow through
the A-anno track, they have signal computed (whole-track-mean fallback)
but no anchor, so onset is undefined; previously this produced NaN ρ.

Pooled cohort (zikv_productive_pooled, 07_24 + 07_22 + 08_26) now
yields 164 productive lineages (vs 59 strict 07_24-only) and a
SEC61 A-LC ρ = 0.810, perm p < 0.001 (n=37) — first well-powered
correlation between phase and SEC61 onset times.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Each plot function used to build its own model_color dict by interpolating
through tab10 with len(models) — producing slightly different mappings
across plots, and outright wrong colors in the smoothness panel where
np.arange(len(means.index)) walked the per-aggregate index rather than
the global model list.

Now _build_model_palette generates the dict once from the registry's
model list (sorted, discrete tab10 indices, falls back to tab20 for
>10 models) and main() threads it through smoothness, LC AUROC, LC
per-class, and MMD plots. Same model is the same color everywhere.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
torch.as_tensor() crashes on numpy U/S/O dtypes. Take the
all_gather_object path for string columns, keep the fast tensor
path for numeric ones. Without this, training with any string
metadata key (perturbation, marker) crashes on the first
multi-rank online-eval step.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
os.cpu_count() returns the node's physical core count, not the
SLURM-allocated count. On a 48-core node where SLURM gave us 16,
ad-hoc users of os.cpu_count() oversubscribe. Centralize the
SLURM_CPUS_PER_TASK fallback in viscy_utils.mp_utils.available_cpus
and route MultiExperimentDataModule's tensorstore concurrency
through it.

Pin BLAS to 1 thread per process in REDUCE_COMBINED — PHATE's
joblib n_jobs spawns one worker per allocated CPU, so unbounded
BLAS would yield ~cores^2 threads. Standard sklearn parallelism
pattern (one axis at a time).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PHATE's internal PCA pre-reduction (graphtools -> sklearn ->
scipy.linalg.lu) deadlocks silently on scipy 1.17.1 + sklearn
1.8.0 — process sits at ~0% CPU forever. Wire X_pca_combined
back into PHATE so it skips its own pre-PCA: when phate.n_pca is
null, fit on the already-reduced PCA output instead of raw .X.

Add caller-owned fit-set indexing (fit_idx) to
viscy_utils.evaluation.compute_phate so the orchestrator can
draw a per-store lineage cap. Whole lineages are sampled per
store (cap=N each); PHATE fits on the union and transforms the
full input. Re-enables PHATE in the eval recipe with a 100-cell
per-store cap for fast iteration; bump for paper figures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cross-modal InfoNCE head pulls image features toward a paired
per-cell vector target (e.g. transcriptomic embedding). Image
and target sides each pass through a small projector into a
shared space; samples whose target contains NaN (unpaired
cells) are masked out so the head runs on partially-paired
batches.

Extend ContrastiveModule._get_labels to handle vector-valued
metadata: list/tuple/array entries are stacked into (B, D) float
tensors, scalars stay as (B,) long tensors. Required for the
new head's paired-target lookup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
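The unpaired-cell masking in miniature — rows whose target vector contains NaN are excluded before the loss (a sketch, not the actual head):

```python
import numpy as np

targets = np.array([
    [0.1, 0.2],     # paired cell
    [np.nan, 0.3],  # unpaired cell -> masked out of the loss
    [1.0, 2.0],     # paired cell
])
paired = np.isfinite(targets).all(axis=1)
n_contributing = int(paired.sum())  # only paired rows enter InfoNCE
```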
CELL-DINO is a DINOv2-architecture ViT pretrained on
fluorescence microscopy (Human Protein Atlas). The
channel_adaptive_dino_vitl16 checkpoint processes one channel
at a time through a single-channel ViT-L/16 stem; the wrapper
reshapes (B, C, H, W) -> (B*C, 1, H, W), runs the backbone, and
mean-pools the cls token across channels for a fixed-dim
embedding regardless of input channel count. Weights load from
a local .pth state_dict — nothing fetched from the network.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
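The reshape-and-pool logic can be sketched in numpy (`backbone` is a stand-in callable; the real wrapper operates on torch tensors):

```python
import numpy as np

def channel_adaptive_embed(x: np.ndarray, backbone) -> np.ndarray:
    """Run a single-channel backbone on each channel independently,
    then mean-pool the per-channel embeddings so the output dimension
    is fixed regardless of input channel count."""
    b, c, h, w = x.shape
    per_channel = backbone(x.reshape(b * c, 1, h, w))  # (B*C, D)
    return per_channel.reshape(b, c, -1).mean(axis=1)  # (B, D)

# toy single-channel "backbone": mean intensity tiled into 8 dims
toy = lambda z: np.tile(z.mean(axis=(1, 2, 3))[:, None], (1, 8))
emb = channel_adaptive_embed(np.ones((2, 3, 4, 4)), toy)
```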
PCA pairplot rendering is per-coloring-variable independent;
fan out across colorings using joblib loky workers (one worker
per coloring, capped by available_cpus). Workers re-import
matplotlib + seaborn (~1s overhead) so the gain only kicks in
for pairplot_components >= 4 on >100k cells, which matches the
paper-figure config.

Add the pairplot_components knob to the infectomics recipe at
4 (PC1..PC4 grid = 16 panels per coloring); bump higher for
final paper figures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The fixed coupling between PLOT and PLOT_COMBINED forced every
infectomics run to fan out per-experiment scatter even when only
the combined figure was needed. Make both stages
independently togglable via steps:; the Nextflow DAG already
checks `steps`, just had hardcoded behaviour assuming both
always run.

Switch infectomics-annotated to plot_combined only — the
per-experiment scatter doesn't carry into paper figures.

Drop the redundant marker_filters on cell_death_state
(applies to all markers; the filter was leftover from when LC
was only run on G3BP1/SEC61B sensors).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Job 31449149 hit a cgroup OOM on rank 3's host RAM (not VRAM) on
the 384² single-marker variant. Loader prefetch buffers scale
with workers × prefetch_factor, not batch_size.

- Drop prefetch_factor 2→1 in the BoC base config — halves
  in-flight batches per worker, restores earlier behaviour.
- Drop the 192 sbatch from 4→2 GPUs and bump --mem-per-cpu
  14G→17G (255 GB/rank, 510 GB/node) so each rank has more
  headroom; also eases queue priority. Pin trainer.devices=2
  in the override yml so the Lightning config matches.

Batch size kept at 256/rank — host RAM was the OOM driver.
If this still OOMs, suspect a real leak (loky semaphores,
tensorstore decoder scratch) rather than papering over with
more RAM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Same physical microglia cells appeared three times in the
collection (BF, Phase3D, Retardance), tripling the experiment's
row count and biasing marker/experiment sampling without adding
biological signal. Keep Phase3D only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add status legend (✅ landed / 🔄 running / ⬜ pending) and
inline notes per model so the registry reads as a
state-of-the-bake-off. Stable name strings ensure the model→color palette
matches across infectomics-annotated, alfi, and microglia
registries.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the primary_analysis.csv + cage_crop parsing path with
a direct read from the cells AnnData zarr (dinov2.zarr / rna.zarr
under a shared anndata_dir). The fov_name column is the zarr
path; load_cells_anndata returns it as zarr_path so the rest of
the pipeline is unchanged.

Split CLI: data_paths.yml carries the shared zarr_store +
anndata_dir + output_dir, and embed_<model>.yml carries
per-model config (channels, output_key, target_pixel_size,
batch_size). Both files are merged at startup.

Add a max_cells smoke-test knob that truncates the cell table
post-filter for fast iteration.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- viscy-utils: pin anndata<0.12.9 across all/anndata/dev/test extras
  (matches pyproject; the constraint was added but the lock hadn't
  been refreshed)
- viscy: normalize gurobipy specifier to the same range
- nvidia-* and cuda-bindings: add platform_machine != 's390x'
  markers per uv solver auto-update

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PCA-RGB timelapse MP4 export needs imageio's FFmpeg plugin;
without it the timelapse CLI silently falls back to GIF.
Bundle matplotlib so the visualization helpers don't pull it
through a transitive eval-extra dependency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmbeddingWriter's primary array (adata.X) is hard-coded to the
encoder backbone "features". DINOv3-temporal-MLP and similar
frozen-backbone-with-trained-head models put all the learned
task signal in the projection head — predicting features in
that case discards the only learned component.

Add a predict.embedding_key knob ("features" | "projections")
that the eval orchestrator threads into the generated predict
YAML. The unselected array remains as a sidecar in obsm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The MLP head is the only finetuned component — the DINOv3
backbone is frozen during training. Defaulting to features would
make this row a duplicate of DINOv3-frozen and discard the only
learned task signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>