Open
146 commits
c54f568
utility to combine multiple ann datasets and compute dim reduction me…
edyoshikun Mar 31, 2026
497bcfa
batch z transform for 2D MIP
edyoshikun Apr 1, 2026
5f5acea
cell_index: add preprocess_cell_index and flat parquet schema extensions
edyoshikun Apr 3, 2026
f536c5a
DynaCLR data: parquet-first pipeline + CenterCrop final crop
edyoshikun Apr 3, 2026
90c697a
training configs + dataloader demo script
edyoshikun Apr 3, 2026
b50e81b
dynaclr info: handle sparse X matrices
edyoshikun Apr 4, 2026
474b66d
adding files for training
edyoshikun Apr 7, 2026
55d2004
spurious slash in the file
edyoshikun Apr 7, 2026
8bea25b
-multiexperiment prediction
edyoshikun Apr 8, 2026
2f3c1bc
Merge branch 'modular-viscy-staging' into dynadtw
edyoshikun Apr 8, 2026
bd44317
add recipes " - trainer.yml: shared seed, accelerator, logger enti…
edyoshikun Apr 8, 2026
8e0b3b9
Add linear classifier summary plots and remove evaluate_dataset.py
edyoshikun Apr 9, 2026
04e68e4
Add per-well channel validity to ChannelEntry and FOVRecord
edyoshikun Apr 10, 2026
34f04bf
Fix position filter to use is_dir() instead of name prefix check
edyoshikun Apr 10, 2026
e8ff671
Add viral_sensor and Phase3D channels to BoC collections; add v3 3D c…
edyoshikun Apr 10, 2026
e03ddda
Add base: inheritance to eval configs via load_composed_config
edyoshikun Apr 14, 2026
d6d3614
Fix channel_utils regex for PhC and BF label-free detection
edyoshikun Apr 14, 2026
44e0545
Fix ArrowStringArray compatibility in embedding writer and zarr utils
edyoshikun Apr 14, 2026
42d0879
Add PHATE n_jobs control and improve annotation join flexibility
edyoshikun Apr 14, 2026
c796a4d
Add per-class AUROC to linear classifier metrics
edyoshikun Apr 14, 2026
6e1d6c3
Add onnx, copairs, and tracking optional dependencies
edyoshikun Apr 14, 2026
00d2166
Vectorize data pipeline for large-scale performance
edyoshikun Apr 14, 2026
84a9140
Make cellanome embedding scripts work without transcriptome data
edyoshikun Apr 14, 2026
bd63bc9
Update training and collection configs; add new dataset collections
edyoshikun Apr 14, 2026
210efb0
Refactor eval orchestrator: replace SLURM scripts with Nextflow manifest
edyoshikun Apr 14, 2026
031c40f
Add MMD perturbation evaluation system
edyoshikun Apr 14, 2026
45dd151
Improve linear classifiers: auto-expand markers, save pipelines, F1-o…
edyoshikun Apr 14, 2026
7193d48
Add per-marker smoothness grouping with mean/std aggregation
edyoshikun Apr 14, 2026
f10fb51
Add CTC tracking accuracy benchmark
edyoshikun Apr 14, 2026
364f83c
Improve embedding plot rendering: rasterization, legends, histograms
edyoshikun Apr 14, 2026
a5657fd
Add eval configs for ALFI mitosis and microglia datasets
edyoshikun Apr 14, 2026
6ac0a0d
Update evaluation DAG documentation for Nextflow pipeline
edyoshikun Apr 14, 2026
b8b0fce
Rewrite pseudotime pipeline with DTW-based alignment
edyoshikun Apr 14, 2026
50cf2bb
Add dataloader profiling and inspection scripts
edyoshikun Apr 14, 2026
2a95062
Add evaluation comparison and analysis scripts
edyoshikun Apr 14, 2026
ea5f8e2
Add Airtable dataset preparation utilities
edyoshikun Apr 14, 2026
1f8f885
Add cellanome embedding configs and DAG documentation
edyoshikun Apr 14, 2026
087c4e3
Add TODO notes for ArrowStringArray workarounds pending anndata 0.13
edyoshikun Apr 14, 2026
5f49bc8
Add microscope/modality/treatment fields and auto-delete well templat…
edyoshikun Apr 15, 2026
a45b061
remove the cellanome configs
edyoshikun Apr 15, 2026
02bca0b
restructure pseudotime evals
edyoshikun Apr 15, 2026
332a8b8
Add per-cell timing metrics (Stage 3c/3d) for organelle remodeling
edyoshikun Apr 18, 2026
833f917
Document Stage 3c/3d timing metrics in pseudotime DAG
edyoshikun Apr 18, 2026
e1d0cb1
Auto-select group-by in label-timing compare step
edyoshikun Apr 18, 2026
14aefd9
Add missing viral_sensor + Phase3D experiments to LC recipe
edyoshikun Apr 18, 2026
1435f49
Fix cross-experiment FOV cache collision in triplet dataset
edyoshikun Apr 18, 2026
e500a05
Vectorize per-batch positive lookup in triplet dataset
edyoshikun Apr 18, 2026
162790a
Lazy batch generation + NumPy Categorical groupby in sampler
edyoshikun Apr 19, 2026
73fe3e1
Whitelist anchor-cache columns and coerce Categorical keys in dataset
edyoshikun Apr 19, 2026
08c0c92
Cast fov_name and well_name to Categorical after alignment
edyoshikun Apr 19, 2026
015b526
Materialize strings, mask-based FOV split, val-empty guard in datamodule
edyoshikun Apr 19, 2026
b721cd6
Cast low-cardinality strings to Categorical at parquet load
edyoshikun Apr 19, 2026
40ed2f7
Delete SaveConfigToWandb callback (DDP setup-hook deadlock)
edyoshikun Apr 19, 2026
b1730e2
Add single-marker A/B variants for 2D-MIP and 3D-BoC-v2
edyoshikun Apr 19, 2026
249e1bf
Move fastdev/tiny diagnostic configs to training/debug/
edyoshikun Apr 19, 2026
80edf4c
Include marker in temporal valid_anchors match key
edyoshikun Apr 20, 2026
1bf15de
Scope lineage reconstruction by well, not just fov
edyoshikun Apr 20, 2026
a78ad08
Bump 2D-MIP-BoC (→v2) and 3D-BoC (→v4) parquets after lineage fix
edyoshikun Apr 20, 2026
43263fe
Use FlexibleBatchSampler for val so composition matches train
edyoshikun Apr 20, 2026
ba81457
Organize training configs into per-model-family subfolders
edyoshikun Apr 20, 2026
1632b5f
remove spurious file
edyoshikun Apr 20, 2026
f4f40c3
Fix FlexibleBatchSampler DDP wiring + epoch advance + dataset mixed-C…
edyoshikun Apr 23, 2026
83cfd02
Merge branch 'modular-viscy-staging' into dynadtw
edyoshikun Apr 24, 2026
7b3daed
Delete stale DynaCLR tests after iohub 0.3.2 merge
edyoshikun Apr 24, 2026
9c14f8a
Fix MultiExperimentIndex.clone_with_subset to propagate tensorstore_c…
edyoshikun Apr 24, 2026
bc8c8bd
Make _ddp_topology robust to trainer stubs without DDP attrs
edyoshikun Apr 24, 2026
742b426
move profiling
edyoshikun Apr 24, 2026
f01850e
trainer: enable cuDNN benchmark mode
edyoshikun Apr 25, 2026
bd923f5
slurm/train.sh: enable TF32 for fp32 matmuls
edyoshikun Apr 25, 2026
8d9362e
datamodule: expose recheck_cached_data and file_io_concurrency
edyoshikun Apr 25, 2026
61396c5
dataset: overlap shape-group reads with ts.Batch()
edyoshikun Apr 25, 2026
c85f6b2
2D MIP BoC: nw=4 and chunk-aligned z_extraction_window=16
edyoshikun Apr 25, 2026
c4d3755
profiling: retarget benchmark scripts to production parquet
edyoshikun Apr 25, 2026
1aa0171
fastdev-ddp: fix override config path
edyoshikun Apr 25, 2026
6913627
training sbatch: warm-start at epoch 0 with -fix-shuffler suffix
edyoshikun Apr 25, 2026
00d709d
Add per-dataset evaluation recipes for matrix layout
edyoshikun Apr 25, 2026
45c3d93
Reorganize evaluation leaves into per-model folders
edyoshikun Apr 25, 2026
7c65cea
2D MIP BoC: cap in-flight batches and disable per-plate cache pool
edyoshikun Apr 25, 2026
180c71d
datamodule: revert file_io_concurrency default to None
edyoshikun Apr 25, 2026
cf3df10
2D MIP BoC: tighten ThreadBuffer queue and disable per-plate cache
edyoshikun Apr 25, 2026
ddaf8c5
2D MIP BoC: cap train/val batches per epoch
edyoshikun Apr 25, 2026
26cef2d
fix(OnlineEvalCallback): use sync_dist=True instead of rank_zero_only
edyoshikun Apr 25, 2026
5a62983
Add central LC registry support to evaluation orchestrator
edyoshikun Apr 25, 2026
68b2219
ctc move
edyoshikun Apr 25, 2026
56b3e69
Decouple append-annotations from linear_classifiers config
edyoshikun Apr 25, 2026
cae868c
Add Wave-1 SLURM submission script for infectomics-annotated
edyoshikun Apr 25, 2026
e551eb2
collections: bump to v3 (drop dynamorph BF + Retardance)
edyoshikun Apr 25, 2026
e0ef6f8
2D MIP base: point cell_index_path at v3 parquet
edyoshikun Apr 25, 2026
aa6d6d8
Lower PHATE subsample to 20k in reduce_combined recipe
edyoshikun Apr 25, 2026
8a7245e
Add evaluation matrix DAG doc with central LC registry design
edyoshikun Apr 25, 2026
a39d2f5
2D MIP single-marker 192: 384->256->192 patch variant
edyoshikun Apr 25, 2026
f44b8ff
Lineage-aware PHATE subsampling in combined-dim-reduction
edyoshikun Apr 25, 2026
8232848
fix(2D MIP 192): tighten random crop to 216 + drop warm-start
edyoshikun Apr 25, 2026
b4f5ff7
fix(2D MIP 192 sbatch): mem-per-cpu 8G -> 14G to avoid host RAM OOM
edyoshikun Apr 25, 2026
da9cce7
Move pseudotime package out of evaluation/ and add io.py
edyoshikun Apr 25, 2026
028452b
fix(OnlineEvalCallback): all_gather features under DDP
edyoshikun Apr 25, 2026
5d6c79d
refactor(cli_utils): hoist load_composed_config import to module top
edyoshikun Apr 26, 2026
881c929
fix(prepare): subclass Dumper instead of mutating yaml.Dumper
edyoshikun Apr 26, 2026
a40edd0
refactor: hoist inline imports to module top
edyoshikun Apr 26, 2026
d118184
fix(mmd): cap pooled-kernel size and drop dead helper
edyoshikun Apr 26, 2026
ff04365
Move LC registry to per-task-domain sub-registries
edyoshikun Apr 26, 2026
508a2c7
fix(slurm): make training/debug shell scripts location-portable
edyoshikun Apr 26, 2026
e9c3f2f
deps: cap anndata <0.12.9 in viscy-utils
edyoshikun Apr 26, 2026
9dbe14d
refactor(cell_index): parquet-only preprocess + logger over print
edyoshikun Apr 26, 2026
547550a
chore: rank-zero warn in effective_rank, pytest.raises in z_reduction…
edyoshikun Apr 26, 2026
c89456c
Untrack evaluation_matrix.md (local-only planning doc)
edyoshikun Apr 26, 2026
9ac5741
Disable PHATE; bump wrapper RAM; add microglia Wave-2 SLURM script
edyoshikun Apr 27, 2026
0b50a16
Track DynaCLR Nextflow pipeline source under version control
edyoshikun Apr 28, 2026
8e61de9
Restructure pseudotime DAG to three-track methodology
edyoshikun Apr 29, 2026
7f5876f
Phase 1 — Stage 0: lineage reconnection + cohort tagging
edyoshikun Apr 29, 2026
bda6406
Phase 2 — Path A baselines: align_anno + align_lc
edyoshikun Apr 29, 2026
4af0c09
Fix Stage 0 + Stage 2 cohort tagging bugs surfaced by validation
edyoshikun Apr 29, 2026
8a87f62
Phase 3 — Path B refresh: tau_event_band, gating columns, schema unif…
edyoshikun Apr 29, 2026
508d808
Phase 4 — Per-organelle Stage 3 readouts
edyoshikun Apr 29, 2026
ffa7ede
Phase 5 — compare_tracks.yaml schema
edyoshikun Apr 29, 2026
d80afe2
Phase 5 — Stage 4 cross-track comparison
edyoshikun Apr 29, 2026
5fe7d56
Fix readout_common: fall back to whole-track baseline for unanchored …
edyoshikun Apr 29, 2026
394db2d
Add DINOv3-frozen baseline and ckpt_path-less foundation-model support
edyoshikun Apr 30, 2026
1ed4a51
Fix compare_evals.py to find smoothness CSVs and val_auroc column
edyoshikun Apr 30, 2026
8056ec6
Add per-leaf eval submitters for the BoC variant matrix
edyoshikun Apr 30, 2026
a8bcf91
Mock cohort fallback: per-dataset control_fov_pattern + zarr fallback
edyoshikun Apr 30, 2026
011f44d
Log resolved PHATE params and allow n_pca=null
edyoshikun Apr 30, 2026
98c5eb2
Mock fallback uses per-dataset organelle zarr pattern
edyoshikun Apr 30, 2026
f251193
Make pseudotime lineage_id content-stable across re-runs
edyoshikun May 1, 2026
a16dc73
Pseudotime: well-row threshold ladder, paired-trace plot, pooled cohort
edyoshikun May 1, 2026
9026b0e
Linear classifier eval: per-class precision/recall plot with support
edyoshikun May 1, 2026
d90c146
Eval registry for infectomics-annotated cross-model comparison
edyoshikun May 1, 2026
6fb05be
Add DynaCLR-2D-BagOfChannels-v3 to infectomics eval registry
edyoshikun May 1, 2026
1527adf
Decouple non-productive gate, add LC-zarr productive source
edyoshikun May 1, 2026
b616da5
Stable model→color palette across all comparison plots
edyoshikun May 1, 2026
27fdd26
Fix OnlineEvalCallback DDP gather on string-dtype arrays
edyoshikun May 5, 2026
1c98990
SLURM-aware available_cpus() helper, replace ad-hoc lookups
edyoshikun May 5, 2026
2ee28a2
PHATE per-store fit cap and skip-internal-PCA path
edyoshikun May 5, 2026
1121467
Add CrossModalContrastiveHead with vector-label batch support
edyoshikun May 5, 2026
613f12c
Add CellDinoModel foundation wrapper
edyoshikun May 5, 2026
5bd97b8
Parallelize per-coloring pairplots, add pairplot_components knob
edyoshikun May 5, 2026
1851244
Toggle plot vs plot_combined independently in eval pipeline
edyoshikun May 5, 2026
db77190
Cap host RAM on 384² single-marker run: 2 GPUs, prefetch_factor=1
edyoshikun May 5, 2026
5eedc84
Drop microglia BF and Retardance from BoC v2 collection
edyoshikun May 5, 2026
d5cbeed
Annotate infectomics eval registry with status and run notes
edyoshikun May 5, 2026
307371c
Cellanome embed scripts: split data_paths from per-model config
edyoshikun May 5, 2026
2fe7124
Revert "Cellanome embed scripts: split data_paths from per-model config"
edyoshikun May 5, 2026
d13e572
chore: refresh uv.lock
edyoshikun May 5, 2026
bd61be7
Add 'viz' extra to dynaclr: imageio-ffmpeg + matplotlib
edyoshikun May 5, 2026
52eadbd
Make eval predict step's primary embedding key configurable
edyoshikun May 5, 2026
0556f6a
DINOv3-temporal-MLP eval: use projections as primary embedding
edyoshikun May 5, 2026
3 changes: 3 additions & 0 deletions .gitignore
@@ -66,3 +66,6 @@ slurm*.out
lightning_logs/

# NOTE: uv.lock is NOT ignored - it should be tracked for reproducibility

# Local-only planning docs (not for upstream)
applications/dynaclr/docs/DAGs/evaluation_matrix.md
47 changes: 47 additions & 0 deletions applications/airtable/configs/prepare_config.yml
@@ -0,0 +1,47 @@
# Dataset preparation pipeline: NFS -> VAST rechunked zarr v3
# Usage: prepare run <dataset_name> -c prepare_config.yml [--dry-run]

nfs_root: /hpc/projects/intracellular_dashboard/organelle_dynamics
vast_root: /hpc/projects/organelle_phenotyping/datasets
workspace_dir: /hpc/mydata/eduardo.hirata/repos/viscy

concatenate:
  # null = auto-detect raw channels (Phase3D + raw *). Set explicitly to override.
  channel_names: null
  chunks_czyx: [1, 16, 256, 256]
  shards_ratio: [1, 1, 8, 8, 8]
  output_ome_zarr_version: "0.5"
  conda_env: biahub
  # Override biahub's internal SLURM settings (passed via -sb flag)
  # Set to null to use biahub defaults
  sbatch_overrides:
    partition: cpu

qc:
  channel_names: [Phase3D]
  NA_det: 1.35
  lambda_ill: 0.450
  pixel_size: 0.1494
  midband_fractions: [0.125, 0.25]
  device: cuda
  num_workers: 16

preprocess:
  channel_names: -1
  num_workers: 32
  block_size: 32

# biahub concatenate submits its own SLURM jobs via submitit (no config needed)
# QC and preprocess run as separate SLURM jobs (no race condition)
slurm:
  qc:
    partition: gpu
    gres: "gpu:1"
    cpus_per_task: 16
    mem_per_cpu: 4G
    time: "00:30:00"
  preprocess:
    partition: cpu
    cpus_per_task: 32
    mem_per_cpu: 4G
    time: "04:00:00"
3 changes: 3 additions & 0 deletions applications/airtable/scripts/write_experiment_metadata.py
@@ -68,6 +68,9 @@ def register(position_paths: list[Path], dry_run: bool = False, dataset: str | N
    if result.updated:
        db.batch_update(result.updated)
        logger.info("Updated %d existing records", len(result.updated))
    if result.template_ids_to_delete:
        db.batch_delete(result.template_ids_to_delete)
        logger.info("Deleted %d well template records", len(result.template_ids_to_delete))

    print(format_register_summary(result, dry_run=dry_run))
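
Reviewer sketch, not part of the PR: the hunk does not show whether the new deletion is skipped when `dry_run=True`, so the following illustrates one way to make that guard explicit and testable. `RegisterResult`, `RecordingDB`, and `delete_templates` are hypothetical stand-ins, not names from the repository.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterResult:
    """Hypothetical stand-in for the result object used in register()."""

    updated: list[dict] = field(default_factory=list)
    template_ids_to_delete: list[str] = field(default_factory=list)


class RecordingDB:
    """Hypothetical in-memory stand-in for the Airtable database wrapper."""

    def __init__(self) -> None:
        self.deleted: list[str] = []

    def batch_delete(self, record_ids: list[str]) -> list[dict]:
        # Record the IDs instead of calling a remote API.
        self.deleted.extend(record_ids)
        return [{"id": rid, "deleted": True} for rid in record_ids]


def delete_templates(db: RecordingDB, result: RegisterResult, dry_run: bool = False) -> int:
    """Delete well-template records unless dry_run is set; return the count deleted."""
    if dry_run or not result.template_ids_to_delete:
        return 0
    return len(db.batch_delete(result.template_ids_to_delete))


db = RecordingDB()
result = RegisterResult(template_ids_to_delete=["recTemplateA", "recTemplateB"])
dry_count = delete_templates(db, result, dry_run=True)  # nothing is deleted
real_count = delete_templates(db, result)  # both template records are deleted
```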

15 changes: 15 additions & 0 deletions applications/airtable/src/airtable_utils/database.py
@@ -143,3 +143,18 @@ def batch_create(self, records: list[dict]) -> list[dict]:
            Created records as returned by the Airtable API.
        """
        return self._table.batch_create([r["fields"] for r in records])

    def batch_delete(self, record_ids: list[str]) -> list[dict]:
        """Batch-delete records by ID.

        Parameters
        ----------
        record_ids : list[str]
            Airtable record IDs to delete.

        Returns
        -------
        list[dict]
            Deletion confirmations from the Airtable API.
        """
        return self._table.batch_delete(record_ids)
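
Reviewer sketch, not part of the PR: a caller may want to check the returned confirmations rather than only logging the requested count. This assumes each confirmation is a dict with `id` and `deleted` keys, which is the shape pyairtable documents for `Table.batch_delete`; the diff does not show which client backs `self._table`. The helper name `undeleted_ids` and the record IDs are hypothetical.

```python
def undeleted_ids(requested: list[str], confirmations: list[dict]) -> list[str]:
    """Return the requested record IDs that were not confirmed as deleted."""
    # Assumed confirmation shape: {"id": "<record id>", "deleted": True}.
    confirmed = {c["id"] for c in confirmations if c.get("deleted")}
    return [rid for rid in requested if rid not in confirmed]


requested = ["recA", "recB", "recC"]
confirmations = [
    {"id": "recA", "deleted": True},
    {"id": "recB", "deleted": True},
]
missing = undeleted_ids(requested, confirmations)
print(missing)
```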