Draft
Changes from all commits
312 commits
4e8581e
refactor(eval): consolidate save_metrics loop, skip empty DataFrames
alxndrkalinin Apr 16, 2026
1b10b7f
refactor(eval): split GT/pred feature computation, add force_recompute
alxndrkalinin Apr 16, 2026
ebf769d
feat(eval): integrate GT cache into evaluate_predictions
alxndrkalinin Apr 16, 2026
4f43dfe
feat(eval): add dynacell precompute-gt CLI
alxndrkalinin Apr 16, 2026
f68eca0
docs(eval): document GT cache, precompute-gt CLI, parallel sweeps
alxndrkalinin Apr 16, 2026
db70c78
refactor(eval): batch zarr opens per FOV, dedup slug, type kind
alxndrkalinin Apr 16, 2026
de4882b
refactor(eval): encapsulate cache dirty flag, narrow broad except
alxndrkalinin Apr 16, 2026
c822c84
test(eval): add pinned-value regression tests for feature pairing
alxndrkalinin Apr 16, 2026
fd030f8
update the model .yml file for unetvit3d
Apr 17, 2026
60f9ca9
update the training script for unetvit3d on sec61b
Apr 17, 2026
1690b7f
perf(eval): cache ckpt sha256 via sidecar file
alxndrkalinin Apr 17, 2026
7df8f07
feat(cli): strip launcher and benchmark reserved keys in compose
alxndrkalinin Apr 17, 2026
a83c4a2
chore(configs): commit benchmark schema and virtual_staining skeleton
alxndrkalinin Apr 17, 2026
8114048
feat(configs): add CellDiff train leaves for er/mito/nucleus/membrane
alxndrkalinin Apr 17, 2026
22bdab9
feat(configs): add CellDiff predict leaves (self-predict)
alxndrkalinin Apr 17, 2026
8e00988
feat(tools): add submit_benchmark_job.py with dry-run and sbatch temp…
alxndrkalinin Apr 17, 2026
13da046
chore(configs): archive Dihan's CellDiff trees under tools/LEGACY
alxndrkalinin Apr 17, 2026
1b7dae8
docs(dynacell): update README with benchmark layout and submit tool
alxndrkalinin Apr 17, 2026
86db6d4
refactor(utils): promote deep_merge to public API
alxndrkalinin Apr 17, 2026
ff53b3d
fix(tools): address simplify review findings
alxndrkalinin Apr 17, 2026
219b9b0
fix(tools): address code review — pytest pythonpath, flag semantics
alxndrkalinin Apr 17, 2026
7706ae8
fix(tools): decouple preview contract from --dry-run
alxndrkalinin Apr 17, 2026
4a967c0
fix(tools): shlex-quote env values in rendered sbatch
alxndrkalinin Apr 17, 2026
4e64ff3
test(utils): restore test_deep_merge_* underscore separator
alxndrkalinin Apr 17, 2026
5b352cc
docs(dynacell): document submit tool flags and preview contract
alxndrkalinin Apr 17, 2026
5e69dc7
docs(eval): note ckpt sha256 sidecar under cache identity
alxndrkalinin Apr 17, 2026
9e94d02
feat(configs): migrate UNetViT3D and FNet3D paper SEC61B leaves to sc…
alxndrkalinin Apr 17, 2026
a2361ca
refactor(data): rename HCSDataModule preload kwarg to mmap_preload
alxndrkalinin Apr 17, 2026
7096d64
feat(configs): add FNet3D paper-baseline fit leaves for 3 more organe…
alxndrkalinin Apr 17, 2026
6d00854
feat(tools): make sbatch constraint directive optional
alxndrkalinin Apr 17, 2026
16fa6fa
fix(configs): align fnet3d_paper leaves with paper-run hardware + max…
alxndrkalinin Apr 17, 2026
6ed2494
fix(configs): bump gpu_any_long mem to 512G to survive mmap preload
alxndrkalinin Apr 17, 2026
ffd84d7
update unetvit3d training yml
Apr 17, 2026
44aa49c
fix(configs): narrow 512G mem bump to cell.zarr-backed leaves
alxndrkalinin Apr 17, 2026
c9b8f3c
fix(configs): drop num_log_steps from unetvit3d overlay
alxndrkalinin Apr 17, 2026
e6780bb
test(configs): allow checkpoint policy divergence in unetvit3d test
alxndrkalinin Apr 17, 2026
66b4a71
feat(configs): add UNeXt2 SEC61B fit leaf (Run 4 reproduction)
alxndrkalinin Apr 17, 2026
be84b25
update the predict_method for unetvit3d
Apr 17, 2026
c9a6e16
feat(dynacell): add denoise_sliding_window with overlap averaging
Apr 17, 2026
4702d7a
feat(configs): set predict_method=iterative for celldiff iPSC confocal
Apr 17, 2026
8b2332c
perf(data): preserve native dtype in mmap_preload, cast on sample read
alxndrkalinin Apr 17, 2026
70765f2
refactor(data): tighten array_key sentinel, drop WHAT-comments in tests
alxndrkalinin Apr 17, 2026
9734d07
refactor(configs): rename runtime_single_gpu to runtime_shared
alxndrkalinin Apr 18, 2026
f9b8f1e
feat(configs): add topology recipes under dynacell and cytoland
alxndrkalinin Apr 18, 2026
5b7eaae
refactor(configs): unify fit/predict trainer recipes, own topology se…
alxndrkalinin Apr 18, 2026
3fdb7cf
refactor: trim WHAT-comments and drop unused _strip_reserved in confi…
alxndrkalinin Apr 18, 2026
bde233d
refactor(tools): drop undocumented stdout echo from --dry-run
alxndrkalinin Apr 18, 2026
e0f5c00
refactor(cli): log warning when composed-config parse fails
alxndrkalinin Apr 18, 2026
f31205c
perf(compose): memoize YAML parsing in load_composed_config
alxndrkalinin Apr 18, 2026
957cf9d
ci: add dynacell benchmark-config tests to the test matrix
alxndrkalinin Apr 18, 2026
44c2834
feat(configs): set predict params and fix output paths for CELLDiff i…
Apr 20, 2026
f4af391
feat(configs): add UNetViT3D train and predict configs for iPSC confocal
Apr 20, 2026
77a7063
Add UNetViT3D mito predict benchmark config
Apr 20, 2026
f29be3f
fix(tools): set umask 0002 so benchmark run outputs are group-writable
alxndrkalinin Apr 20, 2026
0618acd
refactor(cli): let config read errors propagate in _maybe_compose_config
alxndrkalinin Apr 20, 2026
a6d2576
feat(configs): add membrane predict config and switch to shared runti…
Apr 20, 2026
abe35fa
style(engine): fix pre-existing ruff E741 and E501 violations
alxndrkalinin Apr 20, 2026
fc3cf5f
feat(engine): add encoder_only FCMAE load to DynacellUNet
alxndrkalinin Apr 20, 2026
53385bb
feat(configs): add FCMAE-family benchmark pair on ER/SEC61B
alxndrkalinin Apr 20, 2026
ee86d29
test(engine,configs): cover encoder_only load + scratch≡pretrained in…
alxndrkalinin Apr 20, 2026
c954bc6
fix(evaluation): declare feature_extractor schema in eval.yaml
alxndrkalinin Apr 20, 2026
1c72f2f
feat(evaluation): add Hydra config groups for target/predict_set/feat…
alxndrkalinin Apr 20, 2026
80d9465
feat(configs): bump fcmae_vscyto3d fit overlay lr to 0.0004
alxndrkalinin Apr 20, 2026
78231ed
feat(dynacell): add eval_gpu extra with cupy-cuda12x + cucim-cu12
alxndrkalinin Apr 20, 2026
b409c0d
docs(evaluation): drop redundant cell_segmentation_path override in e…
alxndrkalinin Apr 20, 2026
4dea81a
fix(evaluation): fail loud on missing channels, skip empty metric files
alxndrkalinin Apr 20, 2026
817bd6c
feat(evaluation): add benchmark eval leaves mirroring predict tree
alxndrkalinin Apr 20, 2026
4f6a6e2
docs(evaluation): add benchmark row to Config groups table
alxndrkalinin Apr 20, 2026
85e31f6
fix(dynacell): run 4-GPU DDP with ntasks=4, not ntasks=1
alxndrkalinin Apr 21, 2026
b74a534
fix(configs): use ddp_find_unused_parameters_true for FCMAE leaves
alxndrkalinin Apr 21, 2026
d1d02fd
refactor(evaluation): move HPC-bound eval configs out of src/
alxndrkalinin Apr 21, 2026
e16be57
feat(dynacell): enforce ntasks == gpus == nodes × devices at submit
alxndrkalinin Apr 21, 2026
089c6e9
refactor(__main__): locate external configs via pyproject.toml marker
alxndrkalinin Apr 21, 2026
5974a5c
feat(configs): add FCMAE scratch/pretrained pair for mito/TOMM20
alxndrkalinin Apr 21, 2026
8d3ca52
docs(configs): redefine benchmark config schema for unified layout
alxndrkalinin Apr 21, 2026
587b1b4
refactor(configs): unify benchmark tree with train+predict+eval per cell
alxndrkalinin Apr 21, 2026
29cd698
fix(configs): update train/predict leaf base: paths after shared/mode…
alxndrkalinin Apr 21, 2026
5224d8d
refactor(__main__,tests): apply simplify review cleanup
alxndrkalinin Apr 21, 2026
2ac99fa
docs(eval): refresh README for post-reorg config layout
alxndrkalinin Apr 21, 2026
3cf7dd2
refactor(configs): move shared/ + leaf/ under hidden _internal/ root
alxndrkalinin Apr 21, 2026
23d924d
fix(engine): honor predict_overlap and multi-channel sliding windows
alxndrkalinin Apr 21, 2026
b3ab22d
fix(__main__): keep hydra.searchpath adjacent to positional overrides
alxndrkalinin Apr 21, 2026
5178636
docs(eval): point README intro at _internal/ for HPC-bound groups
alxndrkalinin Apr 21, 2026
94ed736
docs(configs): remove stale benchmark schema doc
alxndrkalinin Apr 21, 2026
26c41d3
fix(dynacell): emit --ntasks-per-node for SLURM/Lightning compat
alxndrkalinin Apr 21, 2026
18bf2ef
feat(eval): share DINOv3 + other HF artifacts across team via HF_HOME
alxndrkalinin Apr 21, 2026
010a9e2
feat(eval): pin canonical feature-extractor and gt_cache_dir defaults
alxndrkalinin Apr 21, 2026
8d3af13
refactor(__main__,tests): apply simplify review cleanup
alxndrkalinin Apr 21, 2026
f11abba
fix(eval): shared HF cache uses HF_HUB_CACHE not HF_HOME
alxndrkalinin Apr 21, 2026
f0d8d9b
fix(configs): halve FCMAE dataloader workers to stay under 512G cgroup
alxndrkalinin Apr 21, 2026
14f59f1
refactor(configs): group benchmark leaves by train set under <org>/<m…
alxndrkalinin Apr 21, 2026
bfc6189
fix(evaluation): independent min-max norm for metrics, fix cache cont…
Apr 21, 2026
38d47b3
feat(dynacell): manifest-driven dataset_ref resolver for benchmark le…
alxndrkalinin Apr 21, 2026
032e424
docs(dynacell): roadmap + spec for dataset_ref resolver staged rollout
alxndrkalinin Apr 21, 2026
4bb9f09
refactor(dynacell): strict dataset_ref collision + shared manifest-ro…
alxndrkalinin Apr 21, 2026
11836c8
test(dynacell): add nucleus and membrane fixture manifest targets
alxndrkalinin Apr 21, 2026
326b2d0
refactor(dynacell): migrate mito/nucleus/membrane targets to dataset_ref
alxndrkalinin Apr 21, 2026
6273439
test(dynacell): consolidate dataset_ref resolver tests across migrate…
alxndrkalinin Apr 21, 2026
8924ab2
feat(dynacell): add Hydra-side dataset_ref hook module
alxndrkalinin Apr 22, 2026
f5a6e56
refactor(dynacell): wire Hydra dataset_ref hook; migrate eval configs
alxndrkalinin Apr 22, 2026
a984384
refactor(dynacell): dedup dataset_ref hooks and test setup
alxndrkalinin Apr 22, 2026
46be278
feat(dynacell): add evaluation scripts and CUDA envrc, ignore checkpo…
Apr 22, 2026
3a27c45
chore(dynacell): bump 4gpu mem 512G -> 1024G after two OOM deaths
alxndrkalinin Apr 23, 2026
e9c4cd7
fix(viscy-data): let BatchedConcatDataset tolerate single-sample chil…
alxndrkalinin Apr 23, 2026
8c21ca6
refactor(viscy-data): drop :class: markup, assert lossless batch grou…
alxndrkalinin Apr 23, 2026
4bc2e53
feat(viscy-data): attach ShardedDistributedSampler in BatchedConcatDa…
alxndrkalinin Apr 23, 2026
e01089d
feat(viscy-utils): add configure_adamw_scheduler helper
alxndrkalinin Apr 23, 2026
d296c7f
refactor(dynacell): use shared optimizer helper; expose warmup knobs
alxndrkalinin Apr 23, 2026
ef545c0
refactor(cytoland): adopt shared optimizer helper and ckpt_path hpara…
alxndrkalinin Apr 23, 2026
a519ac9
chore(cytoland): default vscyto3d finetune to bf16-mixed
alxndrkalinin Apr 23, 2026
5950576
refactor(dynacell): split fit overlays into model+trainer vs HCS data
alxndrkalinin Apr 23, 2026
c603687
docs(dynacell): add job submission reliability plan
alxndrkalinin Apr 24, 2026
b157daa
feat(dynacell): add --exclude as optional SBATCH directive
alxndrkalinin Apr 24, 2026
b9ebf7b
feat(dynacell): add NCCL preflight smoke test to sbatch template
alxndrkalinin Apr 24, 2026
96c5787
feat(viscy-data): add include/exclude_fov_names to HCSDataModule
alxndrkalinin Apr 24, 2026
aa68329
feat(cytoland): add A549 infection VSCyto3D warm-start finetune configs
alxndrkalinin Apr 24, 2026
ae7296b
refactor(dynacell): /simplify cleanup of preflight + exclude impl
alxndrkalinin Apr 24, 2026
9149ed7
perf(viscy-data): store FOV filters as sets for O(1) lookup
alxndrkalinin Apr 24, 2026
ab50b7c
fix(viscy-data): resolve per-timepoint norm_meta before transforms
alxndrkalinin Apr 24, 2026
9a730f8
feat(dynacell): add FNet3D paper predict configs for ipsc_confocal
alxndrkalinin Apr 24, 2026
8c7bbae
feat(cytoland): align A549 infection finetune to dynacell FCMAE recipe
alxndrkalinin Apr 24, 2026
9654e2b
feat(dynacell): add first Stage 7 joint train leaf (celldiff, ER/SEC61B)
alxndrkalinin Apr 24, 2026
3072c3e
feat(viscy-data): support include/exclude_fov_names with mmap_preload…
alxndrkalinin Apr 24, 2026
26f1a7b
docs(dynacell): refresh virtual_staining README for fit-split + joint…
alxndrkalinin Apr 24, 2026
48797b6
fix(dynacell): clarify error message for missing seg_model
alxndrkalinin Apr 25, 2026
b65037e
fix(dynacell): guard sbatch cleanup against unset SLURM_JOB_ID
alxndrkalinin Apr 25, 2026
9c1ab9c
fix(dynacell): single-source dinov3 model name via Hydra group
alxndrkalinin Apr 25, 2026
e06a71b
fix(dynacell): drop NaN bars from cross-model metric figure
alxndrkalinin Apr 25, 2026
5a2c6fb
refactor(dynacell): use Field(default_factory=list) for Pydantic list…
alxndrkalinin Apr 25, 2026
b422bc5
fix(viscy-utils): strip top-level _-prefixed keys from composed config
alxndrkalinin Apr 25, 2026
437eb11
docs(dynacell): document anchor convention; pin no-anchor-leak invariant
alxndrkalinin Apr 25, 2026
3434b7e
chore(dynacell): declare wandb optional extra; document the runtime gap
alxndrkalinin Apr 25, 2026
4d399d5
feat(dynacell): add canonical joint smoke leaf for celldiff/joint_ips…
alxndrkalinin Apr 25, 2026
453b09d
feat(dynacell): add hardware_h200_single_smoke launcher profile
alxndrkalinin Apr 25, 2026
8ff7abf
refactor(dynacell): /simplify smoke profile via wall-only overlay
alxndrkalinin Apr 25, 2026
3e3909a
feat(dynacell): disable logger at the smoke leaf level
alxndrkalinin Apr 25, 2026
00a2730
fix(viscy-data): drop non-tensor metadata in BatchedConcatDataModule …
alxndrkalinin Apr 25, 2026
5b2327f
fix(dynacell): cut smoke leaf batch_size to 1 to fit a single H200
alxndrkalinin Apr 26, 2026
234819a
feat(dynacell): add 4-GPU DDP smoke leaf for joint celldiff SEC61B
alxndrkalinin Apr 26, 2026
48f4878
perf(viscy-utils): bf16-precision SSIM helper for Hopper FCMAE traini…
alxndrkalinin Apr 26, 2026
0b04b24
fix(viscy-data): drop use_thread_workers to fix DDP deadlock (#413)
alxndrkalinin Apr 26, 2026
2e0ee29
feat(dynacell): nucleus + membrane FCMAE_VSCyto3D scratch + pretraine…
alxndrkalinin Apr 26, 2026
a198b5e
fix(dynacell): override Structure aug-keys for nucleus + membrane FCM…
alxndrkalinin Apr 26, 2026
e814147
feat(dynacell): add eval script for FNet3D paper predictions on iPSC …
Apr 27, 2026
8f92711
feat(dynacell): unblock joint training + add A549 cross-eval (#415)
alxndrkalinin Apr 27, 2026
6c75284
docs(dynacell): final findings + 8-job FCMAE benchmark, open items
alxndrkalinin Apr 26, 2026
4bb4fd9
docs(dynacell): clean up resolved planning docs; refresh A549 roadmap
alxndrkalinin Apr 27, 2026
9af8bdf
chore(dynacell): rename ER+Mito FCMAE pretrained outputs to _ws8500
alxndrkalinin Apr 27, 2026
40445fe
chore(dynacell): pin compute-job CWD to repo_root in sbatch template
alxndrkalinin Apr 27, 2026
b196723
docs(dynacell): refresh A549 roadmap with 2026-04-24 status
alxndrkalinin Apr 27, 2026
bd74574
fix(dynacell): repoint mito A549 cross-eval to 2024_11_21 plate
alxndrkalinin Apr 27, 2026
6516c1a
feat(dynacell): add fnet3d_paper Stage 6 A549 predict leaves
alxndrkalinin Apr 27, 2026
b41e7df
feat(dynacell): add fcmae_vscyto3d Stage 6 A549 predict scaffolding
alxndrkalinin Apr 27, 2026
df85d93
feat(dynacell): add FNet3D ER joint single-GPU smoke leaf
alxndrkalinin Apr 28, 2026
ad1df84
perf(cytoland): enable mmap+persistent+bf16 for A549 infected finetune
alxndrkalinin Apr 28, 2026
41d39ba
chore(cytoland): drop h200 constraint on A549 infected sbatch
alxndrkalinin Apr 28, 2026
2395e1d
feat(dynacell): bundle manifest registry as first-class package data
alxndrkalinin Apr 28, 2026
ef0e7c8
chore(cytoland): require >=80 GB VRAM for A549 infected sbatch
alxndrkalinin Apr 28, 2026
76940d1
docs(dynacell): note manifest registry drift policy in README
alxndrkalinin Apr 28, 2026
31d72ee
feat(dynacell): submit_benchmark_job.py supports optional --dependenc…
alxndrkalinin Apr 28, 2026
cbcf7bd
feat(dynacell): bundle 4 missing A549 manifests + predict_set fragments
alxndrkalinin Apr 28, 2026
3098cfe
feat(dynacell): per-plate A549 predict leaves for ER + MITO across 5 …
alxndrkalinin Apr 28, 2026
2f4dff1
feat(dynacell): submit_benchmark_batch.py — chain N predict leaves in…
alxndrkalinin Apr 28, 2026
694d744
feat(dynacell): compute FID/KID at dataset level across all cells
Apr 28, 2026
ffc15db
feat(dynacell): add per-model eval scripts in model subfolders
Apr 28, 2026
b027421
fix(dynacell): use uv run consistently and fix pred path in eval scripts
Apr 28, 2026
b4c66aa
refactor(dynacell): consolidate eval scripts into model subfolders
Apr 28, 2026
f051def
chore(manifests): point a549-mantis stores at .ozx
alxndrkalinin Apr 28, 2026
a327d0d
feat(dynacell): add --override + --overwrite to submit_benchmark_batc…
alxndrkalinin Apr 29, 2026
6b8ccc9
feat(dynacell): add --overwrite alias to submit_benchmark_job.py for …
alxndrkalinin Apr 29, 2026
b2a17fa
feat(dynacell): predict_local_a549.sh — local-GPU per-plate predict b…
alxndrkalinin Apr 29, 2026
ab4ee83
fix(dynacell): predict_local_a549.sh dying after first plate
alxndrkalinin Apr 29, 2026
b6988c4
fix(dynacell): predict_local_a549.sh - probe unbuffer + catch child f…
alxndrkalinin Apr 29, 2026
ac6c159
feat(dynacell): tomm20 fcmae scratch predict leaves pinned to best ckpt
alxndrkalinin Apr 30, 2026
5644906
feat(dynacell): matrix-fill iPSC predict leaves for remaining 7 FCMAE…
alxndrkalinin Apr 30, 2026
b0c0f24
chore(dynacell): drop legacy per-date a549_mantis registry + leaves
alxndrkalinin Apr 30, 2026
503774b
feat(dynacell): per-condition a549_mantis registry + predict/eval matrix
alxndrkalinin Apr 30, 2026
e98b407
feat(dynacell): a549 evaluation runner scripts for fnet3d + unetvit3d
alxndrkalinin Apr 30, 2026
6c92b9f
chore(gitignore): ignore plot_related/ directory
Apr 30, 2026
e5e907e
feat(dynacell): predict_local_ipsc.sh — local ipsc_confocal predict r…
alxndrkalinin Apr 30, 2026
2441b1c
feat(dynacell): ER joint train uses pooled a549 SEC61B_all store
alxndrkalinin Apr 30, 2026
65f3ae5
chore(dynacell): drop legacy predict__a549_mantis leaves for membrane…
alxndrkalinin Apr 30, 2026
d985a76
feat(dynacell): nucleus fcmae pretrained predict leaves pinned to bes…
alxndrkalinin Apr 30, 2026
40b6dc9
feat(dynacell): sec61b fcmae scratch predict leaves pinned to best ckpt
alxndrkalinin Apr 30, 2026
a7bc0cf
feat(dynacell): tomm20 fcmae pretrained predict leaves pinned to best…
alxndrkalinin Apr 30, 2026
0d72bbf
chore(security): warn against upgrading past lightning 2.6.1
alxndrkalinin Apr 30, 2026
eef1459
feat(dynacell): joint ipsc_confocal+a549_mantis train leaves for 7 cells
alxndrkalinin Apr 30, 2026
4c53eaf
fix(security): pin lightning to direct CDN URL after PyPI quarantine
alxndrkalinin Apr 30, 2026
abba46f
fix(viscy-utils): HCSPredictionWriter idempotent on multi-timepoint i…
alxndrkalinin Apr 30, 2026
370beb7
feat(dynacell): joint train leaves for fcmae/fnet3d/unext2 (13 cells)
alxndrkalinin Apr 30, 2026
0a1b764
fix(dynacell): fnet3d_paper joint train leaves are single-GPU like iPSC
alxndrkalinin Apr 30, 2026
72d4183
fix(dynacell): align joint train channel typing with iPSC (str not list)
alxndrkalinin Apr 30, 2026
afc997d
fix(dynacell): celldiff + unetvit3d joint train leaves are single-H20…
alxndrkalinin Apr 30, 2026
2d288f3
fix(dynacell): mem=512G for fnet joint nucleus + membrane (matches iPSC)
alxndrkalinin Apr 30, 2026
25af720
fix(dynacell): bump joint train mem to 512G (preloads two stores)
alxndrkalinin Apr 30, 2026
e41c795
feat(dynacell): a549-only train leaves for all 21 organelle×model cells
alxndrkalinin Apr 30, 2026
e7f2af6
feat(dynacell): switch celldiff a549_mantis predict configs to iterat…
May 1, 2026
a7b7403
feat(dynacell): add unext2 eval runner script
May 1, 2026
8cc7bf5
fix(dynacell): bump joint fnet3d nucl/memb mem 512G->1024G (OOM)
alxndrkalinin May 1, 2026
6415f7d
fix(dynacell): bump a549 nucl fnet3d mem 512G->1024G (OOM)
alxndrkalinin May 1, 2026
b48a13c
chore(cytoland): repoint A549 infected configs from Lustre to VAST
alxndrkalinin May 1, 2026
6ec0d6f
fix(viscy-data): mmap_preload reads via BasicIndexer (~6x less RAM)
alxndrkalinin May 1, 2026
2a513b4
revert(dynacell): drop fnet3d mem overrides after mmap_preload fix
alxndrkalinin May 1, 2026
8c31d20
feat(dynacell): wire iPSC FCMAE membrane best ckpt into predict leaves
alxndrkalinin May 1, 2026
d97c23b
fix(dynacell): set PYTORCH_ALLOC_CONF=expandable_segments:True
alxndrkalinin May 1, 2026
7a884b5
fix(dynacell): joint fnet3d batch_size 48->6 (CUDA OOM)
alxndrkalinin May 1, 2026
c793da1
fix(dynacell): joint fcmae batch_size 32->8 (CUDA OOM risk)
alxndrkalinin May 1, 2026
7832af3
fix(dynacell): hardware_4gpu profile now 512G + H100/H200-only
alxndrkalinin May 1, 2026
513d3e6
chore(dynacell): point SEC61B fcmae_pretrained predict configs at ep 123
alxndrkalinin May 1, 2026
e085ee3
chore(dynacell): point Memb fcmae_scratch predict configs at ep 136
alxndrkalinin May 1, 2026
52b53d5
fix(dynacell): consistent 512G mem across 8 fnet3d a549/joint leaves
alxndrkalinin May 1, 2026
4602496
feat(dynacell): submit script for 8 fnet3d a549/joint training jobs
alxndrkalinin May 1, 2026
21df26e
chore(dynacell): point Nucl fcmae_scratch predict configs at ep 80
alxndrkalinin May 1, 2026
6e935e4
fix(dynacell): predict_local_*.sh fail fast on placeholder ckpt_path
alxndrkalinin May 1, 2026
4bbcee8
feat(dynacell): add VSCyto3D eval runner script
May 1, 2026
b48fc13
fix(dynacell): joint fcmae batch_size 32->8 for ER + MITO
alxndrkalinin May 1, 2026
f80466a
fix(dynacell): correct A549 UNetViT3D eval script — 4 organelles × 3 …
May 2, 2026
ab0e193
feat(dynacell): add A549 CellDiff eval script — 3 variants × 4 organe…
May 2, 2026
10e5c16
fix(viscy-data): support heterogeneous T per FOV in mmap_preload
alxndrkalinin May 2, 2026
dcfedfd
refactor(viscy-data): compute mmap T offsets once per setup_fit
alxndrkalinin May 2, 2026
848f89b
feat(dynacell): A549 eval scripts for all 5 models — flat mantis_v1 l…
May 3, 2026
5a2a346
fix(viscy-data): skip bs%num_samples check for BatchedConcat children
alxndrkalinin May 4, 2026
d407687
refactor(viscy-data): tighten joint-divisibility test + comment
alxndrkalinin May 4, 2026
4951fc0
feat(dynacell): a549-trained nucleus fnet3d_paper predict configs
alxndrkalinin May 4, 2026
16d5482
docs: explain joint vs single-set training batch semantics
alxndrkalinin May 4, 2026
397bff2
refactor(dynacell): unify predict_local script across train sets
alxndrkalinin May 4, 2026
dd80af3
docs(dynacell): add model name convention reference
alxndrkalinin May 4, 2026
889c1a5
test(dynacell): fcmae er joint smoke configs
alxndrkalinin May 4, 2026
f6af4dd
chore(dynacell): handoff script for a549-only fcmae resubmits
alxndrkalinin May 4, 2026
35b4f04
feat(dynacell): a549-trained membrane fcmae_scratch predict configs
alxndrkalinin May 4, 2026
1c393bf
feat(dynacell): a549-trained membrane fnet3d_paper predict configs
alxndrkalinin May 4, 2026
a6331cb
feat(dynacell): a549-trained nucleus fcmae_scratch predict configs
alxndrkalinin May 5, 2026
e5abfe2
refactor(dynacell): unify predict_batch script across train/test sets
alxndrkalinin May 5, 2026
a7a2ebd
feat(dynacell): save single-cell embeddings with FOV/timepoint metadata
May 5, 2026
a821459
feat(dynacell): add CellDiff A549 mantis predict configs; reduce batc…
May 5, 2026
2915982
feat(dynacell): a549-trained nucleus fcmae_pretrained predict configs
alxndrkalinin May 5, 2026
ae10bf7
docs(dynacell): document prediction zarr naming convention
alxndrkalinin May 5, 2026
38e08e0
feat(dynacell): a549-trained ER fcmae_pretrained predict configs
alxndrkalinin May 5, 2026
8981a08
train infected 4gpu
edyoshikun May 5, 2026
a5d1571
feat(dynacell): joint-trained nucleus fcmae_scratch predict configs
alxndrkalinin May 6, 2026
4cc95ea
feat(dynacell): joint-trained membrane fcmae_pretrained predict configs
alxndrkalinin May 6, 2026
7c663e7
feat(dynacell): joint-trained membrane fcmae_scratch predict configs
alxndrkalinin May 6, 2026
b64d1d6
feat(dynacell): a549-trained mito fcmae_scratch predict configs
alxndrkalinin May 6, 2026
e35244c
feat(dynacell): a549-trained ER fcmae_scratch predict configs
alxndrkalinin May 6, 2026
ad65b21
feat(dynacell): joint-trained nucleus fnet3d predict configs
alxndrkalinin May 6, 2026
ad45beb
feat(dynacell): joint-trained membrane fnet3d predict configs
alxndrkalinin May 6, 2026
142be0a
feat(dynacell): a549-trained mito fnet3d predict configs
alxndrkalinin May 6, 2026
68f4d06
feat(dynacell): add joint-trained eval scripts and predict configs fo…
May 6, 2026
78544dd
feat(dynacell): joint-trained nucleus fnet3d eval scripts (full metrics)
alxndrkalinin May 7, 2026
ef58867
feat(dynacell): joint-trained membrane unext2 eval scripts (full metr…
alxndrkalinin May 7, 2026
3 changes: 3 additions & 0 deletions .envrc
@@ -0,0 +1,3 @@
export CUDA_PATH=/hpc/apps/cuda/12.8.0_570.86.10
export PATH=$CUDA_PATH/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_PATH/lib64:${LD_LIBRARY_PATH:-}
25 changes: 24 additions & 1 deletion .github/workflows/test.yml
@@ -118,10 +118,33 @@ jobs:
run: uv run --frozen pytest
working-directory: applications/${{ matrix.application }}

test-dynacell-configs:
name: Test dynacell benchmark configs (Python 3.13, ubuntu-latest)
runs-on: ubuntu-latest

steps:
- name: Checkout repository
uses: actions/checkout@v5

- name: Set up uv with Python 3.13
uses: astral-sh/setup-uv@v7
with:
python-version: "3.13"
enable-cache: true
cache-suffix: ubuntu-latest-3.13

- name: Install minimal dynacell (base deps + test group)
run: uv sync --frozen --group test
working-directory: applications/dynacell

- name: Run benchmark-schema + submit-tool tests
run: uv run --frozen pytest tests/test_benchmark_config_composition.py tests/test_submit_benchmark_job.py -v
working-directory: applications/dynacell

check:
name: All tests pass
if: always()
needs: [test, test-data, test-data-extras, test-applications]
needs: [test, test-data, test-data-extras, test-applications, test-dynacell-configs]
runs-on: ubuntu-latest
steps:
- name: Verify all test jobs succeeded
4 changes: 4 additions & 0 deletions .gitignore
@@ -66,3 +66,7 @@ slurm*.out
lightning_logs/

# NOTE: uv.lock is NOT ignored - it should be tracked for reproducibility

checkpoints/

plot_related/
217 changes: 128 additions & 89 deletions CLAUDE.md
@@ -1,16 +1,10 @@
# CLAUDE.md
# VisCy — Claude Code Reference

Project-specific instructions for Claude Code sessions in this repository.
## Project

## Git Workflow
- **NEVER** use `git commit --amend` or `git push --force` / `--force-with-lease` unless the user explicitly requests it. Always create NEW commits.
- ALWAYS use atomic commits: one logical change per commit. Never bundle unrelated changes.
- Never use `git add -A` or `git add .`. Always stage specific files by name.
- Always pull before pushing. If push is rejected, pull and retry — never force-push.

## Repository Structure
VisCy is a **uv workspace monorepo** for virtual staining and computational microscopy. Sub-packages live under `packages/`.

VisCy is a **uv workspace monorepo**. Sub-packages live under `packages/`:
## Repo Layout

```
pyproject.toml # Root config (ruff, pytest, uv workspace)
@@ -28,51 +22,115 @@ applications/ # Self-contained research applications
- **Applications must not import from each other.** If two applications need the same logic, move it to an existing package or create a new one.
- Applications are consumers of packages — the dependency graph always flows `applications/ → packages/`, never sideways.

## Code Style
---

## Development

### Environment Setup

Use `uv` package manager. Run commands with `uv run <command>`. Edit `pyproject.toml` to modify dependencies and sync to update `uv.lock`.

```sh
uv venv -p 3.13
uv sync --all-packages --all-extras
```

If `uv` is not installed:
```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
```

On HPC, symlink the uv cache out of your home directory first:
```sh
mkdir -p /hpc/mydata/firstname.lastname/.cache/uv && ln -s /hpc/mydata/firstname.lastname/.cache/uv ~/.cache/uv
```

For full setup instructions (installing uv, creating a venv, syncing dependencies), see [CONTRIBUTING.md](./CONTRIBUTING.md).

### SLURM scripts for Lightning DDP jobs

When hand-writing `.slurm` scripts that launch Lightning via `srun`, always use `--ntasks-per-node=N` (not `--ntasks=N`). Lightning's `SLURMEnvironment` validates `SLURM_NTASKS_PER_NODE` at trainer init and raises `RuntimeError: You set --ntasks=N in your SLURM bash script, but this variable is not supported. HINT: Use --ntasks-per-node=N instead.` — the job then dies seconds into the allocation.

Invariant: `#SBATCH --ntasks-per-node=N` must equal `trainer.devices` in the YAML config and `#SBATCH --gpus=N` (single-node) or `#SBATCH --gpus-per-node=N` (multi-node).

The dynacell launcher (`applications/dynacell/tools/submit_benchmark_job.py`) already emits `--ntasks-per-node` correctly; this note is for hand-written scripts (e.g., `applications/cytoland/examples/configs/*/run_*.slurm`).
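As a sketch, a hand-written single-node 4-GPU header satisfying the invariant would look like this (the launch command and config path are placeholders, not the repo's actual scripts):

```sh
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4   # NOT --ntasks=4: Lightning's SLURMEnvironment rejects that at trainer init
#SBATCH --gpus=4              # single-node; use --gpus-per-node=4 instead for multi-node jobs

# trainer.devices in the YAML config must also equal 4.
srun uv run python -m some_app fit --config config.yml   # placeholder launch command
```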

### Joint vs single-set training batch semantics

`HCSDataModule` and `BatchedConcatDataModule` produce the same number of GPU samples per training step — but the YAML `batch_size` value that gets there is **different by a factor of `num_samples`**. Either is easy to misread when skimming.

| DataModule | `train_dataloader` divides by `num_samples`? | Samples per step |
|---|---|---|
| `HCSDataModule` (single-set) | yes (`hcs.py` `train_dataloader`) | `batch_size` |
| `ConcatDataModule` (parent class) | yes (`combined.py` `train_dataloader`) | `batch_size` |
| `BatchedConcatDataModule` (joint) | **no** (`combined.py` overrides; uses `batch_size` as-is) | `batch_size * num_samples` |

To match the same effective per-step samples between a single-set and a joint config, **set `joint.batch_size = single_set.batch_size / num_samples`**.

Examples (verified against the `applications/dynacell/configs/benchmarks/virtual_staining/_internal/shared/model/data_overlays/` overlays + their joint leaves):

- FCMAE (`fcmae_vscyto3d_*`): single-set `batch_size: 32, num_samples: 4` → joint `batch_size: 8, num_samples: 4` → both yield **32 samples/step**.
- FNet3D (`fnet3d_paper`): single-set `batch_size: 48, num_samples: 8` → joint `batch_size: 6, num_samples: 8` → both yield **48 samples/step**.

`HCSDataModule._train_transform` enforces `batch_size % num_samples == 0` for single-set use because `train_dataloader` would otherwise round down silently. The check is suppressed for `BatchedConcatDataModule` children via the `_is_batched_concat_child` flag set in the wrapper's `setup()` — joint configs are free to pick any `(batch_size, num_samples)` pair as long as the product is the desired sample count. **Do not** "fix" a joint config by raising `batch_size` to satisfy the divisibility rule; it would multiply effective samples by `num_samples`.

When in doubt, read both `train_dataloader` overrides directly — they are short. Don't infer from comments alone.
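The rule above can be sketched as a tiny helper (hypothetical function names; the real values live in plain YAML leaves):

```python
def joint_batch_size(single_set_batch_size: int, num_samples: int) -> int:
    """YAML batch_size a BatchedConcatDataModule joint leaf should use to match
    the effective samples/step of a single-set HCSDataModule leaf."""
    if single_set_batch_size % num_samples != 0:
        raise ValueError("single-set batch_size must be divisible by num_samples")
    return single_set_batch_size // num_samples


def samples_per_step(batch_size: int, num_samples: int, joint: bool) -> int:
    # Joint (BatchedConcatDataModule) uses batch_size as-is, so effective samples
    # are batch_size * num_samples; single-set divides by num_samples first.
    return batch_size * num_samples if joint else batch_size


# FCMAE: single-set batch_size=32, num_samples=4 -> joint batch_size=8; 32 samples/step
assert joint_batch_size(32, 4) == 8
assert samples_per_step(32, 4, joint=False) == samples_per_step(8, 4, joint=True) == 32
# FNet3D: single-set batch_size=48, num_samples=8 -> joint batch_size=6; 48 samples/step
assert joint_batch_size(48, 8) == 6
```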

### Common Commands

```sh
uvx ruff check packages/ # lint
uvx ruff check --fix packages/ # lint + auto-fix
uvx ruff format packages/ # format
uv run pytest # all tests
```

### Testing

```sh
uv run pytest # all tests
uv run pytest packages/viscy-data/ # single package (data)
uv run pytest packages/viscy-models/ # single package (models)
```

Prefer `{file}_test.py` in the same directory as `{file}.py`, unless there are import issues, in which case use `tests/`.

---

## Engineering Standards

### Git Workflow

- **NEVER** use `git commit --amend` or `git push --force` / `--force-with-lease` unless the user explicitly requests it. Always create NEW commits.
- ALWAYS use atomic commits: one logical change per commit. Never bundle unrelated changes.
- Never use `git add -A` or `git add .`. Always stage specific files by name.
- Always pull before pushing. If push is rejected, pull and retry — never force-push.

### Code Style

#### General
- **Ruff config is centralized in the root `pyproject.toml` only.**
Sub-packages must NOT have their own `[tool.ruff.*]` sections.
Ruff does not inherit config — any `[tool.ruff.*]` in a sub-package
silently overrides the entire root config (including `lint.select`,
`per-file-ignores`, etc.).
- Docstrings use **numpy style** (`convention = "numpy"`).
- Lint rules: `D, E, F, I, NPY, PD, W`.
- `D` rules are ignored in `**/tests/**` and notebooks.
- Format: double quotes, spaces, 120 char line length.
- Run `uvx prek run --files {files_you_edited}` (unless the change was simple) and fix typing and linting errors. Use `# type: ignore` as needed.
  The precommit will give you type errors, which is useful — especially to know if you have incorrect code — but for many minor changes it's better to do this after testing.
  Use a subagent to apply complex fixes.
- Use a subagent to run tests and complex bash commands, especially those expected to return complex output.
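As a sketch, the centralized settings described above correspond to Ruff configuration along these lines (key names follow standard Ruff TOML; the authoritative values live in the root `pyproject.toml`):

```toml
[tool.ruff]
line-length = 120

[tool.ruff.lint]
select = ["D", "E", "F", "I", "NPY", "PD", "W"]

[tool.ruff.lint.pydocstyle]
convention = "numpy"

[tool.ruff.lint.per-file-ignores]
"**/tests/**" = ["D"]
"*.ipynb" = ["D"]
```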

#### Avoid Backwards Compatibility

In most cases it is incorrect to maintain backwards compatibility with a previous pipeline. This is a research codebase — changes are expected and encouraged. Keeping backwards compatibility risks MORE bugs, since someone can unknowingly run old code.

If you believe it is important to maintain backwards compatibility, explicitly ask the user if you should do so during the planning stage. If the user says no, then do not maintain backwards compatibility.

Delete and remove old code that is not used.

#### Use Context Managers for Resources

Always use context managers (`with` statements) when opening external resources like zarr stores, files, or database connections. Never assign them to a variable without a context manager — this leaks file handles and locks.

```python
# Good: the store is closed automatically when the block exits
with open_ome_zarr(path, mode="r") as plate:
    ...

# Bad: leaks the file handle and any store locks
plate = open_ome_zarr(path, mode="r")
```

#### Prefer Raising Errors

Prefer raising errors instead of silently catching them. Errors are good and warn us of issues. For example, prefer `value = my_dictionary['key']` over `value = my_dictionary.get('key')` since the former will raise a `KeyError` to signal that the underlying data is not behaving as expected.

Only catch errors when there is a good reason to do so: for example, catching HTTP errors in order to retry a request.

If you find yourself writing an if statement, fallback, or except statement designed to avoid errors, ask yourself if it would be better to raise the error as a signal to the user.
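As a minimal illustration (hypothetical dictionary, not repo code):

```python
config = {"learning_rate": 1e-4}

# Good: a typo'd key fails loudly, at the access site
lr = config["learning_rate"]

# Bad: a typo'd key silently yields None; the failure surfaces far away
lr = config.get("learning_rte")
```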

#### Use Real Integration Tests

Tests should directly *import* the actual code we are trying to test. For example, if you are trying to test `my_function` on some sample data, your test should directly import `my_function` and run it on the sample data. Avoid testing "key behavior" or components in isolation when an integration test would catch more bugs.

Ask yourself if your test is actually covering the true function.
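For example (function and file names are illustrative, not from this repo):

```python
# normalization.py — the code under test
import numpy as np


def normalize_intensity(image: np.ndarray) -> np.ndarray:
    """Scale an image to zero mean and unit variance."""
    return (image - image.mean()) / image.std()


# normalization_test.py — imports the real function and runs it on real data
def test_normalize_intensity() -> None:
    rng = np.random.default_rng(seed=0)
    image = rng.uniform(0, 65535, size=(32, 32))
    out = normalize_intensity(image)
    np.testing.assert_allclose(out.mean(), 0.0, atol=1e-6)
    np.testing.assert_allclose(out.std(), 1.0, atol=1e-6)
```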


## Development Environment

### Environment
Use the `uv` package manager. Run commands with `uv run <command>`. Edit `pyproject.toml` to modify dependencies, then run `uv sync` to update `uv.lock`.

For full setup instructions (installing uv, creating a venv, syncing dependencies), see [CONTRIBUTING.md](./CONTRIBUTING.md).

Quick start:
```sh
uv venv -p 3.13
uv sync --all-packages --all-extras
uv run pytest
```
If `uv` is not installed:
```sh
curl -LsSf https://astral.sh/uv/install.sh | sh
```

On HPC, symlink the uv cache out of your home directory first:
```sh
mkdir -p /hpc/mydata/firstname.lastname/.cache/uv && ln -s /hpc/mydata/firstname.lastname/.cache/uv ~/.cache/uv
```

#### Imports

- Import at the top of the file. No inline imports without strong reason.
- Use absolute imports (`from packages.my_directory.my_file`) instead of relative.
- Do not modify `sys.path` for imports.

### Coding Philosophy
### Coding Philosophy

#### 1. Think Before Coding

Don't assume. Don't hide confusion. Surface tradeoffs.

Before implementing:
- State your assumptions explicitly. If uncertain, ask.
- If multiple interpretations exist, present them — don't pick silently.
- If a simpler approach exists, say so. Push back when warranted.
- If something is unclear, stop. Name what's confusing. Ask.

#### 2. Simplicity First

Minimum code that solves the problem. Nothing speculative.

- No features beyond what was asked.
- No abstractions for single-use code.
- No "flexibility" or "configurability" that wasn't requested.
- No error handling for impossible scenarios.
- If you write 200 lines and it could be 50, rewrite it.
- Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.

#### 3. Surgical Changes

Touch only what you must. Clean up only your own mess.

When editing existing code:
- Don't "improve" adjacent code, comments, or formatting.
- Don't refactor things that aren't broken.
- Match existing style, even if you'd do it differently.
- If you notice unrelated dead code, mention it — don't delete it.

When your changes create orphans:
- Remove imports/variables/functions that YOUR changes made unused.
- Don't remove pre-existing dead code unless asked.

The test: every changed line should trace directly to the user's request.

#### 4. Goal-Driven Execution

Define success criteria. Loop until verified.

Transform tasks into verifiable goals:
- "Add validation" → "Write tests for invalid inputs, then make them pass"
- "Fix the bug" → "Write a test that reproduces it, then make it pass"
- "Refactor X" → "Ensure tests pass before and after"

For multi-step tasks, state a brief plan:

1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]

Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
`applications/airtable/pyproject.toml` — 3 changes: 1 addition & 2 deletions

```diff
@@ -8,14 +8,13 @@ description = "Interface to the Computational Imaging Airtable database"
 keywords = [ "airtable", "metadata", "microscopy", "zarr" ]
 license = "BSD-3-Clause"
 authors = [ { name = "Biohub", email = "compmicro@czbiohub.org" } ]
-requires-python = ">=3.11"
+requires-python = ">=3.12"
 classifiers = [
   "Development Status :: 3 - Alpha",
   "Intended Audience :: Science/Research",
   "License :: OSI Approved :: BSD License",
   "Operating System :: OS Independent",
   "Programming Language :: Python :: 3 :: Only",
-  "Programming Language :: Python :: 3.11",
   "Programming Language :: Python :: 3.12",
   "Programming Language :: Python :: 3.13",
   "Programming Language :: Python :: 3.14",
```
Training config (filename not shown in this excerpt):

```diff
@@ -5,7 +5,8 @@
 # Batch related launches with:
 # export VISCY_WANDB_LAUNCH=20260401-augfix-r1
 base:
-  - ../recipes/trainer/fit_1gpu.yml
+  - ../recipes/trainer/fit.yml
+  - ../recipes/topology/single_gpu.yml
   - ../recipes/data/hcs_sec61b_3d.yml
   - ../recipes/models/fnet3d_z8.yml
@@ -20,9 +21,12 @@ model:
     schedule: WarmupCosine

 trainer:
+  precision: bf16-mixed
   max_epochs: 100
   logger:
     init_args:
+      # Override cytoland's default project: this bridge trains on a dynacell dataset (iPSC SEC61B).
+      project: dynacell
       name: FNet3D_iPSC_SEC61B
       save_dir: /hpc/projects/comp.micro/virtual_staining/models/dynacell_cytoland/ipsc/sec61b/fnet3d
   callbacks:
```