feat(simulation): physically-driven synthetic-data + teaching suite (minian.simulation) by daharoni · Pull Request #328 · miniscope/minian

daharoni · 2026-06-04T19:33:55Z

Draft / WIP. Adds minian.simulation, a physically-driven synthetic calcium-imaging data generator plus a teaching scaffold. It produces ground-truth recordings used to test the pipeline (motion correction and CNMF recovery) and serves as a hands-on teaching entry point.

What this adds

Typed Spec surface and a Layer-2 physics/scene runtime (spec.py, scene.py, steps/).
A composable step chain: tissue, cell, optics (incl. NA^2 light-collection efficiency), sensor, brain motion.
Recording/GroundTruth types and a simulate() orchestrator with spec-hash to zarr local caching.
A shared structural-metrics oracle (metrics.py) and a Cartesian spec sweep generator (sweep.py).
Test suite: unit tests per module plus whole-pipeline structural self-consistency and per-stage motion-correction / CNMF footprint-recovery demos.

Status

Steps 2 to 13 implemented and committed (linear, one commit per step).
Step 14 (the two training notebooks under minian/notebooks/training/) is in progress and intentionally left WIP.

Base and dependencies

Targets v2-integration (this is v2 work, not part of the v1.3 release on master).
PR #303 (Replace pymetis with k-d tree spatial partitioner) is already in master, so Notebook 2's full CNMF chain runs on ordinary CI.
Step 14 depends on the notebook + demo-data packaging work (issue #306, Notebook + demo-data packaging; PR #317, feat(packaging): ship notebooks in-package and fetch demo data from Zenodo) landing on v2-integration first; the notebooks slot into that structure and inherit its discovery-test CI, so no bespoke notebook CI is added here.

🤖 Generated with Claude Code

Introduce the minian.simulation package with its pydantic v2 spec contract: the physical interface (Acquisition holding Optics / ImageSensor / Tissue with unit conversions), the StepSpec base, 11 step specs, the static AnyStep discriminated union, and the top-level Spec with cache_key + cross-field validators (hard fails + advisory SpecWarnings). Add pydantic>=2 and numpydantic>=1.8 as core runtime deps (not an extra) so the training notebooks run offline after a plain pip install; regenerate pdm.lock. Specs are grouped by the physical thing they describe (lens props on Optics, detector geometry + noise on ImageSensor, scattering on Tissue); the sensor step keeps only the exposure scale. Physics math, Scene, executable step bodies, and numpydantic output types are deferred to later migration steps. 20 unit tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…Step 3) Derive the phenomenological quantities the simulator steps consume from the Layer-1 physical knobs, as small closed-form helpers on the model that owns each physical thing:

- Optics.diffraction_sigma_um (sigma ~= 0.21*lambda/NA)
- Optics.defocus_sigma_um (NA*|z-focal|, intensity-conserving)
- Tissue.attenuation (Beer-Lambert exp(-z/mfp)) and scatter_sigma_um
- Acquisition.cell_optics -> (sigma_px, brightness), quadrature-combining the above; the sigma_0^2/sigma_tot^2 peak drop makes defocus volume-conserving while only attenuation removes light
- ImageSensor.photons_to_counts: Poisson shot + Gaussian read -> x gain -> floor -> clip, the only place fluorescence becomes integer counts

Each helper has a teaching docstring (approximation, units, typical range) and a direct unit test, including the defocus integrated-intensity-conservation invariant. Scene, build() bodies, and the in_focus/detectable flags stay deferred to Steps 4-5.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
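The closed forms named in this commit can be sketched as plain functions. This is an illustrative sketch only, not the PR's implementation: the free-function layout, the argument names, and the choice of the in-focus sigma as `sigma_0` are assumptions; only the formulas themselves come from the commit message.

```python
import math


def diffraction_sigma_um(wavelength_um: float, na: float) -> float:
    """Gaussian-equivalent PSF width: sigma ~= 0.21 * lambda / NA."""
    return 0.21 * wavelength_um / na


def defocus_sigma_um(na: float, z_um: float, focal_um: float) -> float:
    """Defocus blur grows linearly with distance from the focal plane."""
    return na * abs(z_um - focal_um)


def attenuation(z_um: float, mfp_um: float) -> float:
    """Beer-Lambert transmission exp(-z / mfp)."""
    return math.exp(-z_um / mfp_um)


def combine_sigmas(*sigmas_um: float) -> float:
    """Independent Gaussian blurs add in quadrature."""
    return math.sqrt(sum(s * s for s in sigmas_um))


def peak_drop(sigma_0: float, sigma_tot: float) -> float:
    """Peak of a 2-D Gaussian of fixed integral scales as sigma_0^2 / sigma_tot^2.

    Assumption: sigma_0 is the in-focus width and sigma_tot the combined width.
    """
    return (sigma_0 / sigma_tot) ** 2
```

Because the peak drop only redistributes light, the integrated intensity changes through `attenuation` alone, which is the invariant the commit says its unit test checks.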

…Step 4) Add the mutable counterpart to the immutable Spec: the working state the executable steps (Step 5) read and write, and the substrate single-step unit tests construct directly. Three plain dataclasses (not pydantic -- they hold xarray/Generator and are mutated in place): - Scene: acq, rng, movie, cells, truth, snapshots. Scene.zeros/Scene.ones build a (frame, height, width) xr.DataArray with the minian dim names + arange coords, in float64 working precision; the downcast to Output.store_dtype is a finalize() concern (Step 6). rng=None -> default_rng(). - Cell: per-cell record whose fields mirror the GroundTruth structural columns (spec section 8) one-for-one, all None until the producing step fills them. - GroundTruthBuilder: the per-effect ground-truth side channel (shifts/vignette/leakage/bleaching/neuropil_*), all None = effect absent. Exported from minian.simulation. simulate()/finalize(), build() bodies, and any cell/movie/truth population stay deferred to Steps 5-6. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…on (Step 5a) Implement the four executable steps that turn a blank Scene into a digitized recording: place_somata -> cell_activity -> render -> sensor. This is the first vertical slice of the step engine; optics, fields, and motion follow in 5b-5d. - minian/simulation/steps/: new domain-organized package - base.py: the Step contract (captures the build-time RNG; name/domain) - cell.py: PlaceSomataStep (density->count, Poisson-disk placement, peak-normalized irregular-disk footprints, uniform/lognormal SNR) and CellActivityStep (2-state Markov -> Poisson spikes -> double-exp kernel); pure soma_footprint/calcium_kernel helpers - tissue.py: RenderStep (stage 'cells_only'); additive footprint x trace composite, prefers footprint_observed so optics slots in at 5b - sensor.py: SensorStep; intensity -> photons -> ImageSensor.photons_to_counts - spec.py: wire build() on the four step specs (lazy import, no import cycle); the other seven keep the base NotImplementedError until their milestone - tests: test_steps.py (per-step + 4-step chain); retarget the Step 2 build()-unimplemented test at CellOptics (a 5b placeholder) Scope guards: no optics (planted footprint only; observed/in_focus/detectable stay None), no simulate()/Recording (Step 6), n_neurite_stubs>0 raises. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
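A minimal sketch of the activity chain described here (2-state Markov -> Poisson spikes -> double-exp kernel), using only the standard library. All parameter names and default values are hypothetical, and the peak normalization of the kernel is an assumption; the PR's `CellActivityStep` and `calcium_kernel` may differ.

```python
import math
import random


def calcium_kernel(n: int, tau_rise: float, tau_decay: float) -> list[float]:
    """Double exponential exp(-t/tau_decay) - exp(-t/tau_rise), peak-normalized (t in frames)."""
    raw = [math.exp(-t / tau_decay) - math.exp(-t / tau_rise) for t in range(n)]
    peak = max(raw)
    return [v / peak for v in raw]


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication sampler; adequate for the small rates used here."""
    if lam <= 0:
        return 0
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_trace(n_frames, rng, p_on=0.02, p_off=0.2, rate_active=0.8, rate_rest=0.01):
    """Return (spikes, trace) for one cell."""
    active, spikes = False, []
    for _ in range(n_frames):
        u = rng.random()
        # Markov switch: stay active with prob 1 - p_off, become active with prob p_on.
        active = (u >= p_off) if active else (u < p_on)
        spikes.append(poisson(rng, rate_active if active else rate_rest))
    kernel = calcium_kernel(60, tau_rise=2.0, tau_decay=15.0)
    # Causal convolution of the spike train with the kernel.
    trace = [
        sum(spikes[t - j] * kernel[j] for j in range(min(t + 1, len(kernel))))
        for t in range(n_frames)
    ]
    return spikes, trace
```

Passing the same seeded `random.Random` reproduces the same spikes and trace, which mirrors the seed-determinism the PR relies on.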

….simulation (Step 5b) Implement CellOptics.build() -> CellOpticsStep: per-cell diffraction + defocus + scatter degradation, the planted/observed footprint split, and the geometric in-focus flag. render already prefers footprint_observed, so the chain now composites the degraded footprint automatically. - steps/cell.py: CellOpticsStep + pure helpers - resolve_focal_plane(): Optics.focal_plane_um 'auto' -> median cell depth - degrade_footprint(): observed = attenuation(z) * (planted (x) Gaussian(sigma_total)). Convolution is sum-conserving (defocus spreads light, peak drops, integral constant); only scatter attenuation removes light, so the observed integral is focal-plane-independent. The cell_optics 'brightness' peak factor is NOT applied to the footprint (would double-count defocus) -- it is stored as the per-cell optical_brightness scalar for detectability. - scene.py: Cell gains optical_brightness; detectable left for finalize (Step 6) - spec.py: CellOptics.build() wired; cell_optics docstring clarifies the two uses of sigma_px (footprint blur) vs brightness (peak/detectability scalar) - tests: optics blur/attenuation, defocus integral conservation, focal-plane resolution, in_focus/DOF, render-uses-observed, optics+sensor chain Per review: detectable is a whole-pipeline flag (optical brightness x illumination falloff vs the sensor noise floor), assembled at finalize(), not in the optics step -- captures that excitation/vignette illumination also drives per-cell brightness, not just depth attenuation (resolves spec open-Q 4). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Quality cleanups from a /simplify pass over the step engine; no functional change beyond moving one guard to fail-fast at construction. - soma_footprint: drop the dead `/ footprint.max()` normalization (a 0/1 mask is already peak-normalized) and the doubled max; scale the irregularity noise with max(noise.max(), -noise.min()) instead of allocating a full-frame abs - remove hand-rolled _dist3; inline math.dist() at the call site (the idiom the tests already use) - StepSpec.build return type object -> Step, uniform with the four overrides - resolve_focal_plane: add the missing `optics: Optics` type hint - n_neurite_stubs > 0: reject at PlaceSomata construction via a field_validator (consistent with the other validators) instead of a NotImplementedError buried in PlaceSomataStep.__call__; test asserts the construction-time ValidationError Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Implement the four field effects that layer on the rendered cells, organized by domain to match the existing step layout:

* tissue frame (brain-bound): neuropil (additive diffuse background, smooth spatial field x slow mean-1 OU envelope) and bleaching (global mono-exp fluorophore decay).
* sensor frame (static): vignette (radial illumination falloff) and leakage (additive baseline glow).

Each records its (height, width) field to ground truth; the vignette field is load-bearing for the Step 6 detectable flag. Also gives the deferred vasculature placeholder an honest no-op build() so a spec listing it runs end-to-end. bi_exp bleaching is rejected at construction (a single final_fraction cannot determine a two-component curve), matching the n_neurite_stubs fail-fast precedent. Wires build() on the five specs, exports the new steps and physics helpers, and adds single-step + chain tests (ou_process stationarity, GT shapes, static-field invariance, bi_exp ValidationError, vasculature no-op). Proposal spec doc updated to match.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
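To illustrate why a single final_fraction pins down a mono-exponential curve but not a two-component one, here is a hedged sketch. The function name and the convention that the curve starts at 1.0 and reaches final_fraction on the last frame are assumptions, not taken from the PR.

```python
import math


def mono_exp_bleaching(n_frames: int, final_fraction: float) -> list[float]:
    """Decay from 1.0 at frame 0 to final_fraction at the last frame.

    One number fixes the single rate constant: rate = -ln(final_fraction) / (n - 1).
    A bi-exponential has two rates plus a mixing weight, so one number cannot fix it.
    """
    if not 0.0 < final_fraction <= 1.0:
        raise ValueError("final_fraction must be in (0, 1]")
    if n_frames < 2:
        return [1.0] * n_frames
    rate = -math.log(final_fraction) / (n_frames - 1)
    return [math.exp(-rate * t) for t in range(n_frames)]
```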

…sensor (Step 5d-1) Four steps re-read acq.image_sensor.n_px_* to size their pixel grid; switch them to derive it from scene.movie — the array they actually mutate: * place_somata fills the scene canvas (cell count from canvas area), with positions documented as canvas/tissue coordinates (FOV-crop offset deferred to finalize, Step 6). * neuropil takes (n_frames, h, w) straight off the movie. * vignette / leakage take (h, w) off the movie — post-crop this is the sensor FOV, exactly where the static fields belong. Pure no-behavior-change refactor: at margin 0 the scene canvas equals the sensor FOV, so output is byte-identical (the full pre-existing suite passes unchanged). This is the groundwork for the motion margin (5d-2): a step now sizes itself to the scene it is handed, so an oversized tissue canvas becomes a data change with no special-casing. Adds a guard test that runs place_somata and neuropil on a canvas larger than the sensor and confirms they honor it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add the motion-domain step that completes the forward pipeline and the tissue->sensor boundary. brain_motion rigidly translates the brain-frame movie per frame (explicit trajectory_um or a bounded random walk) and crops the sensor FOV from its center. To keep motion honest, the upstream tissue steps render on a canvas larger than the sensor (Scene.zeros(acq, margin_px=...)): real, simulated tissue sits just off-FOV, so a shift brings genuine content into view rather than a fabricated edge fill. The margin is >= the maximum shift, so the FOV crop never reaches the canvas edge and no fill ever enters the result. Shifts use bilinear (order=1) sub-pixel interpolation; ground-truth shifts (frame, 2) record the applied (dy, dx) displacement in pixels (minian's shift_dim order). The step fails fast on a length-mismatched trajectory or a margin smaller than the shift. * steps/motion.py: BrainMotionStep + bounded_random_walk + shift_and_crop. * scene.py: Scene.zeros/ones gain margin_px to allocate the padded canvas. * spec.py: wire BrainMotion.build() (the last step) + document canvas/crop/units. * tests: walk bounds, shift-and-crop recentering, explicit-trajectory displacement, off-FOV tissue moving into view, GT shift shape/units, the static-field-invariant-under-motion reference-frame test, fail-fast paths, and a full 10-step pipeline cropping back to the sensor FOV. test_spec's old "not implemented" case becomes test_every_step_kind_builds. Deferred (flagged): OU/jump and focus-drift motion (placeholders); automatic margin sizing + footprint cropping to FOV coordinates (finalize, Step 6); the motion-correction RMSE recovery test (Step 10). Proposal spec doc updated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
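The margin guarantee can be sketched as follows. This is an assumption-laden illustration: the clipping rule used to bound the walk, the Gaussian step, and both function signatures are hypothetical, and the bilinear shift and center crop are omitted.

```python
import random


def bounded_random_walk(n_frames, step_px, max_shift_px, rng):
    """Per-frame (dy, dx) in pixels; each axis takes a Gaussian step and is clipped."""
    dy = dx = 0.0
    shifts = []
    for _ in range(n_frames):
        shifts.append((dy, dx))
        dy = min(max_shift_px, max(-max_shift_px, dy + rng.gauss(0.0, step_px)))
        dx = min(max_shift_px, max(-max_shift_px, dx + rng.gauss(0.0, step_px)))
    return shifts


def check_margin(shifts, margin_px):
    """Fail fast when the canvas margin cannot absorb the largest shift."""
    worst = max(max(abs(dy), abs(dx)) for dy, dx in shifts)
    if margin_px < worst:
        raise ValueError(f"margin {margin_px} px is smaller than max shift {worst:.2f} px")
    return worst
```

With the margin at least as large as the worst-case shift, a FOV crop taken from the canvas center only ever reads simulated tissue, which is the property the commit describes.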

Add the frozen, numpydantic-typed simulator output and the transform that produces it from an exhausted Scene. * recording.py: GroundTruth (A_planted/A_observed/C/S, per-cell centers/snr/ in_focus/detectable, per-effect shifts/vignette/leakage/bleaching/neuropil_* None when absent; n_units/depth_um properties; detectable_subset()) and Recording (spec/observed/ground_truth/snapshots + stage()). save/load are honest NotImplementedError stubs deferred to caching (Step 7). * finalize(scene, spec): a free function (avoids a scene<->recording import cycle). Crops each cell's canvas-sized footprint and canvas-frame position to the sensor FOV at the reference (zero-shift) frame, drops cells left entirely in the motion margin (background, not recoverable units), assembles the per-cell detectable flag, reads the per-effect fields off scene.truth, and downcasts the working movie to Output.store_dtype for observed. Detectability (the locked-in rule): detectable = in_focus AND signal_e / sqrt(baseline_e + read_noise_e^2) >= DETECT_SNR_THRESHOLD, where signal_e = peak_dF * optical_brightness * vignette-at-cell * photons_per_unit * QE. Sensor-derived floor (no magic constant); the threshold is a named module const (3.0) flagged for Step 10 calibration. No sensor step -> falls back to the geometric in_focus; no trace -> not detectable. This is the first consumer of the vignette ground truth: edge cells dimmed below the floor read as undetectable even when in focus. Tests (+8): array shapes/dtypes, planted!=observed under optics, per-effect present/absent, margin-cell drop with FOV-coordinate check, the 3-way detectability rule (bright/dim/out-of-focus), detectable_subset, stage() KeyError. Deferred (flagged): save/load -> Step 7; metrics -> 8; sweep -> 9; threshold calibration -> 10; presets defined inline in tests for now. simulate() is 6b. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
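The detectability rule stated above can be written out directly. The function signature and the zero-noise guard are assumptions added for the sketch; the formula and the 3.0 threshold are as given in the commit message.

```python
import math

DETECT_SNR_THRESHOLD = 3.0  # named constant from the commit, flagged there for calibration


def is_detectable(
    in_focus: bool,
    peak_df: float,
    optical_brightness: float,
    vignette_at_cell: float,
    photons_per_unit: float,
    qe: float,
    baseline_e: float,
    read_noise_e: float,
) -> bool:
    """detectable = in_focus AND signal_e / sqrt(baseline_e + read_noise_e^2) >= threshold."""
    if not in_focus:
        return False
    signal_e = peak_df * optical_brightness * vignette_at_cell * photons_per_unit * qe
    noise_e = math.sqrt(baseline_e + read_noise_e**2)
    if noise_e == 0.0:  # guard added for this sketch only
        return signal_e > 0.0
    return signal_e / noise_e >= DETECT_SNR_THRESHOLD
```

For example, with 50 signal electrons over a noise floor of sqrt(100 + 2^2) the cell passes, while the same cell under a vignette factor of 0.3 falls below the threshold, matching the "edge cells dimmed below the floor" behavior.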

Add the package's headline entry point: compose a Spec into a Recording. * simulate.py: simulate(spec, *, until=None) seeds the RNG from spec.seed, sizes the motion margin from the brain_motion spec (so motion recordings need no hand-sized canvas), runs spec.steps in order against a shared Scene, snapshots the movie after each movie-affecting (non-cell-domain) step when save_intermediates is set, honors until=<stage name> (fail-fast on an unknown stage), and finalizes. Exported as minian.simulation.simulate. * recording.py: finalize() now always emits observed at the sensor FOV, cropping a still-canvas-sized movie left by a partial build (until= before brain_motion) for consistency with the cropped footprints. Tests (+7): minimal spec end-to-end (integer-count observed at the sensor FOV), seed determinism, automatic motion-margin sizing, until= early stop + its snapshot set, the full movie-stage snapshot keys with stage("sensor") == observed and cell-domain steps excluded, no-snapshots default, and unknown-until ValueError. Doc-stays-true: spec doc §7 stage table updated to the real step.name keys (neuropil/brain_motion/sensor, not with_*/observed) and the movie-affecting-steps snapshot rule. Step 6 complete: the pipeline now runs spec -> Recording end to end. save/load -> Step 7, metrics -> 8, sweep -> 9. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Implement Recording.save/load (the prior NotImplementedError stubs) and add a thin spec-hash cache wrapper, so a realistic spec simulates once and is reused across the Step 9-10 parameter sweeps. simulate() itself stays pure — caching composes on top. recording.py: * save(path) writes a self-contained zarr directory: observed, a ground_truth/ subgroup (optional fields only when present, tracked in a gt_present attr), an optional snapshots/ subgroup, and a human-readable spec.json. The write is atomic (build {path}.tmp, then os.replace). * load(path) re-validates spec.json and checks its cache_key against the stamped spec_cache_key attr (ValueError on a stale or hand-edited spec). Snapshot coords are the trivial arange grid, rebuilt via _movie_dataarray. cache.py (new): simulate_cached(spec, *, root=None) — hit loads, miss simulates and saves; cache_dir() honors $MINIAN_SIM_CACHE (default ~/.cache/minian/sim) and cache_path() keys by {cache_key}.zarr. Exported from the package. test_caching.py (new, 12 tests): observed + dtype round-trip, every GroundTruth field (bool stays bool), optional-field presence vs None, heterogeneous snapshots (tissue canvas vs cropped FOV), empty (0,h,w) ground truth, atomic overwrite, spec-hash mismatch rejection, and simulate_cached miss-writes / hit-reads (call-counted) + env-var resolution. No new deps (zarr/xarray already core). Full simulation suite: 122 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
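A stdlib-only sketch of the spec-hash cache pattern described here. It stores a JSON file where the PR writes a zarr directory, and every function name below except the `MINIAN_SIM_CACHE` variable, the default directory, and the `spec_cache_key` attribute name is hypothetical; the hashing scheme in particular is an assumption.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def cache_key(spec: dict) -> str:
    """Hash a canonical JSON dump so key order never changes the key."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


def cache_dir() -> Path:
    """Honor $MINIAN_SIM_CACHE, else fall back to ~/.cache/minian/sim."""
    return Path(os.environ.get("MINIAN_SIM_CACHE", Path.home() / ".cache" / "minian" / "sim"))


def save_atomic(path: Path, payload: dict) -> None:
    """Write to {path}.tmp, then os.replace so readers never see a partial file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, path)


def run_cached(spec: dict, run, root=None):
    """Cache hit loads, cache miss runs and saves."""
    root = Path(root) if root is not None else cache_dir()
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{cache_key(spec)}.json"
    if path.exists():
        stored = json.loads(path.read_text())
        if stored["spec_cache_key"] != cache_key(stored["spec"]):
            raise ValueError("stale or hand-edited spec")
        return stored["result"]
    result = run(spec)
    save_atomic(path, {"spec": spec, "spec_cache_key": cache_key(spec), "result": result})
    return result


# Usage: the second call is served from disk, so the stand-in simulator runs once.
calls = []
with tempfile.TemporaryDirectory() as tmp_root:
    fake_run = lambda s: calls.append(1) or {"n_units": s["seed"] * 2}
    first = run_cached({"seed": 3}, fake_run, root=tmp_root)
    second = run_cached({"seed": 3}, fake_run, root=tmp_root)
```

Keeping the simulator itself free of caching logic, as the commit does, means the wrapper can be swapped or bypassed without touching `simulate()`.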

A standalone, dependency-light module scoring CNMF output against a GroundTruth, shared by the unit tests, the Step 9-10 parameter-matrix tests, and both training notebooks. Threshold-agnostic: callers supply the bounds and the fair recall denominator (GroundTruth.detectable_subset). Inputs go through np.asarray so CNMF's xr.DataArray output and plain ndarrays both work; unit-first dims (unit_id, height, width) / (unit_id, frame) already match GroundTruth. The five spec-§9 functions: * hungarian_match(A_est, A_true, *, metric="iou", energy_frac=0.9) -> Match Energy-quantile (CaImAn-style) binarization — each footprint's mask is the smallest set of brightest pixels holding energy_frac of its energy — then a pairwise IoU matmul and an optimal assignment via scipy linear_sum_assignment (maximize), with zero-overlap pairs dropped. Match exposes recall/precision (iou_threshold), mean_iou, pairing, matched_pairs, n_est/n_true. * trace_pearson(C_est, C_true, pairing) -> per-pair Pearson (nan if constant). * spike_precision_recall(..., tol_frames=2, spike_thresh=0.0) -> SpikeScore, pooled greedy nearest-within-tolerance spike matching. * shift_rmse — pure RMSE over (frame, 2); docstring flags the sign convention (a correction estimate is the negation of GroundTruth.shifts). * field_pearson — flattened, scale/offset-invariant field-shape correlation. test_metrics.py (new, 16 tests): identical / disjoint / shuffled-order / unequal- count matching, thresholdable partial-overlap IoU, metric + energy_frac guards, xarray parity, trace Pearson (identical / anti / constant), spike exact / within- tol / beyond-tol / extra-estimate, shift RMSE, and field Pearson invariance. No new deps (scipy already core). Full simulation suite: 138 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
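The energy-quantile binarization and IoU scoring can be sketched without NumPy. Two assumptions: footprints are flattened, non-negative pixel lists, and "energy" means the sum of squared pixel values (the CaImAn convention). The optimal assignment itself, which the commit delegates to scipy's linear_sum_assignment, is left out.

```python
def energy_mask(footprint: list[float], energy_frac: float = 0.9) -> set[int]:
    """Indices of the smallest set of brightest pixels holding energy_frac of the energy."""
    total = sum(v * v for v in footprint)
    if total == 0:
        return set()
    order = sorted(range(len(footprint)), key=lambda i: footprint[i], reverse=True)
    mask, acc = set(), 0.0
    for i in order:
        mask.add(i)
        acc += footprint[i] ** 2
        if acc >= energy_frac * total:
            break
    return mask


def iou(a: set[int], b: set[int]) -> float:
    """Intersection over union of two pixel-index masks; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def iou_matrix(est: list[list[float]], true: list[list[float]], energy_frac: float = 0.9):
    """Pairwise IoU table (rows: estimates, columns: ground truth)."""
    est_masks = [energy_mask(f, energy_frac) for f in est]
    true_masks = [energy_mask(f, energy_frac) for f in true]
    return [[iou(e, t) for t in true_masks] for e in est_masks]
```

Binarizing by energy fraction rather than a fixed intensity threshold keeps the mask size comparable between a sharp planted footprint and a dim, blurred estimate.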

A thin, standalone parameter-sweep generator over dotted-path overrides — the single primitive shared by the Step 10 @parametrize correctness grids and the benchmark harness that collects metrics into a tidy DataFrame. simulate() never depends on it. sweep(base, axes) yields one validated SweptSpec per point in the Cartesian product of the axes. Three path forms: nested model (acquisition.optics.na), step-by-kind (steps.place_somata.density_per_mm2, indexing steps by the unique kind §11 guarantees), and top-level (seed). Each combination is re-validated via model_validate — rebuilding the discriminated steps union and re-running every §11 cross-field validator — so an out-of-range axis (na=-1, soma > FOV) fails fast at yield. An empty axes dict yields the base once with axes={}. _set_path applies overrides immutably via chained model_copy and validates each segment against model_fields *before* copying: pydantic's model_copy(update=...) skips validation and silently accepts unknown keys, so an unchecked typo would no-op rather than raise. Clear ValueErrors for unknown field / unknown step kind / scalar-descent / malformed steps. path. SweptSpec(Spec) carries axes: dict = Field(exclude=True) — a genuine Spec subclass (drops into simulate()/Recording unchanged), with axes kept out of model_dump_json so cache_key() stays identical to the equivalent plain spec. Sweeping never perturbs cache dedup, and the tag vanishes when a recording is persisted. test_sweep.py (new, 14 tests): product count + axis bookkeeping, all three path forms, sibling-step isolation, tuple-valued axis, empty axes, base-not-mutated, cache_key parity (axes excluded), invalid-value validation failure, the four error paths, and an end-to-end simulate() of a yielded spec. No new deps (itertools stdlib). Full simulation suite: 152 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
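The Cartesian sweep over dotted-path overrides can be illustrated on plain nested dicts. This sketch swaps pydantic models for dicts, so it covers only the nested and top-level path forms and does not re-validate the result; `set_path` and `sweep` here are stand-ins, not the PR's functions.

```python
import copy
import itertools


def set_path(spec: dict, path: str, value):
    """Return a copy of spec with the dotted path overridden; a typo raises instead of no-op."""
    out = copy.deepcopy(spec)
    node = out
    *parents, leaf = path.split(".")
    for key in parents:
        if not isinstance(node, dict) or key not in node:
            raise ValueError(f"unknown field {key!r} in path {path!r}")
        node = node[key]
    if not isinstance(node, dict) or leaf not in node:
        raise ValueError(f"unknown field {leaf!r} in path {path!r}")
    node[leaf] = value
    return out


def sweep(base: dict, axes: dict):
    """Yield (axes_point, spec) for every point in the Cartesian product of the axes."""
    names = list(axes)
    for combo in itertools.product(*(axes[name] for name in names)):
        point = dict(zip(names, combo))
        spec = base
        for path, value in point.items():
            spec = set_path(spec, path, value)
        yield point, spec
```

Checking each path segment before writing is the dict analogue of the commit's point about `model_copy(update=...)`: an unchecked override would silently do nothing useful, so the sweep would report parameter values that were never applied.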

…tep 10a) The first milestone of the additive demonstration suite (no test replacement this PR — that is a later campaign). Where test_steps.py exercises each step's build() in isolation, these run the composed simulate() pipeline end to end and assert emergent physics a broken chain would violate — the foundation the per-stage recovery demos (10b motion, 10c CNMF) stand on: * test_detectability_falls_with_depth — sweep the cell band down past a fixed focal plane; the geometric in-focus fraction falls 1.0 -> ~0 (monotone) and detectability tracks it down. Anchored on the in-focus fraction (rock-solid geometry); the detectable fraction is shot-noise-limited shallow, which is itself honest physics. * test_strong_vignette_concentrates_detection_centrally — a steep illumination falloff dims rim cells below the floor, so detection concentrates centrally. * test_bleaching_dims_later_frames — late-frame mean < early-frame mean. * test_static_fields_are_invariant_to_motion — same-seed recordings differing only in motion magnitude share identical sensor-frame vignette/leakage GT while their shift trajectories differ (the tissue/sensor frame separation). Small, fixed-seed specs with decisive (not finely-calibrated) margins — a capability demonstration, not the calibrated replacement suite. No new deps, no markers, touches nothing existing. Full simulation suite: 156 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The first test that feeds a simulated recording to a real minian pipeline stage and checks it recovers exact known ground truth — the per-stage diagnosability the end-to-end golden-number test cannot offer. Here the stage is motion_correction.estimate_motion, validated against the injected GroundTruth.shifts. test_estimate_motion_recovers_true_shifts: a 30 s, 128 px, ~107-cell recording with a 4 µm motion walk -> estimate_motion -> align to GT (negate + re-reference to frame 0, matching minian's correction-sign convention) -> shift_rmse < 0.5 px (observed ~0.26), plus a per-axis trajectory-correlation sanity check. Sub-pixel recovery, deterministic, ~46 s. slow-marked and off the per-PR path: motion estimation needs a realistic, minutes-scale recording with enough texture/frames to converge — a 1 s clip would not exercise it. pyproject registers the `slow` marker and adds -m "not slow" to addopts so the default run stays fast (run slow demos with `pytest -m slow`); only the new tests are affected. Deliberately generous bound (~2x observed), a capability demonstration rather than the calibrated replacement suite (a later PR). Default suite: 156 passed / 1 deselected. Slow test: 1 passed in ~46 s. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
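The alignment step (negate + re-reference to frame 0) followed by the RMSE can be sketched as below. Shifts are lists of (dy, dx) tuples; re-referencing the ground truth as well as the estimate is an assumption of this sketch, and both function bodies are illustrative rather than the PR's code.

```python
import math


def rereference(shifts):
    """Express every shift relative to frame 0."""
    dy0, dx0 = shifts[0]
    return [(dy - dy0, dx - dx0) for dy, dx in shifts]


def align_estimate(correction):
    """A correction estimate is the negation of the applied displacement."""
    return [(-dy, -dx) for dy, dx in rereference(correction)]


def shift_rmse(a, b):
    """Root-mean-square error over all (frame, 2) entries."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of frames")
    sq = [(ay - by) ** 2 + (ax - bx) ** 2 for (ay, ax), (by, bx) in zip(a, b)]
    return math.sqrt(sum(sq) / (2 * len(sq)))


# Usage: a correction that exactly undoes the motion, up to a constant offset.
gt = [(0.0, 0.0), (1.0, -2.0), (3.0, 1.0)]
correction = [(5.0, 5.0), (4.0, 7.0), (2.0, 4.0)]
rmse = shift_rmse(align_estimate(correction), rereference(gt))
```

Skipping the negation would compare a trajectory against its mirror image and roughly double the error, which is why the metric's docstring flags the sign convention.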

The headline per-stage test: feed a simulated recording to minian's real CNMF chain and check it recovers the known detectable footprints — the direct, diagnosable comparison against exact ground truth the end-to-end golden-number test cannot offer. Completes Step 10's additive demonstration suite (10a structural self-consistency, 10b motion recovery, 10c CNMF recovery). test_cnmf_recovers_detectable_footprints: a 60 s, 128 px, ~50-cell still recording -> the full preprocessing -> initialization -> spatial/temporal-update chain (a faithful, minimal transcription of the pipeline notebook) -> hungarian_match(A_est, detectable.A_observed) -> assert recall >= 0.7, mean_iou >= 0.5, precision >= 0.4. Observed ~0.86 / 0.72 / 0.75; bounds are generous and robust to the BLAS/numerical-library drift the golden CNMF sums are sensitive to — a capability demonstration, not the calibrated replacement suite (a later PR). Recovery is matched against detectable_subset() (in-focus, above-noise-floor cells), so sweeping physics out of range is not scored as a minian failure. _run_cnmf notes: preprocessing runs on uint8 (OpenCV median denoise), the seeds/CNMF stage on float (numba); the save_minian calls between update steps are load-bearing (the zarr round-trips concretize dask chunk shapes — skipping them surfaces "shape is None"). Runs under a module-scoped LocalCluster + tmp_path intermediate dir. slow-marked and guarded by importorskip("pymetis"): the CNMF chain needs pymetis (no Windows wheel today), so this skips on a stock Windows box and activates automatically wherever pymetis is installed (CI Linux, conda) and once the in-flight pymetis-replacement lands — no Windows special-casing to remove. The slow marker was registered in Step 10b, so no pyproject change here. Verified: passes in ~90 s in a pymetis env; skips cleanly on Windows; default suite 156 passed / 1 skipped / 1 deselected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A miniscope objective collects light over a cone whose solid angle scales as NA^2, so a low-NA scope is fundamentally dimmer, not just blurrier -- an effect the optics model omitted (brightness depended only on the defocus peak drop and scatter attenuation, never on NA).

- Add Optics.collection_efficiency (proportional to NA^2) and fold it into Acquisition.cell_optics brightness and the observed-footprint gain in CellOpticsStep (degrade_footprint's factor renamed attenuation -> gain: now scatter attenuation x collection efficiency). It is a flat, focal-plane-independent light-loss, so the intensity-conservation invariant still holds, and it propagates to detectability (lower NA -> fewer detectable cells).
- The absolute proportionality constant is absorbed into the sensor step's photons_per_unit exposure scale; the structural and recovery tests that pin a detectability regime scale photons_per_unit by 1/NA^2 so the recordings (and thus their thresholds) are unchanged.
- Add test_cell_optics_brightness_scales_with_na_squared; update the surface-brightness and defocus-conservation expectations accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

First training notebook (minian/notebooks/training/01_anatomy), following #306's self-contained-bundle layout: build a 1-photon miniscope recording forward from its physics -- the inverse of the analysis pipeline -- as an interactive construction (understand -> explore with sliders -> commit, one stage at a time). WIP: the scope (optics + sensor), placing somata, and calcium activity stages are in; optics degradation, render (first movie), background fields, motion, and the sensor follow. Generates its own data via minian.simulation, so it runs offline. Interactive viz uses ipywidgets + matplotlib (mediapy for the movie stages). No notebook CI on this branch yet -- it will run under #306's notebook-discovery test once that lands and this branch rebases onto it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov · 2026-06-04T20:17:04Z

Codecov Report

❌ Patch coverage is 97.16060% with 32 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
minian/simulation/steps/cell.py	96.07%	8 Missing ⚠️
minian/simulation/spec.py	98.01%	6 Missing ⚠️
minian/simulation/recording.py	97.12%	5 Missing ⚠️
minian/simulation/steps/tissue.py	93.58%	5 Missing ⚠️
minian/simulation/metrics.py	97.08%	3 Missing ⚠️
minian/simulation/steps/motion.py	96.42%	2 Missing ⚠️
minian/simulation/simulate.py	96.87%	1 Missing ⚠️
minian/simulation/steps/base.py	90.90%	1 Missing ⚠️
minian/simulation/steps/sensor.py	97.72%	1 Missing ⚠️


daharoni · 2026-06-04T20:36:16Z

Please ignore size and test coverage for now....

sneakers-the-rat · 2026-06-04T22:46:43Z

> Please ignore size

lmao okay then.

I actually think something like this would be great as an independent package (that minian then imports) - we need good simulated data generation for basically all the analysis projects, and they all do it differently and with varying degrees of quality. having one shared lib that can do that would be super useful for testing and teaching with minian, calab, cala, and even mio.

sneakers-the-rat · 2026-06-04T22:56:50Z

+    "from matplotlib.colors import LogNorm\n",
+    "from scipy.ndimage import gaussian_filter\n",
+    "from ipywidgets import interact, FloatSlider, IntSlider\n",
+    "import mediapy\n",


missing dep, not sure that we need another dep for this? don't we already show videos somehow?

showing videos in notebooks in minian works really really poorly currently (slow and jumpy). I was seeing if mediapy happens to work better and it doesn't seem to.

daharoni · 2026-06-04T23:14:04Z

> I actually think something like this would be great as an independent package (that minian then imports)
Yea, I completely agree. We will see where this goes.

Rename soma_footprint -> neuron_footprint and add a `morphology` selector for the two GCaMP targeting variants: - "soma" (default): the existing lumpy disk, soma-targeted GCaMP. Bit-for-bit identical to the old footprint (dendrites are stamped only after the soma's RNG draw, so the soma stream is untouched). - "cytosolic": the soma plus a few tapering proximal dendrites (standard GCaMP6/7/8), graded dimmer and thinner toward the tip so scatter, defocus, and the noise floor erase them first. Replaces the dead PlaceSomata.n_neurite_stubs reservation with morphology + n_dendrites / dendrite_length_um / dendrite_width_um. New helpers _stamp_disk / _stamp_dendrites render dendrites in local bounding boxes (cheap at high cell counts). Tests cover cytosolic reach/grading and soma determinism. Anatomy notebook (Stage 1a sandbox): GCaMP type + dendrite params are now editable cell settings (not sliders); footprints are generated once on a sensor-independent 0.5 um grid and rescaled, so optics sliders never regenerate shapes and magnification/pitch only zoom the same cells. Document the reasoning in-source: a "pixel-limited, not diffraction-limited" note on Optics.diffraction_sigma_um and a "shape is physical, grid is sampling" note on neuron_footprint, cross-linked. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Field curvature (Petzval): a miniscope has no room for a field flattener, so the in-focus surface is a shallow bowl, not a plane. Add Optics.field_curvature_radius_um (µm; None = ideal flat field) and a Layer-2 helper Optics.focal_curvature_shift_um(r) = R - sqrt(R^2 - r^2) ~ r^2/(2R). CellOpticsStep now gives each cell its own focal depth: the central plane minus the sagitta at its distance from the optical axis (canvas center), so off-axis cells focus shallower and blur toward the edges. The curvature over one soma is negligible vs the ~mm radius, so cells are evaluated at their center (no footprint warping). Default None keeps every existing preset/test unchanged. Perf: degrade_footprint now computes the optical blur only within the cell's bounding box, grown by the PSF truncation radius (4*sigma, beyond which the Gaussian is exactly zero). Bit-identical to the full-canvas filter, but the optics step is ~4-11x faster (the win grows with canvas size, since the old cost scaled with area and the new one scales with cell size). Anatomy notebook (Stage 1a sandbox): add a field-curvature slider (default 2.5 mm, realistic) and draw the curved focal surface + DOF band in the side view; soften the proximal-dendrite waviness (wavy, not curly). Tests: focal_curvature_shift_um (flat/on-axis/grows-with-r, rejects <= 0) and an off-axis-cells-blur step test. Full non-slow simulation suite: 160 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

sneakers-the-rat · 2026-06-05T02:10:02Z

+_DOMAIN_RANK: dict[str, int] = {"cell": 0, "tissue": 1, "motion": 2, "sensor": 3}
+
+
+class StepSpec(_Base):


why do step specs need to be different than the steps themselves? or, why are they in a different place? it seems like this might be for serialization/caching's sake, but it doesn't look like any of the steps bear any data on their own, so they would still have the same fields -> be just as serializable/hashable. just trying to get my head around 'what does a step consist of' and it seems like it's a step class, a stepspec, some additional fields in the ground truth, some additional fields in the scene, and maybe more? i don't see a declarative relationship between a stepspec and its step, just dynamically via the build method, so that would be nice to have.

I think best case for a design here would be to have the steps be separable and composable, and to do that would need to have each step be somewhat self contained and not bleed into either one another or the containing classes like scene, but as-is there is a bit of crosstalk/undeclared dependencies like e.g. the tissue.RenderStep depends on cell.CellOpticsStep which depends on cell.PlaceSomataStep, but that wouldn't be obvious unless you read the code.

the type annotations between a Step and its Spec need to use generics btw, because currently they are all receiving the base Spec class, and so all the attribute accesses are untyped, but if each step specifies its spec class with a typevar (or just has its spec as its attrs) then it would be possible to correctly resolve those types.
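One way the generics suggestion could look, as a sketch only. Plain dataclasses stand in for the PR's pydantic models, and the field soma_radius_um with its default value is a placeholder; the point is just that parameterizing the step base class on its spec type lets a type checker resolve attribute accesses on self.spec.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar


@dataclass(frozen=True)
class StepSpec:
    """Stand-in for the PR's StepSpec base (a pydantic model in the source)."""

    kind: str


@dataclass(frozen=True)
class PlaceSomata(StepSpec):
    # Hypothetical default, for illustration only.
    soma_radius_um: float = 7.0


SpecT = TypeVar("SpecT", bound=StepSpec)


class Step(Generic[SpecT]):
    """Base step, parameterized on the concrete spec type it consumes."""

    def __init__(self, spec: SpecT) -> None:
        self.spec = spec


class PlaceSomataStep(Step[PlaceSomata]):
    def describe(self) -> str:
        # A type checker sees self.spec as PlaceSomata here, so
        # soma_radius_um resolves instead of falling back to the base spec.
        return f"{self.spec.kind}: r={self.spec.soma_radius_um:g} um"


step = PlaceSomataStep(PlaceSomata(kind="place_somata"))
print(step.describe())
```

Binding the spec class in the subclass declaration also gives the declarative Step-to-Spec relationship asked about earlier in the comment.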

one way of handling dependencies would just be to declare them: like if CellOpticsStep depends on PlaceSomataStep, then a validator on the steps list could check that there is one of the necessary step instances prior to it in the list.
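A sketch of the declared-dependency check, assuming each step carries a list of step names it requires. StepDecl, requires, and check_step_order are hypothetical names, not part of the PR; in the real package this logic would presumably live in a validator on the Spec's steps list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StepDecl:
    """Hypothetical declaration: a step name plus the steps it needs first."""

    name: str
    requires: tuple[str, ...] = ()


def check_step_order(steps: list[StepDecl]) -> None:
    """Raise ValueError if a step appears before one of its requirements."""
    seen: set[str] = set()
    for step in steps:
        missing = [req for req in step.requires if req not in seen]
        if missing:
            raise ValueError(
                f"step {step.name!r} requires {missing} earlier in the chain"
            )
        seen.add(step.name)


# The dependency chain named in the review comment, declared explicitly.
chain = [
    StepDecl("place_somata"),
    StepDecl("cell_optics", requires=("place_somata",)),
    StepDecl("render", requires=("cell_optics",)),
]
check_step_order(chain)  # passes: every requirement precedes its dependent
```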

sneakers-the-rat · 2026-06-05T02:20:30Z

+    field_curvature_radius_um: float | None = Field(
+        default=None,
+        description="Petzval field-curvature radius, µm (typical miniscope ≈ 2000–3000). "
+        "Off-axis cells focus *shallower* by the spherical sagitta; None = ideal flat "
+        "field. A miniscope has no room for a field flattener, so this is usually finite.",
+    )
+
+    @field_validator("field_curvature_radius_um")
+    @classmethod
+    def _check_curvature(cls, v: float | None) -> float | None:
+        if v is not None and v <= 0:
+            raise ValueError(
+                f"field_curvature_radius_um ({v}) must be > 0, or None for a flat field."
+            )
+        return v


could just be

from typing import Annotated as A
from annotated_types import Gt

class Optics(_Base):
    # ...
    field_curvature_radius_um: A[float, Gt(0)] | None

Acquisition / Optics now describe a real UCLA Miniscope V4 and name the focus geometry clearly:

- Rename Optics.focal_plane_um -> Acquisition.focal_depth_in_tissue_um (the depth of the focal plane below the tissue surface, same coordinate as each cell's z; "auto" -> median cell depth). It moves off Optics because it is experiment geometry, not a lens property. The forward model is unchanged.
- Add Acquisition.front_working_distance_um (V4 ~700): an informational scope spec for surgery/implant planning. Nothing in the simulation reads it.
- Optics.depth_of_field_um now defaults to "auto", derived from NA via Optics.resolved_depth_of_field_um (~n*lambda/NA^2) instead of a hardcoded value; a number still overrides. DOF is set by the optics, so it shouldn't be a free knob.

Anatomy notebook commit cell is now the actual V4: Python480 CMOS at 608x608, 4.8 um pitch, effective NA 0.30, GCaMP, field curvature 2.1 mm, front working distance 700 um, focal depth 200 um into tissue, DOF auto (+/-7.8 um).

Updated every call site (steps, recording/spec/structural/recovery tests, the notebook, and the local preview scripts) and the design doc. Full non-slow simulation suite: 161 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
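A sketch of the "auto" depth-of-field rule, DOF ~ n*lambda/NA^2, with an explicit numeric override. The commit does not state which refractive index or wavelength it uses; n = 1.33 and lambda = 0.53 um below are assumptions picked as plausible values for tissue and green emission, and the function is a standalone illustration rather than the PR's Optics.resolved_depth_of_field_um.

```python
from __future__ import annotations


def resolved_depth_of_field_um(
    na: float,
    depth_of_field_um: float | str = "auto",
    n: float = 1.33,  # assumed refractive index (not given in the source)
    wavelength_um: float = 0.53,  # assumed emission wavelength (not given)
) -> float:
    """Return the explicit DOF if one is given, else derive it from NA."""
    if depth_of_field_um != "auto":
        return float(depth_of_field_um)
    return n * wavelength_um / na**2


dof_auto = resolved_depth_of_field_um(0.30)
dof_override = resolved_depth_of_field_um(0.30, depth_of_field_um=12.0)
print(f"auto: {dof_auto:.2f} um, override: {dof_override:.1f} um")
```

Under these assumed constants, NA 0.30 gives a value close to the +/-7.8 um quoted for the V4 commit cell, though that agreement may be coincidental.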

…ample_neurons

The footprint generator now models whole neurons (soma + proximal dendrites for cytosolic GCaMP), so the soma-specific names no longer fit. Rename the placement step and its spec/discriminator throughout:

- PlaceSomata -> PlaceNeurons
- PlaceSomataStep -> PlaceNeuronsStep
- place_somata -> place_neurons (step name / kind / until= key)
- sample_somata -> sample_neurons

Kept soma_radius_um and the "soma"/"cytosolic" morphology values, which genuinely refer to the cell body. Pre-release, so no back-compat shim.

Also factor the *distribution* half of the step into a public sample_neurons() (centers + per-cell SNR, no footprint stamping). PlaceNeuronsStep delegates to it then stamps; a test pins it draw-for-draw identical to the fused step. This lets the anatomy notebook visualize placement at full FOV without paying for ~hundreds of footprint paints.

Rebuild the notebook's Stage 2 widget on the new seam: full committed FOV (no preview shrink), top + side views with true-radius disks colored by depth, an SNR histogram, and a min_distance spacing knob. No optics and no footprint imagery in Stage 2 -- crisp footprints carry no depth cue, so degradation is held back for the optics stage. Prose corrected: planted cell bodies stay well separated at realistic density; real crowding emerges from optical blur next stage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…bserved reveal

Expand the anatomy notebook's place_neurons section into a full explore -> commit -> "see it through the scope" arc:

- Presets (CA1 thin layer ~150um, Volumetric 50-200um slab) via a ToggleButtons selector that drives the sliders; CA1 is the clean-run default.
- GCaMP morphology selector (soma vs cytosolic). It does not move the distribution dots (those mark cell bodies) but feeds the commit and shows its effect in the optics reveal, where cytosolic's thin dendrites are erased first.
- Slider-driven, idempotent commit: whatever you dialed in (preset + tweaks + morphology) is what gets imaged; re-running replaces the prior placement.
- New ending cell pushes the committed placement through the optics (place_neurons + optics, no calcium -- degradation is purely spatial) and shows A_planted -> A_observed: full-FOV pair on a shared gamma-stretched scale (the field is only a few percent as bright) plus a per-cell zoom isolating the blur. Teaches depth-driven dimming/defocus AND field curvature (only a central island stays in focus, since the V4 has no field-flattener and DOF is just +/-7.8um).

Focus is now "auto" in the V4 commit (tracks the placed layer's median depth, like turning the focus knob), so the presets stay sensibly focused; the build() helper gains an `extra` arg to preview optics without committing the step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

"auto" focal depth previously resolved to the median cell depth, which ignores field curvature: off-axis cells focus shallower, so a median-of-z plane left most of the field defocused.

Defocus is sigma = NA*|z - focal_eff| with focal_eff = focal - shift(r), i.e. NA*|(z + shift(r)) - focal| -- linear in each cell's effective depth e = z + shift(r). The focal that minimizes total defocus is therefore exactly the MEDIAN of the effective depths, which folds curvature in by construction and pulls the plane deeper to claw back off-axis cells.

resolve_focal_plane() gains optional (optics, axis_yx); when a curvature radius and optical axis are supplied it returns median(z + shift(r)), else the prior median z (backward compatible). CellOpticsStep computes the axis first and passes both. New test covers the curvature path; full non-slow sim suite green (164 passed).

Effect on the anatomy notebook's CA1 preset: focus moves 150 -> 186 um and in-focus cells go 73 -> 176 / 852.

Also tune the Stage 2 presets: CA1 depth 140-160 um (was 145-155), Volumetric density 1200/mm2 (was 500), and label the optics reveal with the resolved focus (shown deeper than the median depth, illustrating the curvature compensation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
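The median argument above can be checked numerically with a small sketch. The function signature here is simplified (lists of depths and radial distances instead of optics and axis_yx), and the five sample cells are made up; only the rule itself, focal = median(z + shift(r)), comes from the commit.

```python
from __future__ import annotations

import math
import statistics


def shift_um(r_um: float, radius_um: float) -> float:
    """Sagitta of the curved focal surface at lateral distance r_um."""
    return radius_um - math.sqrt(radius_um**2 - r_um**2)


def resolve_focal_plane(
    z_um: list[float], r_um: list[float], radius_um: float | None = None
) -> float:
    """Median of z (flat field) or of effective depth z + shift(r) (curved)."""
    if radius_um is None:
        return statistics.median(z_um)
    return statistics.median(
        [z + shift_um(r, radius_um) for z, r in zip(z_um, r_um)]
    )


def total_defocus(
    focal_um: float, z_um: list[float], r_um: list[float],
    radius_um: float, na: float = 0.30,
) -> float:
    """Sum over cells of NA * |(z + shift(r)) - focal|."""
    return sum(
        na * abs((z + shift_um(r, radius_um)) - focal_um)
        for z, r in zip(z_um, r_um)
    )


# Made-up layer: five cells at the same depth, increasingly off-axis.
depths = [150.0] * 5
radii = [0.0, 200.0, 400.0, 600.0, 800.0]
flat_focus = resolve_focal_plane(depths, radii)
curved_focus = resolve_focal_plane(depths, radii, radius_um=2100.0)
print(f"flat: {flat_focus:.1f} um, curvature-aware: {curved_focus:.1f} um")
```

The curvature-aware plane sits deeper than the plain median of z, and nudging it in either direction raises the total defocus, which is the property the commit relies on.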

… soma diameter

PlaceNeurons.density_per_mm2 -> density_per_mm3. Cell count is now round(density_per_mm3 * FOV_area * thickness), the depth-range thickness floored at one soma diameter (2*soma_radius_um) so a thin or strictly planar (lo==hi) layer still yields cells rather than zero. This is the physically fundamental quantity: a thicker slab of tissue holds proportionally more neurons, and the count now scales with the imaged depth.

sample_neurons() computes the floored thickness and volume; spec field + docstrings updated. All call sites migrated: tests converted with count-preserving densities (rho_vol = rho_areal * 1000 / thickness_um, so existing cell counts and the exact count assertions are unchanged), the sweep override path string, and test_sweep. Full non-slow sim suite green (164 passed).

Anatomy notebook Stage 2: density slider + presets are now volumetric (CA1 45000/mm3 over a 20 um layer -> 852 cells; Volumetric 8000/mm3 over a 150 um slab -> 1136 cells, the same counts as before), and the prose explains the volumetric count rule and the soma-diameter floor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
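The count rule and the areal-to-volumetric conversion above can be sketched as follows. Function names, the 1.0 mm2 FOV, and the 7 um soma radius are illustrative assumptions; the formulas (round(density * area * thickness), the 2*soma_radius_um floor, and rho_vol = rho_areal * 1000 / thickness_um) are taken from the commit text.

```python
def neuron_count(
    density_per_mm3: float,
    fov_area_mm2: float,
    depth_lo_um: float,
    depth_hi_um: float,
    soma_radius_um: float,
) -> int:
    """Cell count from volumetric density, flooring thickness at one soma diameter."""
    thickness_um = max(depth_hi_um - depth_lo_um, 2.0 * soma_radius_um)
    volume_mm3 = fov_area_mm2 * thickness_um / 1000.0  # um -> mm for thickness
    return round(density_per_mm3 * volume_mm3)


def areal_to_volumetric(rho_areal_per_mm2: float, thickness_um: float) -> float:
    """Count-preserving conversion used when migrating the tests."""
    return rho_areal_per_mm2 * 1000.0 / thickness_um


# Hypothetical 1.0 mm2 field of view and 7 um soma radius.
thin_layer = neuron_count(45000.0, 1.0, 140.0, 160.0, soma_radius_um=7.0)
planar_layer = neuron_count(45000.0, 1.0, 150.0, 150.0, soma_radius_um=7.0)
print(thin_layer, planar_layer)
```

With lo == hi the thickness falls back to the 14 um soma diameter, so the planar layer still yields cells instead of zero.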

neuron_footprint computed its hypot/threshold/irregularity-noise over the whole canvas per cell, while a cell occupies only a small patch (the soma reaches at most radius*(1+irregularity) from the center). degrade_footprint and the dendrite disk stamps were already windowed; the soma paint was the one that was missed. Compute the soma inside its bounding box and write that patch into the full-canvas array; dendrite stamping is unchanged (it self-windows).

For the 852-cell CA1-cytosolic preset on a 608x608 canvas this takes the place_neurons + optics build from ~31s to ~9.6s (3.3x), with the soma's own full-canvas cost dropping from ~23s to ~0.1s.

The windowed irregularity noise draws fewer randoms, which shifts the downstream CellActivity RNG stream (acceptable: unreleased, no exact-pixel goldens). One finely-tuned structural test dipped below the noise floor as a result; in_focus stayed 55/55 and the central-clustering property holds, so re-tuned its photons_per_unit 110 -> 1000 to exercise a non-trivial detectable population again.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
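A toy version of the windowing idea, in pure Python on a small canvas. It paints a plain disk only (no irregularity noise, which is the part that changes the RNG stream in the real code), and both function names are hypothetical. It shows that evaluating the distance test inside the bounding box gives the same image as evaluating it over the whole canvas.

```python
import math


def disk_full_canvas(h: int, w: int, cy: float, cx: float, radius: float):
    """Reference: test every pixel on the canvas."""
    return [
        [1.0 if math.hypot(y - cy, x - cx) <= radius else 0.0 for x in range(w)]
        for y in range(h)
    ]


def disk_windowed(h: int, w: int, cy: float, cx: float, radius: float):
    """Test only pixels inside the disk's bounding box, clipped to the canvas."""
    canvas = [[0.0] * w for _ in range(h)]
    y0 = max(0, math.floor(cy - radius))
    y1 = min(h, math.ceil(cy + radius) + 1)
    x0 = max(0, math.floor(cx - radius))
    x1 = min(w, math.ceil(cx + radius) + 1)
    for y in range(y0, y1):
        for x in range(x0, x1):
            if math.hypot(y - cy, x - cx) <= radius:
                canvas[y][x] = 1.0
    return canvas


# Work scales with the patch (about (2r+1)^2 pixels) instead of h*w.
full = disk_full_canvas(32, 32, 10.3, 5.7, 4.0)
windowed = disk_windowed(32, 32, 10.3, 5.7, 4.0)
print(full == windowed)
```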

…s reveal

Render the Stage 2 "A_planted -> A_observed" panels with the GCaMP green LUT (matching the imaging plots earlier in the notebook) instead of magma, and scale each panel to its own peak rather than a shared gamma stretch. A_observed is now scaled to its max footprint brightness, so the faint observed field is lifted to full visibility and the top row reads as shape (which cells stay sharp) rather than brightness. The real dimming stays reported as the peak-% in the title and bites later at the sensor noise floor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

daharoni and others added 19 commits June 4, 2026 12:11

sneakers-the-rat reviewed Jun 4, 2026

daharoni and others added 2 commits June 4, 2026 17:12

sneakers-the-rat reviewed Jun 5, 2026

daharoni and others added 2 commits June 4, 2026 20:34

daharoni and others added 5 commits June 4, 2026 23:15

daharoni wants to merge 28 commits into v2-integration from feat/simulation-suite
