
feat(simulation): physically-driven synthetic-data + teaching suite (minian.simulation) #328

Draft
daharoni wants to merge 28 commits into v2-integration from feat/simulation-suite

Conversation

@daharoni
Contributor

@daharoni daharoni commented Jun 4, 2026

Draft / WIP. Adds minian.simulation, a physically-driven synthetic calcium-imaging data generator plus a teaching scaffold. It produces ground-truth recordings used to test the pipeline (motion correction and CNMF recovery) and serves as a hands-on teaching entry point.

What this adds

  • Typed Spec surface and a Layer-2 physics/scene runtime (spec.py, scene.py, steps/).
  • A composable step chain: tissue, cell, optics (incl. NA^2 light-collection efficiency), sensor, brain motion.
  • Recording/GroundTruth types and a simulate() orchestrator with spec-hash to zarr local caching.
  • A shared structural-metrics oracle (metrics.py) and a Cartesian spec sweep generator (sweep.py).
  • Test suite: unit tests per module plus whole-pipeline structural self-consistency and per-stage motion-correction / CNMF footprint-recovery demos.

Status

  • Steps 2 to 13 implemented and committed (linear, one commit per step).
  • Step 14 (the two training notebooks under minian/notebooks/training/) is in progress and intentionally left WIP.

Base and dependencies

🤖 Generated with Claude Code

daharoni and others added 19 commits June 4, 2026 12:11
Introduce the minian.simulation package with its pydantic v2 spec contract: the physical interface (Acquisition holding Optics / ImageSensor / Tissue with unit conversions), the StepSpec base, 11 step specs, the static AnyStep discriminated union, and the top-level Spec with cache_key + cross-field validators (hard fails + advisory SpecWarnings).

Add pydantic>=2 and numpydantic>=1.8 as core runtime deps (not an extra) so the training notebooks run offline after a plain pip install; regenerate pdm.lock.

Specs are grouped by the physical thing they describe (lens props on Optics, detector geometry + noise on ImageSensor, scattering on Tissue); the sensor step keeps only the exposure scale. Physics math, Scene, executable step bodies, and numpydantic output types are deferred to later migration steps. 20 unit tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Step 3)

Derive the phenomenological quantities the simulator steps consume from the
Layer-1 physical knobs, as small closed-form helpers on the model that owns
each physical thing:

- Optics.diffraction_sigma_um (sigma ~= 0.21*lambda/NA)
- Optics.defocus_sigma_um (NA*|z-focal|, intensity-conserving)
- Tissue.attenuation (Beer-Lambert exp(-z/mfp)) and scatter_sigma_um
- Acquisition.cell_optics -> (sigma_px, brightness), quadrature-combining the
  above; the sigma_0^2/sigma_tot^2 peak drop makes defocus volume-conserving
  while only attenuation removes light
- ImageSensor.photons_to_counts: Poisson shot + Gaussian read -> x gain ->
  floor -> clip, the only place fluorescence becomes integer counts

Each helper has a teaching docstring (approximation, units, typical range) and a
direct unit test, including the defocus integrated-intensity-conservation
invariant. Scene, build() bodies, and the in_focus/detectable flags stay
deferred to Steps 4-5.
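
The closed-form helpers listed above can be sketched as plain functions. This is a minimal illustration under stated assumptions, not the code in spec.py: the signatures are hypothetical, sigma_0 is taken to be the in-focus (diffraction-only) width, the scatter blur is left out because its formula is not given here, and the example numbers (0.52 um emission, NA 0.45, 100 um mean free path) are placeholders. A 2D Gaussian of peak P and width sigma integrates to 2*pi*sigma^2*P, which is why a peak drop of sigma_0^2/sigma_tot^2 keeps the integrated intensity fixed.

```python
import math


def diffraction_sigma_um(wavelength_um: float, na: float) -> float:
    """Gaussian-equivalent PSF width: sigma ~= 0.21 * lambda / NA."""
    return 0.21 * wavelength_um / na


def defocus_sigma_um(na: float, z_um: float, focal_um: float) -> float:
    """Geometric defocus blur: NA * |z - focal|."""
    return na * abs(z_um - focal_um)


def attenuation(z_um: float, mfp_um: float) -> float:
    """Beer-Lambert survival fraction: exp(-z / mfp)."""
    return math.exp(-z_um / mfp_um)


def cell_optics(wavelength_um, na, z_um, focal_um, mfp_um):
    """Return (sigma_total_um, brightness); hypothetical signature."""
    sigma_0 = diffraction_sigma_um(wavelength_um, na)
    # Independent blurs combine in quadrature.
    sigma_tot = math.hypot(sigma_0, defocus_sigma_um(na, z_um, focal_um))
    peak_drop = (sigma_0 / sigma_tot) ** 2      # volume-conserving peak loss
    brightness = attenuation(z_um, mfp_um) * peak_drop
    return sigma_tot, brightness


sigma_focus, bright_focus = cell_optics(0.52, 0.45, z_um=100.0, focal_um=100.0, mfp_um=100.0)
sigma_defocus, bright_defocus = cell_optics(0.52, 0.45, z_um=120.0, focal_um=100.0, mfp_um=100.0)
```

In this sketch the defocused cell is wider and has a lower peak, but dividing out the attenuation and multiplying by sigma_tot^2 recovers sigma_0^2, which is the conservation invariant the unit test checks.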

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Step 4)

Add the mutable counterpart to the immutable Spec: the working state the
executable steps (Step 5) read and write, and the substrate single-step unit
tests construct directly. Three plain dataclasses (not pydantic -- they hold
xarray/Generator and are mutated in place):

- Scene: acq, rng, movie, cells, truth, snapshots. Scene.zeros/Scene.ones build
  a (frame, height, width) xr.DataArray with the minian dim names + arange
  coords, in float64 working precision; the downcast to Output.store_dtype is a
  finalize() concern (Step 6). rng=None -> default_rng().
- Cell: per-cell record whose fields mirror the GroundTruth structural columns
  (spec section 8) one-for-one, all None until the producing step fills them.
- GroundTruthBuilder: the per-effect ground-truth side channel
  (shifts/vignette/leakage/bleaching/neuropil_*), all None = effect absent.

Exported from minian.simulation. simulate()/finalize(), build() bodies, and any
cell/movie/truth population stay deferred to Steps 5-6.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…on (Step 5a)

Implement the four executable steps that turn a blank Scene into a digitized
recording: place_somata -> cell_activity -> render -> sensor. This is the first
vertical slice of the step engine; optics, fields, and motion follow in 5b-5d.

- minian/simulation/steps/: new domain-organized package
  - base.py:   the Step contract (captures the build-time RNG; name/domain)
  - cell.py:   PlaceSomataStep (density->count, Poisson-disk placement,
               peak-normalized irregular-disk footprints, uniform/lognormal SNR)
               and CellActivityStep (2-state Markov -> Poisson spikes ->
               double-exp kernel); pure soma_footprint/calcium_kernel helpers
  - tissue.py: RenderStep (stage 'cells_only'); additive footprint x trace
               composite, prefers footprint_observed so optics slots in at 5b
  - sensor.py: SensorStep; intensity -> photons -> ImageSensor.photons_to_counts
- spec.py: wire build() on the four step specs (lazy import, no import cycle);
  the other seven keep the base NotImplementedError until their milestone
- tests: test_steps.py (per-step + 4-step chain); retarget the Step 2
  build()-unimplemented test at CellOptics (a 5b placeholder)
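
The CellActivityStep chain (2-state Markov -> Poisson spikes -> double-exp kernel) can be illustrated roughly as follows. This is a sketch under assumptions rather than the code in steps/cell.py: the parameter names, default values, and the choice to express time constants in frames are all hypothetical.

```python
import numpy as np


def calcium_kernel(n_frames: int, tau_rise: float, tau_decay: float) -> np.ndarray:
    """Peak-normalized double-exponential kernel; time constants in frames."""
    t = np.arange(n_frames, dtype=float)
    kernel = np.exp(-t / tau_decay) - np.exp(-t / tau_rise)  # 0 at t=0, rises, then decays
    return kernel / kernel.max()


def simulate_activity(n_frames, rng, *, p_on=0.02, p_off=0.2,
                      rate_on=0.5, rate_off=0.01, tau_rise=2.0, tau_decay=20.0):
    """Return (spikes, trace) for one cell; all defaults are illustrative."""
    spikes = np.zeros(n_frames)
    active = False
    for t in range(n_frames):
        # Two-state Markov chain: flip state with a per-frame probability.
        if rng.random() < (p_off if active else p_on):
            active = not active
        # Spike count for this frame is Poisson with a state-dependent rate.
        spikes[t] = rng.poisson(rate_on if active else rate_off)
    kernel = calcium_kernel(n_frames, tau_rise, tau_decay)
    trace = np.convolve(spikes, kernel)[:n_frames]  # causal: keep the first n_frames
    return spikes, trace


spikes, trace = simulate_activity(600, np.random.default_rng(0))
```

Passing the generator in explicitly mirrors the "captures the build-time RNG" contract: the same seed reproduces the same spikes and trace.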

Scope guards: no optics (planted footprint only; observed/in_focus/detectable
stay None), no simulate()/Recording (Step 6), n_neurite_stubs>0 raises.
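
The sensor stage's intensity -> photons -> counts path follows the order stated for ImageSensor.photons_to_counts (Poisson shot + Gaussian read -> x gain -> floor -> clip). The sketch below assumes a hypothetical signature and illustrative values for gain, read noise, bit depth, and the photons-per-unit exposure scale; the real method may differ in detail.

```python
import numpy as np


def photons_to_counts(photons, rng, *, gain=0.25, read_noise_e=2.0, bit_depth=8):
    """Digitize expected photon counts; parameter names are assumptions."""
    photons = np.asarray(photons, dtype=float)
    electrons = rng.poisson(photons).astype(float)                  # shot noise
    electrons += rng.normal(0.0, read_noise_e, size=photons.shape)  # read noise
    counts = np.floor(electrons * gain)                             # gain, then floor
    return np.clip(counts, 0, 2 ** bit_depth - 1).astype(np.uint16)  # clip to ADC range


rng = np.random.default_rng(0)
intensity = np.linspace(0.0, 1.0, 64).reshape(8, 8)   # unitless fluorescence
photons_per_unit = 2000.0                              # hypothetical exposure scale
counts = photons_to_counts(intensity * photons_per_unit, rng)
```

With these numbers the brightest pixel saturates at 255, which shows why the clip has to come last: everything upstream stays in floating point.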

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
….simulation (Step 5b)

Implement CellOptics.build() -> CellOpticsStep: per-cell diffraction + defocus +
scatter degradation, the planted/observed footprint split, and the geometric
in-focus flag. render already prefers footprint_observed, so the chain now
composites the degraded footprint automatically.

- steps/cell.py: CellOpticsStep + pure helpers
  - resolve_focal_plane(): Optics.focal_plane_um 'auto' -> median cell depth
  - degrade_footprint(): observed = attenuation(z) * (planted (x) Gaussian(sigma_total)).
    Convolution is sum-conserving (defocus spreads light, peak drops, integral
    constant); only scatter attenuation removes light, so the observed integral
    is focal-plane-independent. The cell_optics 'brightness' peak factor is NOT
    applied to the footprint (would double-count defocus) -- it is stored as the
    per-cell optical_brightness scalar for detectability.
- scene.py: Cell gains optical_brightness; detectable left for finalize (Step 6)
- spec.py: CellOptics.build() wired; cell_optics docstring clarifies the two
  uses of sigma_px (footprint blur) vs brightness (peak/detectability scalar)
- tests: optics blur/attenuation, defocus integral conservation, focal-plane
  resolution, in_focus/DOF, render-uses-observed, optics+sensor chain
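
The sum-conserving property of degrade_footprint() can be checked with a small NumPy-only sketch. The separable convolution below stands in for whatever Gaussian filter the step actually uses, and the function signatures are assumptions; the point is only that a unit-sum kernel lowers the peak while leaving the integral at attenuation x planted.sum().

```python
import numpy as np


def gaussian_kernel(sigma_px: float, truncate: float = 4.0) -> np.ndarray:
    radius = int(truncate * sigma_px + 0.5)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma_px) ** 2)
    return kernel / kernel.sum()  # unit sum, so the blur conserves total light


def degrade_footprint(planted, sigma_px, attenuation):
    """observed = attenuation * (planted convolved with a Gaussian)."""
    kernel = gaussian_kernel(sigma_px)
    # Separable Gaussian blur: convolve along columns, then along rows.
    blurred = np.apply_along_axis(np.convolve, 0, planted, kernel, mode="same")
    blurred = np.apply_along_axis(np.convolve, 1, blurred, kernel, mode="same")
    return attenuation * blurred


yy, xx = np.mgrid[:41, :41]
planted = ((yy - 20) ** 2 + (xx - 20) ** 2 <= 16).astype(float)  # radius-4 disk
sharp = degrade_footprint(planted, sigma_px=1.0, attenuation=0.5)
defocused = degrade_footprint(planted, sigma_px=2.0, attenuation=0.5)
```

Both outputs integrate to half the planted sum, while the more defocused one has the lower peak. The disk sits far enough from the border that no light is lost at the array edge.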

Per review: detectable is a whole-pipeline flag (optical brightness x
illumination falloff vs the sensor noise floor), assembled at finalize(), not in
the optics step -- captures that excitation/vignette illumination also drives
per-cell brightness, not just depth attenuation (resolves spec open-Q 4).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Quality cleanups from a /simplify pass over the step engine; no functional
change beyond moving one guard to fail-fast at construction.

- soma_footprint: drop the dead `/ footprint.max()` normalization (a 0/1 mask
  is already peak-normalized) and the doubled max; scale the irregularity noise
  with max(noise.max(), -noise.min()) instead of allocating a full-frame abs
- remove hand-rolled _dist3; inline math.dist() at the call site (the idiom the
  tests already use)
- StepSpec.build return type object -> Step, uniform with the four overrides
- resolve_focal_plane: add the missing `optics: Optics` type hint
- n_neurite_stubs > 0: reject at PlaceSomata construction via a field_validator
  (consistent with the other validators) instead of a NotImplementedError buried
  in PlaceSomataStep.__call__; test asserts the construction-time ValidationError

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implement the four field effects that layer on the rendered cells, organized
by domain to match the existing step layout:

* tissue frame (brain-bound): neuropil (additive diffuse background, smooth
  spatial field x slow mean-1 OU envelope) and bleaching (global mono-exp
  fluorophore decay).
* sensor frame (static): vignette (radial illumination falloff) and leakage
  (additive baseline glow). Each records its (height, width) field to ground
  truth; the vignette field is load-bearing for the Step 6 detectable flag.

Also gives the deferred vasculature placeholder an honest no-op build() so a
spec listing it runs end-to-end. bi_exp bleaching is rejected at construction
(a single final_fraction cannot determine a two-component curve), matching the
n_neurite_stubs fail-fast precedent. Wires build() on the five specs, exports
the new steps and physics helpers, and adds single-step + chain tests
(ou_process stationarity, GT shapes, static-field invariance, bi_exp
ValidationError, vasculature no-op). Proposal spec doc updated to match.
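
Two of the field effects have simple closed forms that a short sketch makes concrete. The parameterization below is an assumption (the function names, the Euler-Maruyama discretization, and the theta/sigma defaults are hypothetical): a mono-exponential is fully determined by one final_fraction, whereas a bi-exponential has two amplitudes and two time constants, so a single number cannot pin it down.

```python
import math
import random


def mono_exp_bleaching(n_frames: int, final_fraction: float) -> list[float]:
    """Curve that starts at 1 and reaches final_fraction on the last frame."""
    rate = -math.log(final_fraction) / (n_frames - 1)
    return [math.exp(-rate * t) for t in range(n_frames)]


def ou_envelope(n_frames, rng, *, theta=0.1, sigma=0.05, mean=1.0):
    """Ornstein-Uhlenbeck path (Euler-Maruyama, dt = 1 frame) around `mean`."""
    x, path = mean, []
    for _ in range(n_frames):
        path.append(x)
        # Mean reversion pulls x back toward `mean`; the Gaussian term adds noise.
        x += theta * (mean - x) + sigma * rng.gauss(0.0, 1.0)
    return path


bleach = mono_exp_bleaching(300, final_fraction=0.7)
envelope = ou_envelope(5000, random.Random(0))
```

A mean-1 envelope multiplies the smooth spatial neuropil field without changing its average level, which is the stationarity property the ou_process test targets.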

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sensor (Step 5d-1)

Four steps re-read acq.image_sensor.n_px_* to size their pixel grid;
switch them to derive it from scene.movie — the array they actually mutate:

* place_somata fills the scene canvas (cell count from canvas area), with
  positions documented as canvas/tissue coordinates (FOV-crop offset deferred
  to finalize, Step 6).
* neuropil takes (n_frames, h, w) straight off the movie.
* vignette / leakage take (h, w) off the movie — post-crop this is the sensor
  FOV, exactly where the static fields belong.

Pure no-behavior-change refactor: at margin 0 the scene canvas equals the
sensor FOV, so output is byte-identical (the full pre-existing suite passes
unchanged). This is the groundwork for the motion margin (5d-2): a step now
sizes itself to the scene it is handed, so an oversized tissue canvas becomes a
data change with no special-casing. Adds a guard test that runs place_somata
and neuropil on a canvas larger than the sensor and confirms they honor it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the motion-domain step that completes the forward pipeline and the
tissue->sensor boundary.

brain_motion rigidly translates the brain-frame movie per frame (explicit
trajectory_um or a bounded random walk) and crops the sensor FOV from its
center. To keep motion honest, the upstream tissue steps render on a canvas
larger than the sensor (Scene.zeros(acq, margin_px=...)): real, simulated tissue
sits just off-FOV, so a shift brings genuine content into view rather than a
fabricated edge fill. The margin is >= the maximum shift, so the FOV crop never
reaches the canvas edge and no fill ever enters the result. Shifts use bilinear
(order=1) sub-pixel interpolation; ground-truth shifts (frame, 2) record the
applied (dy, dx) displacement in pixels (minian's shift_dim order). The step
fails fast on a length-mismatched trajectory or a margin smaller than the shift.

* steps/motion.py: BrainMotionStep + bounded_random_walk + shift_and_crop.
* scene.py: Scene.zeros/ones gain margin_px to allocate the padded canvas.
* spec.py: wire BrainMotion.build() (the last step) + document canvas/crop/units.
* tests: walk bounds, shift-and-crop recentering, explicit-trajectory
  displacement, off-FOV tissue moving into view, GT shift shape/units, the
  static-field-invariant-under-motion reference-frame test, fail-fast paths, and
  a full 10-step pipeline cropping back to the sensor FOV. test_spec's old
  "not implemented" case becomes test_every_step_kind_builds.

Deferred (flagged): OU/jump and focus-drift motion (placeholders); automatic
margin sizing + footprint cropping to FOV coordinates (finalize, Step 6); the
motion-correction RMSE recovery test (Step 10). Proposal spec doc updated.
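
The margin logic can be sketched with integer shifts. Note the substitutions: this stand-in rounds to whole pixels instead of using the bilinear (order=1) interpolation described above, and bounding the walk by clipping is an assumption about how bounded_random_walk works. Function signatures are hypothetical.

```python
import numpy as np


def bounded_random_walk(n_frames, max_shift_px, step_px, rng):
    walk = np.zeros((n_frames, 2))  # (frame, 2) as (dy, dx); frame 0 at rest
    for t in range(1, n_frames):
        step = rng.normal(0.0, step_px, size=2)
        walk[t] = np.clip(walk[t - 1] + step, -max_shift_px, max_shift_px)
    return walk


def shift_and_crop(canvas, shift_yx, fov_hw):
    """Displace canvas content by (dy, dx) and crop the centered FOV."""
    h, w = fov_hw
    margin_y, margin_x = (canvas.shape[0] - h) // 2, (canvas.shape[1] - w) // 2
    dy, dx = int(round(shift_yx[0])), int(round(shift_yx[1]))
    if abs(dy) > margin_y or abs(dx) > margin_x:
        raise ValueError("margin is smaller than the requested shift")
    # Moving content by (+dy, +dx) equals reading the window offset by (-dy, -dx).
    y0, x0 = margin_y - dy, margin_x - dx
    return canvas[y0:y0 + h, x0:x0 + w]


canvas = np.zeros((12, 12))      # 8x8 FOV plus a 2 px margin on each side
canvas[1, 6] = 1.0               # tissue sitting just off-FOV
still = shift_and_crop(canvas, (0, 0), (8, 8))
moved = shift_and_crop(canvas, (1, 0), (8, 8))
walk = bounded_random_walk(200, max_shift_px=2.0, step_px=0.5,
                           rng=np.random.default_rng(0))
```

The off-FOV point is invisible in `still` and appears on the top row of `moved`, so the shift reveals real canvas content rather than an edge fill.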

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the frozen, numpydantic-typed simulator output and the transform that
produces it from an exhausted Scene.

* recording.py: GroundTruth (A_planted/A_observed/C/S, per-cell centers/snr/
  in_focus/detectable, per-effect shifts/vignette/leakage/bleaching/neuropil_*
  None when absent; n_units/depth_um properties; detectable_subset()) and
  Recording (spec/observed/ground_truth/snapshots + stage()). save/load are
  honest NotImplementedError stubs deferred to caching (Step 7).
* finalize(scene, spec): a free function (avoids a scene<->recording import
  cycle). Crops each cell's canvas-sized footprint and canvas-frame position to
  the sensor FOV at the reference (zero-shift) frame, drops cells left entirely
  in the motion margin (background, not recoverable units), assembles the
  per-cell detectable flag, reads the per-effect fields off scene.truth, and
  downcasts the working movie to Output.store_dtype for observed.

Detectability (the locked-in rule): detectable = in_focus AND
signal_e / sqrt(baseline_e + read_noise_e^2) >= DETECT_SNR_THRESHOLD, where
signal_e = peak_dF * optical_brightness * vignette-at-cell * photons_per_unit *
QE. Sensor-derived floor (no magic constant); the threshold is a named module
const (3.0) flagged for Step 10 calibration. No sensor step -> falls back to the
geometric in_focus; no trace -> not detectable. This is the first consumer of
the vignette ground truth: edge cells dimmed below the floor read as
undetectable even when in focus.
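
The detectability rule above translates almost directly into code. The function name and keyword names are assumptions, and the example numbers are illustrative; only the formula and the 3.0 threshold come from the description.

```python
import math

DETECT_SNR_THRESHOLD = 3.0  # named module constant, flagged for later calibration


def is_detectable(*, in_focus, peak_df, optical_brightness, vignette_at_cell,
                  photons_per_unit, qe, baseline_e, read_noise_e):
    """detectable = in_focus AND signal_e / sqrt(baseline_e + read_noise_e^2) >= threshold."""
    signal_e = peak_df * optical_brightness * vignette_at_cell * photons_per_unit * qe
    noise_e = math.sqrt(baseline_e + read_noise_e ** 2)  # shot + read noise floor
    return in_focus and signal_e / noise_e >= DETECT_SNR_THRESHOLD


common = dict(peak_df=1.0, optical_brightness=0.8, photons_per_unit=100.0,
              qe=0.5, baseline_e=100.0, read_noise_e=2.0)
bright = is_detectable(in_focus=True, vignette_at_cell=1.0, **common)   # SNR ~ 3.9
rim = is_detectable(in_focus=True, vignette_at_cell=0.5, **common)      # SNR ~ 2.0
blurred = is_detectable(in_focus=False, vignette_at_cell=1.0, **common)
```

The `rim` case is the vignette effect in miniature: the cell is in focus, but halving the illumination drops it below the sensor-derived floor.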

Tests (+8): array shapes/dtypes, planted!=observed under optics, per-effect
present/absent, margin-cell drop with FOV-coordinate check, the 3-way
detectability rule (bright/dim/out-of-focus), detectable_subset, stage() KeyError.

Deferred (flagged): save/load -> Step 7; metrics -> 8; sweep -> 9; threshold
calibration -> 10; presets defined inline in tests for now. simulate() is 6b.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Add the package's headline entry point: compose a Spec into a Recording.

* simulate.py: simulate(spec, *, until=None) seeds the RNG from spec.seed, sizes
  the motion margin from the brain_motion spec (so motion recordings need no
  hand-sized canvas), runs spec.steps in order against a shared Scene, snapshots
  the movie after each movie-affecting (non-cell-domain) step when
  save_intermediates is set, honors until=<stage name> (fail-fast on an unknown
  stage), and finalizes. Exported as minian.simulation.simulate.
* recording.py: finalize() now always emits observed at the sensor FOV, cropping
  a still-canvas-sized movie left by a partial build (until= before brain_motion)
  for consistency with the cropped footprints.

Tests (+7): minimal spec end-to-end (integer-count observed at the sensor FOV),
seed determinism, automatic motion-margin sizing, until= early stop + its
snapshot set, the full movie-stage snapshot keys with stage("sensor") == observed
and cell-domain steps excluded, no-snapshots default, and unknown-until ValueError.

Doc-stays-true: spec doc §7 stage table updated to the real step.name keys
(neuropil/brain_motion/sensor, not with_*/observed) and the movie-affecting-steps
snapshot rule. Step 6 complete: the pipeline now runs spec -> Recording end to
end. save/load -> Step 7, metrics -> 8, sweep -> 9.
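
The orchestration rules (seeded RNG, ordered steps, snapshots only after non-cell-domain steps, fail-fast until=) can be sketched with toy steps. Everything here is hypothetical scaffolding: the real simulate() works on a Scene and Step objects, whereas this sketch uses a dict and (name, domain, function) tuples, and it omits margin sizing and finalize().

```python
import random


def simulate_sketch(steps, seed, *, until=None, save_intermediates=False):
    names = [name for name, _, _ in steps]
    if until is not None and until not in names:
        raise ValueError(f"unknown stage {until!r}; known stages: {names}")
    rng = random.Random(seed)              # one RNG, seeded from the spec
    scene = {"movie": 0.0, "cells": []}    # stand-in for the shared Scene
    snapshots = {}
    for name, domain, fn in steps:
        fn(scene, rng)
        if save_intermediates and domain != "cell":
            snapshots[name] = scene["movie"]   # only movie-affecting steps
        if name == until:
            break
    return scene, snapshots


def place(scene, rng):
    scene["cells"] = [rng.random() for _ in range(3)]


def render(scene, rng):
    scene["movie"] = sum(scene["cells"])


def sensor(scene, rng):
    scene["movie"] = int(scene["movie"] * 100)


STEPS = [("place_somata", "cell", place), ("render", "tissue", render),
         ("sensor", "sensor", sensor)]
full_scene, full_snaps = simulate_sketch(STEPS, seed=0, save_intermediates=True)
early_scene, early_snaps = simulate_sketch(STEPS, seed=0, until="render",
                                           save_intermediates=True)
```

Validating `until` before any step runs is what makes the unknown-stage error fail fast instead of surfacing after a long simulation.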

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Implement Recording.save/load (the prior NotImplementedError stubs) and add a
thin spec-hash cache wrapper, so a realistic spec simulates once and is reused
across the Step 9-10 parameter sweeps. simulate() itself stays pure — caching
composes on top.

recording.py:
  * save(path) writes a self-contained zarr directory: observed, a
    ground_truth/ subgroup (optional fields only when present, tracked in a
    gt_present attr), an optional snapshots/ subgroup, and a human-readable
    spec.json. The write is atomic (build {path}.tmp, then os.replace).
  * load(path) re-validates spec.json and checks its cache_key against the
    stamped spec_cache_key attr (ValueError on a stale or hand-edited spec).
    Snapshot coords are the trivial arange grid, rebuilt via _movie_dataarray.

cache.py (new): simulate_cached(spec, *, root=None) — hit loads, miss simulates
  and saves; cache_dir() honors $MINIAN_SIM_CACHE (default ~/.cache/minian/sim)
  and cache_path() keys by {cache_key}.zarr. Exported from the package.

test_caching.py (new, 12 tests): observed + dtype round-trip, every GroundTruth
  field (bool stays bool), optional-field presence vs None, heterogeneous
  snapshots (tissue canvas vs cropped FOV), empty (0,h,w) ground truth, atomic
  overwrite, spec-hash mismatch rejection, and simulate_cached miss-writes /
  hit-reads (call-counted) + env-var resolution.

No new deps (zarr/xarray already core). Full simulation suite: 122 passed.
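
The save/load/cache contract can be illustrated with the standard library alone. This sketch substitutes JSON files for the zarr store and a dict for the Spec, and the helper signatures are assumptions; what it keeps from the description is the {path}.tmp build followed by os.replace, the stamped key check on load, the {cache_key}.zarr naming, and the $MINIAN_SIM_CACHE default.

```python
import hashlib
import json
import os
import shutil
import tempfile
from pathlib import Path


def cache_dir() -> Path:
    return Path(os.environ.get("MINIAN_SIM_CACHE", "~/.cache/minian/sim")).expanduser()


def cache_key(spec: dict) -> str:
    canonical = json.dumps(spec, sort_keys=True)  # stable serialization
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]


def save(path: Path, spec: dict, observed: list) -> None:
    tmp = Path(f"{path}.tmp")
    shutil.rmtree(tmp, ignore_errors=True)
    tmp.mkdir(parents=True)
    (tmp / "spec.json").write_text(json.dumps(spec, sort_keys=True))
    (tmp / "observed.json").write_text(json.dumps(observed))
    (tmp / "spec_cache_key").write_text(cache_key(spec))
    # In this sketch an existing target is removed first, because a non-empty
    # directory cannot be replaced in place; the final swap is one rename.
    shutil.rmtree(path, ignore_errors=True)
    os.replace(tmp, path)


def load(path: Path):
    spec = json.loads((path / "spec.json").read_text())
    if cache_key(spec) != (path / "spec_cache_key").read_text():
        raise ValueError("spec.json does not match the stamped cache key")
    return spec, json.loads((path / "observed.json").read_text())


def simulate_cached(spec, simulate_fn, root: Path):
    path = root / f"{cache_key(spec)}.zarr"
    if path.exists():
        return load(path)[1]         # hit: load
    observed = simulate_fn(spec)     # miss: simulate, then save
    save(path, spec, observed)
    return observed


calls = []


def fake_simulate(spec):
    calls.append(spec["seed"])
    return [spec["seed"]] * 3


with tempfile.TemporaryDirectory() as root:
    first = simulate_cached({"seed": 7}, fake_simulate, Path(root))
    second = simulate_cached({"seed": 7}, fake_simulate, Path(root))
```

Counting calls to the simulate function, as the call-counted test does, is the simplest way to confirm that the second request was served from disk.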

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A standalone, dependency-light module scoring CNMF output against a GroundTruth,
shared by the unit tests, the Step 9-10 parameter-matrix tests, and both
training notebooks. Threshold-agnostic: callers supply the bounds and the fair
recall denominator (GroundTruth.detectable_subset). Inputs go through np.asarray
so CNMF's xr.DataArray output and plain ndarrays both work; unit-first dims
(unit_id, height, width) / (unit_id, frame) already match GroundTruth.

The five spec-§9 functions:
  * hungarian_match(A_est, A_true, *, metric="iou", energy_frac=0.9) -> Match
    Energy-quantile (CaImAn-style) binarization — each footprint's mask is the
    smallest set of brightest pixels holding energy_frac of its energy — then a
    pairwise IoU matmul and an optimal assignment via scipy linear_sum_assignment
    (maximize), with zero-overlap pairs dropped. Match exposes recall/precision
    (iou_threshold), mean_iou, pairing, matched_pairs, n_est/n_true.
  * trace_pearson(C_est, C_true, pairing) -> per-pair Pearson (nan if constant).
  * spike_precision_recall(..., tol_frames=2, spike_thresh=0.0) -> SpikeScore,
    pooled greedy nearest-within-tolerance spike matching.
  * shift_rmse — pure RMSE over (frame, 2); docstring flags the sign convention
    (a correction estimate is the negation of GroundTruth.shifts).
  * field_pearson — flattened, scale/offset-invariant field-shape correlation.

test_metrics.py (new, 16 tests): identical / disjoint / shuffled-order / unequal-
count matching, thresholdable partial-overlap IoU, metric + energy_frac guards,
xarray parity, trace Pearson (identical / anti / constant), spike exact / within-
tol / beyond-tol / extra-estimate, shift RMSE, and field Pearson invariance.

No new deps (scipy already core). Full simulation suite: 138 passed.
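
The binarize-then-IoU half of hungarian_match can be sketched in NumPy. This is an illustration under assumptions, not the code in metrics.py: "energy" is taken to be the sum of squared pixel values, the helper names are hypothetical, and the optimal-assignment step (scipy's linear_sum_assignment in the description) is left out.

```python
import numpy as np


def energy_mask(footprint, energy_frac=0.9):
    """Flat boolean mask: the fewest brightest pixels holding energy_frac of the energy."""
    values = np.asarray(footprint, dtype=float).ravel()
    order = np.argsort(values)[::-1]              # brightest pixels first
    cumulative = np.cumsum(values[order] ** 2)    # running energy
    mask = np.zeros(values.shape, dtype=bool)
    if cumulative[-1] > 0:
        n_keep = int(np.searchsorted(cumulative, energy_frac * cumulative[-1])) + 1
        mask[order[:n_keep]] = True
    return mask


def pairwise_iou(A_est, A_true, energy_frac=0.9):
    """IoU matrix of shape (n_est, n_true) from one matmul over flattened masks."""
    M_est = np.stack([energy_mask(a, energy_frac) for a in np.asarray(A_est)]).astype(float)
    M_true = np.stack([energy_mask(a, energy_frac) for a in np.asarray(A_true)]).astype(float)
    inter = M_est @ M_true.T
    union = M_est.sum(1)[:, None] + M_true.sum(1)[None, :] - inter
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)


A_true = np.zeros((2, 6, 6))
A_true[0, 0:2, 0:2] = 1
A_true[1, 3:5, 3:5] = 1
A_est = np.zeros((2, 6, 6))
A_est[0, 3:5, 3:5] = 1     # same as true unit 1, listed first (shuffled order)
A_est[1, 0:2, 0:3] = 1     # true unit 0 plus two extra pixels
iou = pairwise_iou(A_est, A_true, energy_frac=1.0)
```

The example uses energy_frac=1.0 because the toy footprints are binary, so any lower fraction would break ties arbitrarily among equal pixels. The resulting matrix is what an assignment solver would then maximize over.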

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A thin, standalone parameter-sweep generator over dotted-path overrides — the
single primitive shared by the Step 10 @parametrize correctness grids and the
benchmark harness that collects metrics into a tidy DataFrame. simulate() never
depends on it.

sweep(base, axes) yields one validated SweptSpec per point in the Cartesian
product of the axes. Three path forms: nested model (acquisition.optics.na),
step-by-kind (steps.place_somata.density_per_mm2, indexing steps by the unique
kind §11 guarantees), and top-level (seed). Each combination is re-validated via
model_validate — rebuilding the discriminated steps union and re-running every
§11 cross-field validator — so an out-of-range axis (na=-1, soma > FOV) fails
fast at yield. An empty axes dict yields the base once with axes={}.

_set_path applies overrides immutably via chained model_copy and validates each
segment against model_fields *before* copying: pydantic's model_copy(update=...)
skips validation and silently accepts unknown keys, so an unchecked typo would
no-op rather than raise. Clear ValueErrors for unknown field / unknown step kind
/ scalar-descent / malformed `steps.` path.

SweptSpec(Spec) carries axes: dict = Field(exclude=True) — a genuine Spec
subclass (drops into simulate()/Recording unchanged), with axes kept out of
model_dump_json so cache_key() stays identical to the equivalent plain spec.
Sweeping never perturbs cache dedup, and the tag vanishes when a recording is
persisted.

test_sweep.py (new, 14 tests): product count + axis bookkeeping, all three path
forms, sibling-step isolation, tuple-valued axis, empty axes, base-not-mutated,
cache_key parity (axes excluded), invalid-value validation failure, the four
error paths, and an end-to-end simulate() of a yielded spec.

No new deps (itertools stdlib). Full simulation suite: 152 passed.
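
The Cartesian sweep over dotted-path overrides can be sketched with nested dicts. This is a simplified stand-in, not sweep.py: it assumes dict-based specs instead of pydantic models, covers only the nested and top-level path forms (not step-by-kind), and skips re-validation. It does keep two behaviors from the description: unknown path segments raise instead of silently no-opping, and an empty axes dict yields the base once.

```python
import copy
import itertools


def set_path(spec: dict, path: str, value):
    """Return a copy of spec with the dotted path overridden; reject unknown keys."""
    out = copy.deepcopy(spec)          # the base is never mutated
    node = out
    *parents, leaf = path.split(".")
    for segment in parents:
        if not isinstance(node, dict) or segment not in node:
            raise ValueError(f"unknown field {segment!r} in path {path!r}")
        node = node[segment]
    if not isinstance(node, dict) or leaf not in node:
        raise ValueError(f"unknown field {leaf!r} in path {path!r}")
    node[leaf] = value
    return out


def sweep(base: dict, axes: dict):
    """Yield (axes_point, spec) for every point in the Cartesian product."""
    paths = list(axes)
    for combo in itertools.product(*(axes[p] for p in paths)):
        spec = base
        for path, value in zip(paths, combo):
            spec = set_path(spec, path, value)
        yield dict(zip(paths, combo)), spec


base = {"seed": 0, "acquisition": {"optics": {"na": 0.45}}}
points = list(sweep(base, {"acquisition.optics.na": [0.3, 0.5], "seed": [1, 2, 3]}))
```

Checking each segment before writing is the dict analogue of validating against model_fields: a typo in a path is caught at yield time rather than producing a sweep that silently varies nothing.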

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tep 10a)

The first milestone of the additive demonstration suite (no test replacement
this PR — that is a later campaign). Where test_steps.py exercises each step's
build() in isolation, these run the composed simulate() pipeline end to end and
assert emergent physics a broken chain would violate — the foundation the
per-stage recovery demos (10b motion, 10c CNMF) stand on:

  * test_detectability_falls_with_depth — sweep the cell band down past a fixed
    focal plane; the geometric in-focus fraction falls 1.0 -> ~0 (monotone) and
    detectability tracks it down. Anchored on the in-focus fraction (rock-solid
    geometry); the detectable fraction is shot-noise-limited shallow, which is
    itself honest physics.
  * test_strong_vignette_concentrates_detection_centrally — a steep illumination
    falloff dims rim cells below the floor, so detection concentrates centrally.
  * test_bleaching_dims_later_frames — late-frame mean < early-frame mean.
  * test_static_fields_are_invariant_to_motion — same-seed recordings differing
    only in motion magnitude share identical sensor-frame vignette/leakage GT
    while their shift trajectories differ (the tissue/sensor frame separation).

Small, fixed-seed specs with decisive (not finely-calibrated) margins — a
capability demonstration, not the calibrated replacement suite.

No new deps, no markers, touches nothing existing. Full simulation suite: 156 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The first test that feeds a simulated recording to a real minian pipeline stage
and checks it recovers exact known ground truth — the per-stage diagnosability
the end-to-end golden-number test cannot offer. Here the stage is
motion_correction.estimate_motion, validated against the injected
GroundTruth.shifts.

test_estimate_motion_recovers_true_shifts: a 30 s, 128 px, ~107-cell recording
with a 4 µm motion walk -> estimate_motion -> align to GT (negate + re-reference
to frame 0, matching minian's correction-sign convention) -> shift_rmse < 0.5 px
(observed ~0.26), plus a per-axis trajectory-correlation sanity check. Sub-pixel
recovery, deterministic, ~46 s.

slow-marked and off the per-PR path: motion estimation needs a realistic,
minutes-scale recording with enough texture/frames to converge — a 1 s clip
would not exercise it. pyproject registers the `slow` marker and adds
-m "not slow" to addopts so the default run stays fast (run slow demos with
`pytest -m slow`); only the new tests are affected. Deliberately generous bound
(~2x observed), a capability demonstration rather than the calibrated
replacement suite (a later PR).

Default suite: 156 passed / 1 deselected. Slow test: 1 passed in ~46 s.
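
The alignment step before scoring (negate + re-reference to frame 0) can be written out explicitly. The helper names are hypothetical, and two details are assumptions: the RMSE is taken as the mean over all (frame, 2) elements, and the ground-truth trajectory is assumed to start at zero shift.

```python
import math


def align_correction_to_gt(correction):
    """Turn an estimated correction into a displacement referenced to frame 0."""
    ref_dy, ref_dx = correction[0]
    # A correction undoes the motion, so the displacement is its negation.
    return [(-(dy - ref_dy), -(dx - ref_dx)) for dy, dx in correction]


def shift_rmse(estimated, true):
    squared = [(e - t) ** 2
               for est_row, true_row in zip(estimated, true)
               for e, t in zip(est_row, true_row)]
    return math.sqrt(sum(squared) / len(squared))


gt_shifts = [(0.0, 0.0), (1.0, -2.0), (3.0, 1.0)]            # applied (dy, dx) in px
# A perfect estimator reports the negated shifts, up to a constant template offset.
estimated_correction = [(5.0 - dy, 5.0 - dx) for dy, dx in gt_shifts]
aligned = align_correction_to_gt(estimated_correction)
rmse = shift_rmse(aligned, gt_shifts)
```

Without the negation and re-referencing, the same perfect estimate would score a large RMSE purely from the sign convention and the template offset.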

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The headline per-stage test: feed a simulated recording to minian's real CNMF
chain and check it recovers the known detectable footprints — the direct,
diagnosable comparison against exact ground truth the end-to-end golden-number
test cannot offer. Completes Step 10's additive demonstration suite
(10a structural self-consistency, 10b motion recovery, 10c CNMF recovery).

test_cnmf_recovers_detectable_footprints: a 60 s, 128 px, ~50-cell still
recording -> the full preprocessing -> initialization -> spatial/temporal-update
chain (a faithful, minimal transcription of the pipeline notebook) ->
hungarian_match(A_est, detectable.A_observed) -> assert recall >= 0.7,
mean_iou >= 0.5, precision >= 0.4. Observed ~0.86 / 0.72 / 0.75; bounds are
generous and robust to the BLAS/numerical-library drift the golden CNMF sums are
sensitive to — a capability demonstration, not the calibrated replacement suite
(a later PR). Recovery is matched against detectable_subset() (in-focus,
above-noise-floor cells), so sweeping physics out of range is not scored as a
minian failure.

_run_cnmf notes: preprocessing runs on uint8 (OpenCV median denoise), the
seeds/CNMF stage on float (numba); the save_minian calls between update steps are
load-bearing (the zarr round-trips concretize dask chunk shapes — skipping them
surfaces "shape is None"). Runs under a module-scoped LocalCluster + tmp_path
intermediate dir.

slow-marked and guarded by importorskip("pymetis"): the CNMF chain needs pymetis
(no Windows wheel today), so this skips on a stock Windows box and activates
automatically wherever pymetis is installed (CI Linux, conda) and once the
in-flight pymetis-replacement lands — no Windows special-casing to remove. The
slow marker was registered in Step 10b, so no pyproject change here.

Verified: passes in ~90 s in a pymetis env; skips cleanly on Windows; default
suite 156 passed / 1 skipped / 1 deselected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A miniscope objective collects light over a cone whose solid angle scales as
NA^2, so a low-NA scope is fundamentally dimmer, not just blurrier -- an effect
the optics model omitted (brightness depended only on the defocus peak drop and
scatter attenuation, never on NA).

- Add Optics.collection_efficiency (proportional to NA^2) and fold it into
  Acquisition.cell_optics brightness and the observed-footprint gain in
  CellOpticsStep (degrade_footprint's factor renamed attenuation -> gain: now
  scatter attenuation x collection efficiency). It is a flat,
  focal-plane-independent light-loss, so the intensity-conservation invariant
  still holds, and it propagates to detectability (lower NA -> fewer detectable
  cells).
- The absolute proportionality constant is absorbed into the sensor step's
  photons_per_unit exposure scale; the structural and recovery tests that pin a
  detectability regime scale photons_per_unit by 1/NA^2 so the recordings (and
  thus their thresholds) are unchanged.
- Add test_cell_optics_brightness_scales_with_na_squared; update the
  surface-brightness and defocus-conservation expectations accordingly.
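
The 1/NA^2 compensation in the tests follows from simple arithmetic, sketched below. The proportionality constant is set to 1 (it is absorbed into photons_per_unit, as noted above), and the function signatures and numbers are illustrative assumptions.

```python
def collection_efficiency(na: float) -> float:
    """Light collected scales with the solid angle of the cone, i.e. NA^2."""
    return na ** 2


def signal_photons(intensity: float, na: float, photons_per_unit: float) -> float:
    return intensity * collection_efficiency(na) * photons_per_unit


BASE_PHOTONS_PER_UNIT = 200.0   # exposure scale used before the NA^2 term existed
low_na = signal_photons(1.0, 0.25, BASE_PHOTONS_PER_UNIT)
high_na = signal_photons(1.0, 0.5, BASE_PHOTONS_PER_UNIT)
# Scaling the exposure by 1/NA^2 cancels the new factor, so a pinned test
# recording keeps the photon budget it had before the change.
compensated = signal_photons(1.0, 0.25, BASE_PHOTONS_PER_UNIT / 0.25 ** 2)
```

Doubling the NA quadruples the collected signal, and the compensated case returns to the original 200 photons per unit intensity.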

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
First training notebook (minian/notebooks/training/01_anatomy), following #306's
self-contained-bundle layout: build a 1-photon miniscope recording forward from
its physics -- the inverse of the analysis pipeline -- as an interactive
construction (understand -> explore with sliders -> commit, one stage at a time).

WIP: the scope (optics + sensor), placing somata, and calcium activity stages are
in; optics degradation, render (first movie), background fields, motion, and the
sensor follow. Generates its own data via minian.simulation, so it runs offline.
Interactive viz uses ipywidgets + matplotlib (mediapy for the movie stages). No
notebook CI on this branch yet -- it will run under #306's notebook-discovery
test once that lands and this branch rebases onto it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 4, 2026

@daharoni
Contributor Author

daharoni commented Jun 4, 2026

Please ignore size and test coverage for now....

@sneakers-the-rat
Contributor

> Please ignore size

lmao okay then.

I actually think something like this would be great as an independent package (that minian then imports) - we need good simulated data generation for basically all the analysis projects, and they all do it differently and with varying degrees of quality. having one shared lib that can do that would be super useful for testing and teaching with minian, calab, cala, and even mio.

"from matplotlib.colors import LogNorm\n",
"from scipy.ndimage import gaussian_filter\n",
"from ipywidgets import interact, FloatSlider, IntSlider\n",
"import mediapy\n",
Contributor


missing dep, not sure that we need another dep for this? don't we already show videos somehow?

Contributor Author


Showing videos in notebooks in minian currently works really poorly (slow and jumpy). I was seeing if mediapy happens to work better, and it doesn't seem to.

@daharoni
Contributor Author

daharoni commented Jun 4, 2026

> I actually think something like this would be great as an independent package (that minian then imports)

Yea, I completely agree. We will see where this goes.

daharoni and others added 2 commits June 4, 2026 17:12
Rename soma_footprint -> neuron_footprint and add a `morphology` selector
for the two GCaMP targeting variants:

- "soma" (default): the existing lumpy disk, soma-targeted GCaMP. Bit-for-bit
  identical to the old footprint (dendrites are stamped only after the soma's
  RNG draw, so the soma stream is untouched).
- "cytosolic": the soma plus a few tapering proximal dendrites (standard
  GCaMP6/7/8), graded dimmer and thinner toward the tip so scatter, defocus,
  and the noise floor erase them first.

Replaces the dead PlaceSomata.n_neurite_stubs reservation with morphology +
n_dendrites / dendrite_length_um / dendrite_width_um. New helpers
_stamp_disk / _stamp_dendrites render dendrites in local bounding boxes (cheap
at high cell counts). Tests cover cytosolic reach/grading and soma determinism.

Anatomy notebook (Stage 1a sandbox): GCaMP type + dendrite params are now
editable cell settings (not sliders); footprints are generated once on a
sensor-independent 0.5 um grid and rescaled, so optics sliders never regenerate
shapes and magnification/pitch only zoom the same cells.

Document the reasoning in-source: a "pixel-limited, not diffraction-limited"
note on Optics.diffraction_sigma_um and a "shape is physical, grid is sampling"
note on neuron_footprint, cross-linked.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Field curvature (Petzval): a miniscope has no room for a field flattener, so
the in-focus surface is a shallow bowl, not a plane. Add
Optics.field_curvature_radius_um (µm; None = ideal flat field) and a Layer-2
helper Optics.focal_curvature_shift_um(r) = R - sqrt(R^2 - r^2) ~ r^2/(2R).
CellOpticsStep now gives each cell its own focal depth: the central plane minus
the sagitta at its distance from the optical axis (canvas center), so off-axis
cells focus shallower and blur toward the edges. The curvature over one soma is
negligible vs the ~mm radius, so cells are evaluated at their center (no
footprint warping). Default None keeps every existing preset/test unchanged.
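
The sagitta formula and its small-angle approximation are easy to check numerically. The function below follows the stated formula R - sqrt(R^2 - r^2); its exact signature, the error handling for r > R, and the cell_focal_depth_um helper are assumptions for illustration.

```python
import math


def focal_curvature_shift_um(r_um: float, radius_um):
    """Sagitta of the focal bowl at distance r from the optical axis."""
    if radius_um is None:
        return 0.0                      # ideal flat field
    if radius_um <= 0:
        raise ValueError("field curvature radius must be positive")
    if abs(r_um) > radius_um:
        raise ValueError("r exceeds the curvature radius")
    return radius_um - math.sqrt(radius_um ** 2 - r_um ** 2)


def cell_focal_depth_um(central_focal_um: float, r_um: float, radius_um):
    """Per-cell focal depth: the central plane minus the sagitta."""
    return central_focal_um - focal_curvature_shift_um(r_um, radius_um)


R = 2500.0                                   # 2.5 mm, the notebook slider default
exact = focal_curvature_shift_um(500.0, R)   # cell 500 um off-axis
approx = 500.0 ** 2 / (2 * R)                # r^2 / (2R)
edge_focus = cell_focal_depth_um(150.0, 500.0, R)
```

At 500 um off-axis the exact shift is about 50.5 um against 50 um from the approximation, so the quadratic form is accurate to roughly 1% over a typical field of view.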

Perf: degrade_footprint now computes the optical blur only within the cell's
bounding box, grown by the PSF truncation radius (4*sigma, beyond which the
truncated kernel is exactly zero). Bit-identical to the full-canvas filter, but the
optics step is ~4-11x faster (the win grows with canvas size, since the old
cost scaled with area and the new one scales with cell size).

Anatomy notebook (Stage 1a sandbox): add a field-curvature slider (default
2.5 mm, realistic) and draw the curved focal surface + DOF band in the side
view; soften the proximal-dendrite waviness (wavy, not curly).

Tests: focal_curvature_shift_um (flat/on-axis/grows-with-r, rejects <= 0) and
an off-axis-cells-blur step test. Full non-slow simulation suite: 160 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread minian/simulation/spec.py
_DOMAIN_RANK: dict[str, int] = {"cell": 0, "tissue": 1, "motion": 2, "sensor": 3}


class StepSpec(_Base):
Contributor

@sneakers-the-rat sneakers-the-rat Jun 5, 2026

why do step specs need to be different than the steps themselves? or, why are they in a different place? it seems like this might be for serialization/caching's sake, but it doesn't look like any of the steps bear any data on their own, so they would still have the same fields -> be just as serializable/hashable. just trying to get my head around 'what does a step consist of' and it seems like it's a step class, a stepspec, some additional fields in the ground truth, some additional fields in the scene, and maybe more? i don't see a declarative relationship between a stepspec and its step, just dynamically via the build method, so that would be nice to have.

I think best case for a design here would be to have the steps be separable and composable, and to do that would need to have each step be somewhat self contained and not bleed into either one another or the containing classes like scene, but as-is there is a bit of crosstalk/undeclared dependencies like e.g. the tissue.RenderStep depends on cell.CellOpticsStep which depends on cell.PlaceSomataStep, but that wouldn't be obvious unless you read the code.

the type annotations between a Step and its Spec need to use generics btw, because currently they are all receiving the base Spec class, and so all the attribute accesses are untyped, but if each step specifies its spec class with a typevar (or just has its spec as its attrs) then it would be possible to correctly resolve those types.
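
A minimal sketch of the generics suggestion, using stdlib dataclasses as a stand-in for the PR's pydantic models. Everything here is illustrative: the class bodies, the `soma_diameter_um` method, and the way `kind` is stored are assumptions, not the PR's code.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar


@dataclass(frozen=True)
class StepSpec:
    """Base spec: declarative, hashable parameters only (stand-in for the pydantic base)."""
    kind: str


@dataclass(frozen=True)
class PlaceSomataSpec(StepSpec):
    soma_radius_um: float = 6.0  # hypothetical default


# Each concrete Step binds SpecT to its own spec class.
SpecT = TypeVar("SpecT", bound=StepSpec)


class Step(Generic[SpecT]):
    def __init__(self, spec: SpecT) -> None:
        self.spec = spec


class PlaceSomataStep(Step[PlaceSomataSpec]):
    def soma_diameter_um(self) -> float:
        # self.spec is typed as PlaceSomataSpec here, so a type checker
        # resolves soma_radius_um instead of seeing only the base StepSpec.
        return 2.0 * self.spec.soma_radius_um
```

Because the step names its spec class in `Step[PlaceSomataSpec]`, the step-to-spec relationship is also declared in one place rather than only implied by a build method.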

Contributor

one way of handling dependencies would just be to declare them: like if CellOpticsStep depends on PlaceSomataStep, then a validator on the steps list could check that there is one of the necessary step instances prior to it in the list.
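
A rough sketch of that idea with plain classes. The `requires` attribute, the `validate_step_order` function, and the `"cell_optics"` / `"render"` kind strings are hypothetical; in the PR this check would presumably live in a validator on the spec's steps list.

```python
from typing import Sequence, Tuple


class StepSpec:
    kind: str = "base"
    # Kinds that must appear earlier in the chain (hypothetical attribute).
    requires: Tuple[str, ...] = ()


class PlaceSomata(StepSpec):
    kind = "place_somata"


class CellOptics(StepSpec):
    kind = "cell_optics"
    requires = ("place_somata",)


class Render(StepSpec):
    kind = "render"
    requires = ("cell_optics",)


def validate_step_order(steps: Sequence[StepSpec]) -> None:
    """Raise ValueError if a step appears before any step it declares it needs."""
    seen = set()
    for index, step in enumerate(steps):
        missing = [kind for kind in step.requires if kind not in seen]
        if missing:
            raise ValueError(
                f"step {index} ({step.kind!r}) requires {missing} earlier in the chain"
            )
        seen.add(step.kind)
```

Calling `validate_step_order([PlaceSomata(), CellOptics(), Render()])` passes silently, while any ordering that puts `Render` first raises with the missing kind named in the message.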

Comment thread minian/simulation/spec.py
Comment on lines +104 to +118
    field_curvature_radius_um: float | None = Field(
        default=None,
        description="Petzval field-curvature radius, µm (typical miniscope ≈ 2000–3000). "
        "Off-axis cells focus *shallower* by the spherical sagitta; None = ideal flat "
        "field. A miniscope has no room for a field flattener, so this is usually finite.",
    )

    @field_validator("field_curvature_radius_um")
    @classmethod
    def _check_curvature(cls, v: float | None) -> float | None:
        if v is not None and v <= 0:
            raise ValueError(
                f"field_curvature_radius_um ({v}) must be > 0, or None for a flat field."
            )
        return v
Contributor

could just be

from typing import Annotated as A
from annotated_types import Gt

class Optics(_Base):
    # ...
    field_curvature_radius_um: A[float, Gt(0)] | None

daharoni and others added 2 commits June 4, 2026 20:34
Acquisition / Optics now describe a real UCLA Miniscope V4 and name the focus
geometry clearly:

- Rename Optics.focal_plane_um -> Acquisition.focal_depth_in_tissue_um (the depth
  of the focal plane below the tissue surface, same coordinate as each cell's z;
  "auto" -> median cell depth). It moves off Optics because it is experiment
  geometry, not a lens property. The forward model is unchanged.
- Add Acquisition.front_working_distance_um (V4 ~700): an informational scope spec
  for surgery/implant planning. Nothing in the simulation reads it.
- Optics.depth_of_field_um now defaults to "auto", derived from NA via
  Optics.resolved_depth_of_field_um (~n*lambda/NA^2) instead of a hardcoded value;
  a number still overrides. DOF is set by the optics, so it shouldn't be a free knob.
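
The "auto" DOF rule in the last bullet can be sketched as below. The formula n*lambda/NA^2 is from the commit message; the default wavelength (0.525 um, roughly GCaMP emission) and refractive index (1.33, water-like tissue) are assumptions chosen for illustration, and the free-function form is not the PR's method signature.

```python
def resolved_depth_of_field_um(
    na: float,
    wavelength_um: float = 0.525,    # assumed emission wavelength
    refractive_index: float = 1.33,  # assumed immersion/tissue index
) -> float:
    """Approximate depth of field, n * lambda / NA^2, in micrometres."""
    if na <= 0:
        raise ValueError("na must be > 0")
    return refractive_index * wavelength_um / na**2


dof_um = resolved_depth_of_field_um(0.30)
```

Under these assumed defaults an NA of 0.30 gives roughly 7.76 um, which is in line with the +/-7.8 um quoted for the V4 notebook cell.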

Anatomy notebook commit cell is now the actual V4: Python480 CMOS at 608x608,
4.8 um pitch, effective NA 0.30, GCaMP, field curvature 2.1 mm, front working
distance 700 um, focal depth 200 um into tissue, DOF auto (+/-7.8 um).

Updated every call site (steps, recording/spec/structural/recovery tests, the
notebook, and the local preview scripts) and the design doc. Full non-slow
simulation suite: 161 passed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ample_neurons

The footprint generator now models whole neurons (soma + proximal dendrites for
cytosolic GCaMP), so the soma-specific names no longer fit. Rename the placement
step and its spec/discriminator throughout:

  PlaceSomata     -> PlaceNeurons
  PlaceSomataStep -> PlaceNeuronsStep
  place_somata    -> place_neurons   (step name / kind / until= key)
  sample_somata   -> sample_neurons

Kept soma_radius_um and the "soma"/"cytosolic" morphology values, which genuinely
refer to the cell body. Pre-release, so no back-compat shim.

Also factor the *distribution* half of the step into a public sample_neurons()
(centers + per-cell SNR, no footprint stamping). PlaceNeuronsStep delegates to it
then stamps; a test pins it draw-for-draw identical to the fused step. This lets
the anatomy notebook visualize placement at full FOV without paying for ~hundreds
of footprint paints.

Rebuild the notebook's Stage 2 widget on the new seam: full committed FOV (no
preview shrink), top + side views with true-radius disks colored by depth, an SNR
histogram, and a min_distance spacing knob. No optics and no footprint imagery in
Stage 2 -- crisp footprints carry no depth cue, so degradation is held back for
the optics stage. Prose corrected: planted cell bodies stay well separated at
realistic density; real crowding emerges from optical blur next stage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
daharoni and others added 5 commits June 4, 2026 23:15
…bserved reveal

Expand the anatomy notebook's place_neurons section into a full
explore -> commit -> "see it through the scope" arc:

- Presets (CA1 thin layer ~150um, Volumetric 50-200um slab) via a ToggleButtons
  selector that drives the sliders; CA1 is the clean-run default.
- GCaMP morphology selector (soma vs cytosolic). It does not move the distribution
  dots (those mark cell bodies) but feeds the commit and shows its effect in the
  optics reveal, where cytosolic's thin dendrites are erased first.
- Slider-driven, idempotent commit: whatever you dialed in (preset + tweaks +
  morphology) is what gets imaged; re-running replaces the prior placement.
- New ending cell pushes the committed placement through the optics
  (place_neurons + optics, no calcium -- degradation is purely spatial) and shows
  A_planted -> A_observed: full-FOV pair on a shared gamma-stretched scale (the
  field is only a few percent as bright) plus a per-cell zoom isolating the blur.
  Teaches depth-driven dimming/defocus AND field curvature (only a central island
  stays in focus, since the V4 has no field-flattener and DOF is just +/-7.8um).

Focus is now "auto" in the V4 commit (tracks the placed layer's median depth, like
turning the focus knob), so the presets stay sensibly focused; the build() helper
gains an `extra` arg to preview optics without committing the step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
"auto" focal depth previously resolved to the median cell depth, which ignores
field curvature: off-axis cells focus shallower, so a median-of-z plane left most
of the field defocused. Defocus is sigma = NA*|z - focal_eff| with
focal_eff = focal - shift(r), i.e. NA*|(z + shift(r)) - focal| -- the absolute
deviation of focal from each cell's effective depth e = z + shift(r). Total defocus
is a sum of absolute deviations, so the focal depth that minimizes it is exactly the
MEDIAN of the effective depths, which folds curvature in by construction and pulls
the plane deeper to claw back off-axis cells.
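
The median argument can be checked numerically with a small sketch. The function names, argument order, and the five-cell example below are illustrative assumptions; only the formulas (sagitta shift, sigma = NA*|e - focal|, median of effective depths) come from the commit message.

```python
import math
from statistics import median


def shift_um(r_um, radius_um):
    """Sagitta of the curved focal surface at lateral distance r."""
    return radius_um - math.sqrt(radius_um**2 - r_um**2)


def resolve_focal_um(z_um, r_um, radius_um=None):
    """Median depth for a flat field, else median effective depth z + shift(r)."""
    if radius_um is None:
        return median(z_um)
    return median([z + shift_um(r, radius_um) for z, r in zip(z_um, r_um)])


def total_defocus_um(focal_um, z_um, r_um, radius_um, na):
    """Sum over cells of sigma = NA * |(z + shift(r)) - focal|."""
    return sum(
        na * abs(z + shift_um(r, radius_um) - focal_um)
        for z, r in zip(z_um, r_um)
    )


# Five cells at the same depth, spread from the axis outward (made-up values).
z = [150.0] * 5
r = [0.0, 200.0, 400.0, 600.0, 800.0]
focal = resolve_focal_um(z, r, radius_um=2100.0)
```

For this toy layer the curvature-aware focus lands deeper than the plain median depth of 150 um, and nudging it in either direction increases the total defocus.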

resolve_focal_plane() gains optional (optics, axis_yx); when a curvature radius and
optical axis are supplied it returns median(z + shift(r)), else the prior median z
(backward compatible). CellOpticsStep computes the axis first and passes both. New
test covers the curvature path; full non-slow sim suite green (164 passed).

Effect on the anatomy notebook's CA1 preset: focus moves 150 -> 186 um and in-focus
cells go from 73 to 176 of 852.

Also tune the Stage 2 presets: CA1 depth 140-160 um (was 145-155), Volumetric
density 1200/mm2 (was 500), and label the optics reveal with the resolved focus
(shown deeper than the median depth, illustrating the curvature compensation).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… soma diameter

PlaceNeurons.density_per_mm2 -> density_per_mm3. Cell count is now
round(density_per_mm3 * FOV_area * thickness), the depth-range thickness floored at
one soma diameter (2*soma_radius_um) so a thin or strictly planar (lo==hi) layer
still yields cells rather than zero. This is the physically fundamental quantity: a
thicker slab of tissue holds proportionally more neurons, and the count now scales
with the imaged depth.
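
A sketch of the count rule described above. The function name and signature are hypothetical, as are the 6 um soma radius and the 0.9728 mm FOV side used in the example; the FOV is not stated in this PR text and was picked so the example lands on the counts quoted for the notebook presets.

```python
def expected_cell_count(density_per_mm3, fov_area_mm2, depth_range_um, soma_radius_um):
    """round(density * FOV area * thickness), thickness floored at one soma diameter."""
    lo_um, hi_um = depth_range_um
    # A planar layer (lo == hi) still gets one soma diameter of thickness.
    thickness_um = max(hi_um - lo_um, 2.0 * soma_radius_um)
    volume_mm3 = fov_area_mm2 * thickness_um / 1000.0
    return round(density_per_mm3 * volume_mm3)


def areal_to_volumetric(density_per_mm2, thickness_um):
    """Count-preserving conversion: rho_vol = rho_areal * 1000 / thickness_um."""
    return density_per_mm2 * 1000.0 / thickness_um


fov_mm2 = 0.9728**2  # hypothetical square field of view
ca1 = expected_cell_count(45000, fov_mm2, (140.0, 160.0), soma_radius_um=6.0)
slab = expected_cell_count(8000, fov_mm2, (50.0, 200.0), soma_radius_um=6.0)
flat = expected_cell_count(45000, fov_mm2, (150.0, 150.0), soma_radius_um=6.0)
```

With these assumed inputs `ca1` is 852 and `slab` is 1136, and the strictly planar `flat` layer still yields cells because of the soma-diameter floor.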

sample_neurons() computes the floored thickness and volume; spec field + docstrings
updated. All call sites migrated: tests converted with count-preserving densities
(rho_vol = rho_areal * 1000 / thickness_um, so existing cell counts and the exact
count assertions are unchanged), the sweep override path string, and test_sweep.
Full non-slow sim suite green (164 passed).

Anatomy notebook Stage 2: density slider + presets are now volumetric (CA1
45000/mm3 over a 20 um layer -> 852 cells; Volumetric 8000/mm3 over a 150 um slab
-> 1136 cells, the same counts as before), and the prose explains the volumetric
count rule and the soma-diameter floor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
neuron_footprint computed its hypot/threshold/irregularity-noise over the
whole canvas per cell, while a cell occupies only a small patch (the soma
reaches at most radius*(1+irregularity) from the center). degrade_footprint
and the dendrite disk stamps were already windowed; the soma paint was the
one that was missed.

Compute the soma inside its bounding box and write that patch into the
full-canvas array; dendrite stamping is unchanged (it self-windows). For the
852-cell CA1-cytosolic preset on a 608x608 canvas this takes the
place_neurons + optics build from ~31s to ~9.6s (3.3x), with the soma's own
full-canvas cost dropping from ~23s to ~0.1s.
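
The windowing idea can be illustrated in pure Python with a plain disk. This is a simplified sketch, not the PR's `neuron_footprint`: there is no irregularity noise, the canvas is a nested list rather than an array, and `soma_window` / `paint_disk` are hypothetical names. In the real code the reach would be radius*(1+irregularity) as described above.

```python
import math


def soma_window(center_yx, reach_px, shape):
    """Bounding box (y0, y1, x0, x1) around a cell, clipped to the canvas."""
    cy, cx = center_yx
    height, width = shape
    y0 = max(0, math.floor(cy - reach_px))
    y1 = min(height, math.ceil(cy + reach_px) + 1)
    x0 = max(0, math.floor(cx - reach_px))
    x1 = min(width, math.ceil(cx + reach_px) + 1)
    return y0, y1, x0, x1


def paint_disk(canvas, center_yx, radius_px, window=None):
    """Set pixels within radius_px of the center to 1.0, visiting only the window."""
    height, width = len(canvas), len(canvas[0])
    y0, y1, x0, x1 = window if window is not None else (0, height, 0, width)
    cy, cx = center_yx
    for y in range(y0, y1):
        for x in range(x0, x1):
            if math.hypot(y - cy, x - cx) <= radius_px:
                canvas[y][x] = 1.0
    return canvas


def blank(shape):
    return [[0.0] * shape[1] for _ in range(shape[0])]


shape = (64, 64)
center, radius = (10.3, 60.2), 5.0  # near the right edge, so the box is clipped
full = paint_disk(blank(shape), center, radius)
window = soma_window(center, radius, shape)
windowed = paint_disk(blank(shape), center, radius, window=window)
```

Here the windowed pass visits a 12 x 9 patch instead of the full 64 x 64 canvas and produces the same pixels, which is why the per-cell cost stops scaling with canvas area.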

The windowed irregularity noise draws fewer randoms, which shifts the
downstream CellActivity RNG stream (acceptable: unreleased, no exact-pixel
goldens). One finely-tuned structural test dipped below the noise floor as a
result; in_focus stayed 55/55 and the central-clustering property holds, so
its photons_per_unit was re-tuned 110 -> 1000 to exercise a non-trivial
detectable population again.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s reveal

Render the Stage 2 "A_planted -> A_observed" panels with the GCaMP green LUT
(matching the imaging plots earlier in the notebook) instead of magma, and
scale each panel to its own peak rather than a shared gamma stretch. A_observed
is now scaled to its max footprint brightness, so the faint observed field is
lifted to full visibility and the top row reads as shape (which cells stay
sharp) rather than brightness. The real dimming stays reported as the peak-%
in the title and bites later at the sensor noise floor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>