Modernize minian - packaging and api updates by sneakers-the-rat · Pull Request #301 · miniscope/minian

sneakers-the-rat · 2026-05-29T00:01:24Z

This is a combined PR, where we are collecting smaller changes in this branch and reviewing them as we go. We are trying to update minian to make it runnable on more recent versions of python, and make use of more recent versions of its dependent packages. This should resolve most outstanding issues with installation!

It is currently WIP.

Contains PRs:

Python >=3.10 compat, update packaging #294

TODO:

check if Adjust to run with newer dependencies #224 has any fixes that we missed

param dropped the lowercase Parameter-type aliases (param.list, param.boolean, etc.) some time ago; only the PEP-8 capitalized forms remain. Triggered an AttributeError when CNMFViewer(...) was constructed (cell 102 of pipeline.ipynb).

- NetworkX 2.7 renamed `from_numpy_matrix` -> `from_numpy_array` and `from_scipy_sparse_matrix` -> `from_scipy_sparse_array`; the old names were removed in NetworkX 3.0. - NumPy 1.20 deprecated the dtype aliases `np.int`, `np.float`, `np.bool`, etc.; NumPy 1.24 removed them. Use Python builtins (`int`, `float`) instead, which give the same int64/float64 on 64-bit platforms. Surfaced at cell 51 of pipeline.ipynb (seeds_merge -> label_connected) on a modern NetworkX/NumPy stack.

… with pd.concat pandas 2.0 (April 2023) removed `DataFrame.append()` after deprecating it in 1.4. Fold the one-row append into the surrounding pd.concat call by wrapping the new row's dict in a single-row DataFrame. Triggered an AttributeError at cell 52 of pipeline.ipynb (initA -> seed correlation construction).

…m_delay_optimize Modern xarray passes `flush=True` (and may pass other kwargs in future) through the dask optimization chain, but custom_arr_optimize's signature rejects them. The sibling function custom_delay_optimize already takes `**kwargs` for the same reason; this brings the two to parity. Triggered `TypeError: custom_arr_optimize() got an unexpected keyword argument 'flush'` when save_minian(...) called ds.to_zarr() (cell 9 of pipeline.ipynb).

…gufunc output `update_spatial_block` is decorated `@darr.as_gufunc(output_dtypes=float)` but its body always returns a `sparse.COO`. The meta mismatch was tolerated by older dask but breaks in modern dask, which dispatches by chunk meta: `darr.block(A_new.tolist())` sees a mix of sparse-meta chunks (the `darr.array(sparse.zeros(...))` fallback) and dense-meta chunks (from the gufunc), and the sparse backend rejects the dense ones with `ValueError: All arrays must be instances of SparseArray`. Force sparse meta on each `update_spatial_block` result via `map_blocks(sparse.COO, ...)`. The downstream code already calls `.map_blocks(lambda a: a.todense(), ...)` on the assembled array (line ~413), confirming the assembled result is expected to be sparse-backed. Triggered at cell 61-62 of pipeline.ipynb (update_spatial(...)).

…atrix(coo) `A_inter` and `A_usum` are pydata/sparse arrays, so the expression `A_inter / (A_usum - A_inter) > jac_thres` produces a `sparse.COO` of bool. Passing that to `scipy.sparse.csc_matrix(...)` makes scipy fall back to `np.asarray(coo)`, which triggers pydata sparse's `__array__`. Pydata sparse refuses to auto-densify by default (`AUTO_DENSIFY=False`) — a guardrail against accidental memory blow-ups — so scipy can't construct the matrix and raises: RuntimeError: Cannot convert a sparse array to dense automatically. To manually densify, use the todense method. Use pydata sparse's `.tocsc()` method instead — it returns a scipy `csc_matrix` directly, no dense intermediate. Triggered at cell 71 of pipeline.ipynb (update_temporal(...)).

xarray tightened up boolean-indexer handling: using a *dask* boolean array as an indexer produces an output of unknown shape, which dask/xarray can't model cleanly, so xarray now refuses outright with: KeyError: 'Indexing with a boolean dask array is not allowed. This will result in a dask array of unknown shape. Such arrays are unsupported by Xarray. Please compute the indexer first using .compute()' `umask` is per-(height, width) here — small enough that .compute() is essentially free. Add a brief comment explaining the rationale. Surfaces inside CNMFViewer.update_AC when the callback fires (cell 102+, during interactive use of the demo pipeline).

…xed children Modern HoloViews (>= 1.16) enforces type uniformity inside NdLayout. The two children here are heterogeneous — `hv_pul` is an NdOverlay (the datashaded simulated-pulse curves) and `hv_A` is an Image (the spatial footprint). Older HoloViews tolerated mixed types; recent versions raise: AssertionError: NdLayout must only contain one type of object, not both Image and NdOverlay. Replace the inner NdLayout with a permissive `Layout` (built via `+`) and preserve the per-panel labels via `.relabel()`. The `Current Unit` kdim is dropped because it was only used as a header label, not as a navigation dimension. Verified that the other two `hv.NdLayout` call sites in visualization.py have type-uniform children (AdjointLayouts in VArrayViewer; Overlay in visualize_spatial_update) and don't need this change. Surfaced at cell 71 of pipeline.ipynb during rendering of visualize_temporal_update output.

Commit b2372ce flattened the .opts(group=dict(...)) constructions in visualization.py, including the `opts_pts = dict(plot=..., style=...)` build in visualize_seeds. The downstream code one branch below still expects the old nested structure: opts_pts["style"]["color"] = "white" # KeyError: 'style' Once opts_pts is flat, the equivalent is a flat assignment: opts_pts["color"] = "white" Triggers when visualize_seeds is called with no mask (the `else:` branch at line ~1760).

HoloViews 1.16+ removed the .opts(style=..., plot=..., norm=...) options-group form. The library source was flattened in commit b2372ce, but four code cells in the canonical demo notebook still use the legacy form via the spread-dict spelling: opts = dict( plot=dict(height=..., width=..., colorbar=True), style=dict(cmap="Viridis"), ) ... .opts(**opts) Same removed wire format, just spelled indirectly. Flattening produces: opts = dict( height=..., width=..., colorbar=True, cmap="Viridis", ) Cells affected: 182, 185, 236, 238 (JSON cell indices).

…ple data `cnmfviewer.unit_labels` is itself an `xr.DataArray`. Modern xarray rejects DataArrays as the data half of a `(dims, data)` coord tuple because it's ambiguous whether to use the array's own coords/indexes or the new tuple's dims, raising: TypeError: Variable 'unit_labels': Using a DataArray object to construct a variable is ambiguous, please extract the data using the .data property. Use `.values` (numpy array) to make the intent explicit. Applied to all five assign_coords sites in this cell (A, C, S, c0, b0). Surfaces at cell 110 of pipeline.ipynb.

pymetis has zero Windows wheels on PyPI for any version (confirmed across cp310-cp314), and its from-source build requires MSVC + the METIS C library which pip can't get on its own. This blocks `pip install minian` on Windows entirely, which is a problem for the upcoming PyPI publish. Make the dependency optional: - Move `pymetis` to a new `[project.optional-dependencies]` `performance` extra. - Lazy-import in cnmf.py with an `_HAS_PYMETIS` flag. - Add a `_partition_graph(G, n_parts)` helper that uses pymetis when available and otherwise falls back to NetworkX's `greedy_modularity_communities` (pure Python, ships with networkx). Slower for very large graphs but produces a valid per-node partition label, which is all `graph_optimize_corr` needs — the partition quality affects compute chunking, not result correctness. - Defensive try/except around the NetworkX call to handle the `n_parts > number_of_connected_components` edge case. Install paths now: - `pip install minian` — works everywhere, uses NetworkX fallback. - `pip install 'minian[performance]'` — opt-in pymetis, faster on large graphs, requires platform with a working pymetis install (Linux/macOS via PyPI; Windows via conda-forge).

The cluster cell hard-codes `memory_limit="2GB"` per worker. This was fine on older dask (lax memory enforcement, would spill to disk and keep going), but modern dask kills any worker that exceeds the limit. Cells that force monolithic chunks — e.g. `xr.concat([...], "width") .chunk({"width": -1})` followed by `.max().compute()` — need a single chunk of the full width-slab in memory, which exceeds 2GB on the demo data and KilledWorkers the cluster out after 4 retries. Two changes: - Make `memory_limit` env-driven via `MINIAN_MEM_LIMIT`, mirroring the existing `MINIAN_NWORKERS` pattern. Docker images, workshop launchers, and individual users can override without editing the notebook. - Bump the default from "2GB" to "4GB". With `n_workers=4` this targets a 16GB-RAM laptop, which runs the demo pipeline top-to-bottom on modern dask. Smaller machines override the env var down. Memory is per-worker, not pooled — each dask worker is a separate process. The doc comment in the parameter cell calls this out so users configuring smaller machines understand the relationship between `MINIAN_NWORKERS` and `MINIAN_MEM_LIMIT`.

Plain string "msCam[0-9]+\.avi$" contains `\.`, which Python 3.12+ treats as `SyntaxWarning: invalid escape sequence`. Future Python versions will upgrade this to SyntaxError. The intent is clearly a regex literal, so prefix with `r` to make it explicit. The markdown cell documenting the parameter already uses `r"..."` in its example, so this brings the actual code into sync with the docs. (Same change should land in cross-registration.ipynb if it has the same pattern; verified it doesn't on this branch.)

…tashade() The modern HoloViews rasterizer doesn't consume `normalization` on `datashade()`; the call site has been emitting WARNING:param.datashade: Parameter(s) [normalization] not consumed by any element rasterizer. since at least HoloViews 1.16. The value passed ("linear") was the historical default in any case, so removing the kwarg is behavior- neutral and silences the warning. If a future call site needs non-linear normalization, the modern API is `shade(..., cnorm="log"|"eq_hist"|...)` rather than a kwarg on `datashade`.

…nstead dask added aggressive memory enforcement in distributed 2022.10. By default, when a worker exceeds 95% of `memory_limit` it is terminated; after 4 retries the task is marked failed (`KilledWorker`). This broke minian's canonical pipeline because operations like update_background and write_video transiently materialize the residual movie (Y - A·C) at full size — easily several GB even on the demo dataset — which exceeds the historical 2GB-per-worker default. Older dask (pre-2022.10) handled the same situation by spilling chunks to disk under memory pressure and continuing. The pipeline finished; it was just slower when memory-constrained. That's the behavior the canonical demo was originally tested against. Restore it by disabling the terminate threshold: dask.config.set({"distributed.worker.memory.terminate": False}) set just before the LocalCluster is created. Workers still pause and spill at the existing thresholds, they just don't get killed. The practical effect: bulk operations (update_background, write_video, etc.) complete on small per-worker memory budgets at the cost of disk I/O, exactly as the original pipeline was designed for. Combined with the env-driven memory_limit knob (previous commit), users with larger machines can opt in to less spilling by bumping MINIAN_MEM_LIMIT; users on workshop laptops get the original pipeline-finishes-no-matter-what behavior.

dep cooldowns in lockfile resolution!

…lockfile

The new lockfile-current check installed PDM unpinned, but exclude-newer in [tool.pdm.resolution] is only honored on PDM >= 2.26.9 (older versions silently ignore it), matching the floor already pinned in testandcov.yml. Also align checkout@v6 and Python 3.14 with the lint job, drop the no-op pip cache, and add the missing trailing newline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ensure the lockfile is up to date with the current pyproject.toml

Automatic weekly update of lockfile

Rework update_meta() for the zarr-only API on modernize-minian: - drop the removed netcdf/backend branching and call open_minian(f_path) with its current signature - write each variable to a temp .zarr store and swap, instead of overwriting in place while it is still lazily backed by that store - preserve dataset-level attrs and correct the meta_dict docstring (keys are dim names, values are directory levels) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Cover the two behaviors of update_meta: assigning metadata coordinates from the directory hierarchy without altering the underlying data, and only touching directories that match the pattern. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The os.walk unpacking no longer uses the file-name list, so bind it to `_` to make the intent explicit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The keys are dimension names and the values are directory levels, not the reverse, as the code (and the toy_data tests) have always used. Fix the parameter description and the example to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Strengthen the coverage beyond "the function runs": - save several variables into one minian dir (as a real dataset does) and confirm all of them get the coords and reopen/merge cleanly - assert update_meta after the fact is identical to saving with meta_dict up front (data, coords, and attrs), via assert_identical - carry a non-empty variable attr through to prove it is preserved Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Address review feedback: instead of reading each variable and rewriting its entire zarr store to attach scalar metadata coordinates, append a coords-only dataset with `to_zarr(mode="a")`. This writes just the small coordinate arrays and leaves the data chunks in place. Side benefits: the read-while-overwrite hazard disappears (we never rewrite existing chunks), and variable attrs are preserved for free since the data variables are never touched. The multi-var equivalence test confirms the result is identical to save_minian(meta_dict=...). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

updated update_meta()

sneakers-the-rat · 2026-06-03T19:52:23Z

So let's set a closing scope on this - what else do we want in a 1.3.0 release that is focused on "the same minian as before, just runnable on more recent python versions" ?

Once we complete those tasks, we can merge this and make a v1.3.0 tag and split that off into a v1 branch in case we want to backport future features (I am not sure how necessary that will be, sorta depends on if we start getting issues requesting it/backwards compatibility issues with a 2.0) and then resume a normal PR workflow releasing >=2.0 that may have breaking changes. That can happen before we get control of the PyPI project, and we can make the PyPI release whenever we do, the tag will still be there.

Trying to prevent us from wandering too far making (good! desirable!) changes in this branch that go beyond the scope of a function-neutral update

daharoni · 2026-06-03T21:46:19Z

I think we maybe are good to go? Should we merge the pypi PR before merging to master?

Drop the one-time PyPI setup reminders (trusted-publisher registration, pypi environment creation) and the first-release version-decision notes now that minian is claimed on PyPI and the publish workflow is wired. Keep the durable bits: SCM-tag versioning, how to cut a release (cz bump / manual tag), and the workflow_dispatch dry run. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Register minian on PyPI: publishing workflow + metadata

# Conflicts: # README.md

chore: merge master into modernize-minian (resolve README conflict)

daharoni

I think we are possibly ready to merge to master and set this as v1.3

sneakers-the-rat · 2026-06-04T09:36:12Z

Let's go baby

wmau and others added 30 commits May 24, 2022 12:13

updated update_meta()

5a1257e

black

2798ed1

Merge branch 'master' into update-update_meta

c448873

Automated changes, black formatter

2efe764

switch to pyproject.toml, single or double source version

64f3a2e

getting things running, pyupgrade

817d263

pyupgrade docs

42bc0e0

about as far as i can get debugging new package apis

899eaf0

bump to 3.10, update lockfile

95a4f86

numba optional

5772ace

rm references to removed dask functions

1c54533

update holoviews calling patterns

b2372ce

fix ipython import

e97b4cd

whitespace

96a4a01

fix(visualization): param.list() -> param.List()

1eefb28

param dropped the lowercase Parameter-type aliases (param.list, param.boolean, etc.) some time ago; only the PEP-8 capitalized forms remain. Triggered an AttributeError when CNMFViewer(...) was constructed (cell 102 of pipeline.ipynb).

daharoni and others added 16 commits June 1, 2026 19:39

Merge pull request #314 from sneakers-the-rat/ci/dep-cooldown

bd6761b

dep cooldowns in lockfile resolution!

Merge remote-tracking branch 'origin/modernize-minian' into ci/check-…

12a2f1e

…lockfile

Merge pull request #312 from sneakers-the-rat/ci/check-lockfile

82052db

ensure the lockfile is up to date with the current pyproject.toml

Merge branch 'modernize-minian' into ci/update-lockfile

119a609

Merge branch 'modernize-minian' into feat/pypi-publish

e74333f

Merge pull request #313 from sneakers-the-rat/ci/update-lockfile

dc8aacd

Automatic weekly update of lockfile

Merge branch 'modernize-minian' into feat/pypi-publish

e0dcbe4

style: drop unused loop variable in update_meta

5c1f1da

The os.walk unpacking no longer uses the file-name list, so bind it to `_` to make the intent explicit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

add an mtime test

df06432

Merge pull request #205 from miniscope/update-update_meta

b861b5a

updated update_meta()

daharoni and others added 4 commits June 3, 2026 14:51

Merge branch 'modernize-minian' into feat/pypi-publish

5ee986c

Merge pull request #305 from miniscope/feat/pypi-publish

074461d

Register minian on PyPI: publishing workflow + metadata

Merge remote-tracking branch 'origin/master' into modernize-minian

13baf1f

# Conflicts: # README.md

daharoni mentioned this pull request Jun 3, 2026

chore: merge master into modernize-minian (resolve README conflict) #322

Merged

Merge pull request #322 from miniscope/chore/sync-modernize-with-master

a459c2a

chore: merge master into modernize-minian (resolve README conflict)

daharoni marked this pull request as ready for review June 4, 2026 00:26

daharoni self-requested a review June 4, 2026 02:11

daharoni approved these changes Jun 4, 2026

View reviewed changes

daharoni mentioned this pull request Jun 4, 2026

SparseArray Error during "first spatial update" - parameter exploration #152

Open

sneakers-the-rat merged commit e76fd20 into master Jun 4, 2026
13 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Modernize minian - packaging and api updates#301

Modernize minian - packaging and api updates#301
sneakers-the-rat merged 131 commits into
masterfrom
modernize-minian

sneakers-the-rat commented May 29, 2026 •

edited

Loading

Uh oh!

sneakers-the-rat commented Jun 3, 2026 •

edited

Loading

Uh oh!

daharoni commented Jun 3, 2026

Uh oh!

daharoni left a comment

Uh oh!

sneakers-the-rat commented Jun 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

sneakers-the-rat commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sneakers-the-rat commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

daharoni commented Jun 3, 2026

Uh oh!

daharoni left a comment

Choose a reason for hiding this comment

Uh oh!

sneakers-the-rat commented Jun 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

sneakers-the-rat commented May 29, 2026 •

edited

Loading

sneakers-the-rat commented Jun 3, 2026 •

edited

Loading