feat(packaging): ship notebooks in-package and fetch demo data from Zenodo #317

Open
daharoni wants to merge 4 commits into v2-integration from feat/notebook-data-packaging

daharoni (Contributor) commented Jun 2, 2026

What

Removes the ~700 MB of demo binaries (demo_movies/, demo_data/) from the repo and replaces them with on-demand fetching from Zenodo, and ships the tutorial notebooks inside the installed package.

Changes

  • minian/data/ - new fetch/cache API (fetch, dataset_path, datasets) backed by pooch, a per-file registry with SHA256 checksums (_registry.py), and a minian-data CLI (list/download/path). Datasets are hosted on Zenodo (one deposit/DOI each) and verified on every access. MINIAN_DATA_DIR is an offline escape hatch.
  • minian/notebooks/ - the pipeline and cross-registration notebooks now live in-package as self-contained bundles (notebook + assets/ + README.md). minian-notebooks CLI copies a bundle out to a working directory.
  • minian/install.py - reduced to a thin back-compat shim delegating to the two new CLIs.
  • Tests - notebook execution tests go through a shared _notebook.py helper; heavy end-to-end runs are marked slow (deselected by pdm run test, run in full by test-all/CI). Golden-value assertions unchanged.
  • CI/packaging - a prefetch-data job caches datasets once per OS; pdm-backend ships notebooks in the wheel; nbstripout pre-commit hook + notebooks-stripped CI keep committed notebooks output-free.
  • scripts/ - prefetch_data.py (CI cache priming) and zenodo_manifest.py (maintainer staging tool).
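
As a rough illustration of the checksum-verified cache lookup described in the minian/data/ bullet, the sketch below uses only the standard library rather than pooch. Everything named here (`sha256_of`, `resolve_cached`, the registry dict, the file name in the usage note) is hypothetical and not taken from `_registry.py`; treating `MINIAN_DATA_DIR` as a replacement for the default cache directory is also an assumption about how the offline escape hatch behaves.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resolve_cached(name: str, registry: dict[str, str], default_dir: Path) -> Path:
    """Locate `name` in the local cache and verify it against the registry.

    Assumption: MINIAN_DATA_DIR, when set, replaces the default cache
    directory. This sketch never downloads; a missing file is an error.
    """
    if name not in registry:
        raise KeyError(f"unknown file: {name}")
    base = Path(os.environ.get("MINIAN_DATA_DIR", default_dir))
    path = base / name
    if not path.is_file():
        raise FileNotFoundError(path)
    # Verify on every access, so a corrupted or stale file is never returned.
    if sha256_of(path) != registry[name]:
        raise ValueError(f"checksum mismatch for {name}")
    return path
```

Calling `resolve_cached("demo.bin", {"demo.bin": "<sha256 hex>"}, Path("cache"))` would return the cached path only if the file exists and its digest matches; a real fetch layer would download on a miss instead of raising.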

Notes

  • Net working-tree change: 698.7 MB of binaries removed. Git history still contains the old blobs, so a fresh clone stays large until history is rewritten separately.
  • Both datasets are published on Zenodo and wired up in _registry.py.

daharoni and others added 4 commits June 1, 2026 22:41
…enodo

Move the pipeline and cross-registration notebooks into minian/notebooks/
so they ship inside the wheel, and pull demo data on demand from Zenodo
instead of committing ~700 MB of binaries to the repo.

- minian.data: pooch-backed, checksum-verified fetch() with a Zenodo
  registry (pipeline-demo, cross-reg-sessions) and a minian-data CLI
- minian-notebooks CLI copies bundled notebooks out of the package;
  minian-install --notebooks/--demo kept as deprecated aliases
- notebooks drop the sys.path hacks and reference assets/ relatively
- demo_movies/ and demo_data/ binaries removed; stub READMEs point at
  the data registry
- tests: discover and smoke-execute bundled notebooks; the golden
  pipeline and cross-reg runs are marked slow and resolve data via fetch()
- CI: prefetch+cache demo-data job; default pytest deselects slow while
  the full matrix runs notebooks via test-all; lint verifies notebooks
  are output-stripped; pre-commit nbstripout hook
- deps: add pooch
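
To make the "notebooks are output-stripped" lint item above concrete, here is a minimal sketch of what such a check might look for in a notebook file. It is not the CI job from this PR (which relies on nbstripout); the function name `unstripped_cells` is hypothetical, and the sketch assumes the standard nbformat v4 JSON layout with a top-level `cells` list.

```python
import json


def unstripped_cells(notebook_json: str) -> list[int]:
    """Return indices of code cells that still carry outputs or an execution count."""
    nb = json.loads(notebook_json)
    offenders = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown and raw cells have nothing to strip
        if cell.get("outputs") or cell.get("execution_count") is not None:
            offenders.append(index)
    return offenders
```

A lint step could read each committed `.ipynb`, pass its text to this function, and fail if the returned list is non-empty. A leftover `execution_count` alone is enough to flag a cell, even when its outputs are already empty.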

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add pooch to pdm.lock (it was added to dependencies for on-demand demo
data fetching) so `pdm lock --check` passes, and clear the stale
execution_count from pipeline.ipynb so the notebooks-stripped check passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The test matrix runs on windows-latest, but prefetch-data only ran on
ubuntu/macos and the cache paths omitted the Windows location, so Windows
test jobs had no primed cache. Add windows-latest to the prefetch matrix
and include the Windows pooch cache dir (AppData/Local/minian/minian/Cache)
in both cache steps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
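
The cache-path mismatch fixed in the commit above comes down to each OS keeping user caches in a different place. The sketch below shows the idea with pure path arithmetic. Only the Windows layout (AppData/Local/minian/minian/Cache) is taken from the commit message; the macOS and Linux locations are assumptions based on common platform conventions, and `candidate_cache_dir` is a hypothetical helper, not part of minian or pooch.

```python
from pathlib import PurePosixPath, PureWindowsPath


def candidate_cache_dir(system: str, home: str, app: str = "minian"):
    """Return the likely per-user cache directory for `app` on a given OS.

    `system` follows platform.system() naming: "Windows", "Darwin", "Linux".
    """
    if system == "Windows":
        # Layout quoted in the commit message: AppData/Local/<app>/<app>/Cache
        return PureWindowsPath(home) / "AppData" / "Local" / app / app / "Cache"
    if system == "Darwin":
        # Assumption: conventional macOS user cache location
        return PurePosixPath(home) / "Library" / "Caches" / app
    # Assumption: conventional Linux default when XDG_CACHE_HOME is unset
    return PurePosixPath(home) / ".cache" / app
```

A CI cache step that lists only the POSIX-style locations would miss the Windows directory, which is why the Windows jobs started without a primed cache.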

codecov Bot commented Jun 2, 2026

Codecov Report

❌ Patch coverage is 19.09091% with 178 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| minian/notebooks/cli.py | 0.00% | 94 Missing ⚠️ |
| minian/data/cli.py | 0.00% | 46 Missing ⚠️ |
| minian/data/__init__.py | 66.10% | 20 Missing ⚠️ |
| minian/install.py | 0.00% | 18 Missing ⚠️ |

daharoni linked an issue Jun 4, 2026 that may be closed by this pull request
daharoni changed the base branch from modernize-minian to v2-integration June 4, 2026 19:27
daharoni marked this pull request as ready for review June 4, 2026 19:31

Development

Successfully merging this pull request may close these issues.

Notebook + demo-data packaging
