docs: document BaseGeoDataset inheritance interface and add custom dataset example notebook#169

Merged
taddyb merged 3 commits into DeepGroundwater:master from taddyb:geodata
Mar 20, 2026

@taddyb (Collaborator) commented Mar 19, 2026

Summary

  • Adds comprehensive NumPy docstrings to all six abstract methods in BaseGeoDataset, spelling out shapes, semantics, and hidden contracts that were previously only discoverable by reading both built-in implementations in parallel
  • Documents every field of RoutingDataclass with shape, units, and usage notes
  • Adds examples/custom_geodataset.ipynb — a step-by-step walkthrough of how to implement a custom BaseGeoDataset subclass, using MERIT as the concrete example

Motivation

A new user wanting to route their own hydrofabric had no documented path for implementing a custom dataset. The six abstract methods had one-line docstrings, and the _build_common_tensors return type (tuple[Tensor, Tensor, Tensor, dict[str, Tensor]]) gave no indication of what each element represented. Several contracts were entirely implicit:

  • _init_training must set four specific instance attributes (gage_ids, observations, gages_adjacency, obs_reader) that the base class collate_fn depends on
  • normalized_spatial_attributes must be transposed relative to spatial_attributes — getting this wrong produces a silent shape error in the KAN
  • length must be in metres; MERIT stores lengthkm, so the ×1000 conversion is easy to miss
  • Unused geometry tensors (top_width, side_slope) must be torch.empty(0), not None
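The contracts above are straightforward to assert on. Below is a minimal sketch, assuming 2-D attribute tensors. `check_training_contract`, `check_tensor_contracts`, and `merit_length_to_metres` are hypothetical helpers, not part of ddr, and the orientation of each attribute tensor is left to the real docstrings.

```python
from typing import Optional

import torch

# Attributes the base-class collate_fn reads after _init_training (as listed in this PR).
REQUIRED_TRAINING_ATTRS = ("gage_ids", "observations", "gages_adjacency", "obs_reader")


def check_training_contract(dataset: object) -> None:
    """Raise AttributeError if _init_training left any required attribute unset."""
    missing = [name for name in REQUIRED_TRAINING_ATTRS if not hasattr(dataset, name)]
    if missing:
        raise AttributeError(f"_init_training did not set: {missing}")


def check_tensor_contracts(
    spatial_attributes: torch.Tensor,
    normalized_spatial_attributes: torch.Tensor,
    top_width: Optional[torch.Tensor],
    side_slope: Optional[torch.Tensor],
) -> None:
    """Raise if the transposition or empty-tensor contracts are violated (hypothetical)."""
    # The normalized copy must be laid out as the transpose of the raw attributes.
    expected = tuple(reversed(spatial_attributes.shape))
    actual = tuple(normalized_spatial_attributes.shape)
    if actual != expected:
        raise ValueError(f"normalized_spatial_attributes is {actual}, expected {expected}")
    # Unused geometry must be an empty tensor, never None.
    for name, value in (("top_width", top_width), ("side_slope", side_slope)):
        if value is None:
            raise TypeError(f"{name} must be torch.empty(0), not None")


def merit_length_to_metres(lengthkm: torch.Tensor) -> torch.Tensor:
    """Convert MERIT's lengthkm values to metres."""
    return lengthkm * 1000.0
```

Because the transposition check compares only shapes, it cannot catch a missing transpose when the attribute matrix happens to be square. Treat it as a cheap guard, not a full validation.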

Changes

src/ddr/geodatazoo/base_geodataset.py

Full NumPy docstrings on all six abstract methods covering: parameter types and shapes, return value shapes and semantics, the training-mode instance attribute contract in _init_training, the three-branch priority order for _init_inference, and the transposition invariant in _build_common_tensors.
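To illustrate what the training-mode contract and the transposition invariant look like in NumPy docstring form, here is a sketch on a hypothetical `GeoDatasetSketch` ABC. The zero-argument signatures are assumptions, and the per-element meanings of the return tuple are deliberately not guessed.

```python
from abc import ABC, abstractmethod

from torch import Tensor


class GeoDatasetSketch(ABC):
    """Hypothetical stand-in for BaseGeoDataset; illustrates docstring layout only."""

    @abstractmethod
    def _init_training(self) -> None:
        """Prepare the dataset for training mode.

        Notes
        -----
        Implementations must set the following instance attributes,
        because the base-class ``collate_fn`` depends on them:

        - ``self.gage_ids``
        - ``self.observations``
        - ``self.gages_adjacency``
        - ``self.obs_reader``
        """

    @abstractmethod
    def _build_common_tensors(self) -> tuple[Tensor, Tensor, Tensor, dict[str, Tensor]]:
        """Build the tensors shared by training and inference.

        Returns
        -------
        tuple[Tensor, Tensor, Tensor, dict[str, Tensor]]
            Per-element meanings and shapes are documented in the real
            ``base_geodataset.py`` and are not reproduced in this sketch.

        Notes
        -----
        ``normalized_spatial_attributes`` must be the transpose of
        ``spatial_attributes``; a mismatch surfaces as a shape error in the KAN.
        """
```

Since both methods are abstract, a subclass that forgets either one fails at instantiation with a `TypeError` rather than later, inside the training loop.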

src/ddr/geodatazoo/dataclasses.py

Class-level docstring on RoutingDataclass documenting all 14 fields with shape (using N for active segments), units, and when to use torch.empty(0) vs a real tensor.
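The "torch.empty(0) vs a real tensor" convention maps naturally onto dataclass default factories. The sketch below is a hypothetical three-field subset, not the real 14-field RoutingDataclass. It assumes `length`, `top_width`, and `side_slope` are fields and that each is a per-segment tensor of shape (N,).

```python
from dataclasses import dataclass, field

import torch


def _empty() -> torch.Tensor:
    # Placeholder for fields a dataset does not provide: an empty tensor, never None.
    return torch.empty(0)


@dataclass
class RoutingDataSketch:
    """Hypothetical subset of RoutingDataclass; N is the number of active segments."""

    length: torch.Tensor = field(default_factory=_empty)      # assumed (N,), metres
    top_width: torch.Tensor = field(default_factory=_empty)   # assumed (N,) or empty
    side_slope: torch.Tensor = field(default_factory=_empty)  # assumed (N,) or empty

    def has(self, name: str) -> bool:
        # An empty tensor signals "not supplied by this dataset".
        return getattr(self, name).numel() > 0


rd = RoutingDataSketch(length=torch.tensor([1200.0, 850.0]))
```

Using `default_factory` gives each instance its own empty tensor, and downstream code can branch on `numel()` without any `None` checks.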

examples/custom_geodataset.ipynb

Step-by-step notebook structured as:

  1. Architecture diagram showing training vs inference data flow
  2. Data requirements table (attribute store, flowpath GDF, adjacency zarr)
  3. Each abstract method explained individually with the MERIT implementation and annotated gotchas
  4. The complete assembled class in one cell
  5. Wiring pattern to drop a custom class into the standard DataLoader
  6. Common mistakes table (six failure modes with symptom and fix)
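A plausible shape for the step 5 wiring pattern is to pass the dataset's own `collate_fn` to a standard `torch.utils.data.DataLoader`. The sketch below assumes that pattern. `ToyGeoDataset` and its integer day indices are hypothetical, and the real base-class `collate_fn` assembles routing inputs rather than stacking integers.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyGeoDataset(Dataset):
    """Hypothetical stand-in for a custom BaseGeoDataset subclass."""

    def __init__(self, n_days: int) -> None:
        self.days = list(range(n_days))

    def __len__(self) -> int:
        return len(self.days)

    def __getitem__(self, idx: int) -> int:
        return self.days[idx]

    def collate_fn(self, batch: list[int]) -> torch.Tensor:
        # Stand-in for the real collate_fn; here it only stacks the sampled indices.
        return torch.tensor(batch)


dataset = ToyGeoDataset(n_days=10)
loader = DataLoader(dataset, batch_size=4, shuffle=False, collate_fn=dataset.collate_fn)
batches = list(loader)
```

Passing the bound method keeps the batching logic on the dataset, so a custom subclass that satisfies the `_init_training` contract can be dropped into the same DataLoader call unchanged.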

Test plan

  • Verify notebook renders correctly in JupyterLab
  • Confirm ruff check and mypy pass on the changed source files
  • Check that existing unit tests still pass: uv run pytest

🤖 Generated with Claude Code

taddyb merged commit ab725d3 into DeepGroundwater:master on Mar 20, 2026
3 of 4 checks passed
taddyb deleted the geodata branch on March 20, 2026 at 02:20