docs: document BaseGeoDataset inheritance interface and add custom dataset example notebook #169
Merged
taddyb merged 3 commits into DeepGroundwater:master from Mar 20, 2026
Summary
- Adds full NumPy docstrings to all six abstract methods on `BaseGeoDataset`, spelling out shapes, semantics, and hidden contracts that were previously only discoverable by reading both built-in implementations in parallel
- Documents all 14 fields of `RoutingDataclass` with shape, units, and usage notes
- Adds `examples/custom_geodataset.ipynb`: a step-by-step walkthrough of how to implement a custom `BaseGeoDataset` subclass, using MERIT as the concrete example

Motivation
A new user wanting to route their own hydrofabric had no documented path for implementing a custom dataset. The six abstract methods had one-line docstrings, and the `_build_common_tensors` return type (`tuple[Tensor, Tensor, Tensor, dict[str, Tensor]]`) gave no indication of what each element represented. Several contracts were entirely implicit:
- `_init_training` must set four specific instance attributes (`gage_ids`, `observations`, `gages_adjacency`, `obs_reader`) that the base class `collate_fn` depends on
- `normalized_spatial_attributes` must be transposed relative to `spatial_attributes`; getting this wrong produces a silent shape error in the KAN
- `length` must be in metres; MERIT stores `lengthkm`, so the ×1000 conversion is easy to miss
- Fields with no data (`top_width`, `side_slope`) must be `torch.empty(0)`, not `None`

Changes
`src/ddr/geodatazoo/base_geodataset.py`
Full NumPy docstrings on all six abstract methods covering: parameter types and shapes, return value shapes and semantics, the training-mode instance attribute contract in `_init_training`, the three-branch priority order for `_init_inference`, and the transposition invariant in `_build_common_tensors`.

`src/ddr/geodatazoo/dataclasses.py`
Class-level docstring on `RoutingDataclass` documenting all 14 fields with shape (using `N` for active segments), units, and when to use `torch.empty(0)` vs a real tensor.

`examples/custom_geodataset.ipynb`
Step-by-step notebook structured as:
Test plan
- `ruff check` and `mypy` pass on the changed source files
- `uv run pytest`

🤖 Generated with Claude Code
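Two of the implicit contracts listed under Motivation are easy to check mechanically. The following is a minimal, standard-library-only sketch of what such checks could look like; it is not code from this PR. The four attribute names and the `lengthkm` field come from the description above, while `missing_training_attrs`, `km_to_m`, and `IncompleteDataset` are hypothetical names introduced purely for illustration.

```python
from typing import Any

# Attribute names as listed in the PR description. The checker itself is a
# hypothetical helper and is not part of the ddr package.
REQUIRED_TRAINING_ATTRS = ("gage_ids", "observations", "gages_adjacency", "obs_reader")


def missing_training_attrs(dataset: Any) -> list[str]:
    """Return the required training attributes that were never set on `dataset`."""
    return [name for name in REQUIRED_TRAINING_ATTRS if not hasattr(dataset, name)]


def km_to_m(lengths_km: list[float]) -> list[float]:
    """Convert segment lengths from kilometres (e.g. MERIT's lengthkm) to metres."""
    return [value * 1000.0 for value in lengths_km]


class IncompleteDataset:
    """Stand-in for a subclass whose _init_training forgot two attributes."""

    def __init__(self) -> None:
        self.gage_ids = ["gage_a"]
        self.observations = [[1.0, 2.0]]


print(missing_training_attrs(IncompleteDataset()))  # ['gages_adjacency', 'obs_reader']
print(km_to_m([1.5, 0.25]))  # [1500.0, 250.0]
```

A check like this could run in a subclass's own tests right after `_init_training`, so that a forgotten attribute surfaces before `collate_fn` is ever called.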