11 changes: 8 additions & 3 deletions .gitignore
@@ -189,19 +189,24 @@ output/
# Pykan model state
model/

-# config classes
+# config classes (track templates and hydra settings)
config/
!config/example_config.yaml
!config/templates/
!config/hydra/

# eval example data
examples/eval/*.zarr

# scratch notebooks
examples/scratch

-# claude
-Claude.md
+# claude (keep .claude/ private, commit project-level CLAUDE.md)
+.claude/

# built docs
site/

# misc
.qodo/
.ruff_cache/
170 changes: 170 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,170 @@
# DDR — Distributed Differentiable Routing

AI-agent context file. Committed to version control so every coding assistant
(Claude, Copilot, Cursor, etc.) gets the same codebase orientation.

---

## Architecture

DDR couples a **Kolmogorov-Arnold Network (KAN)** with **differentiable
Muskingum-Cunge (MC) routing** to learn spatially varying river-routing
parameters end-to-end via PyTorch autograd.

1. **KAN** ingests catchment attributes and predicts three spatial parameters
per reach: Manning's *n*, *q_spatial*, and *p_spatial*.
2. **Leopold & Maddock power law** converts depth to channel geometry:
   `top_width = p_spatial * depth^(q_spatial + 1e-6)` — see the sketch after this list.
3. **MC routing** solves the linearized Saint-Venant equations on a trapezoidal
channel cross-section using a sparse-matrix solve (CSR format).
4. Gradients flow from the loss (KGE, NSE, etc.) back through the routing
physics into the KAN weights.
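
A minimal sketch of the geometry step (item 2), assuming plain tensors — the function name and tensor shapes are illustrative, not the actual `mmc.py` API:

```python
import torch

def top_width(depth: torch.Tensor, p_spatial: torch.Tensor, q_spatial: torch.Tensor) -> torch.Tensor:
    """Leopold & Maddock power law: width = p * depth^(q + 1e-6)."""
    # The 1e-6 offset keeps the exponent strictly positive as q_spatial -> 0.
    return p_spatial * depth ** (q_spatial + 1e-6)

# Three reaches with KAN-predicted parameters (illustrative values).
depth = torch.tensor([0.5, 1.2, 2.0], requires_grad=True)
p = torch.tensor([3.0, 4.5, 2.8])
q = torch.tensor([0.4, 0.35, 0.5])

width = top_width(depth, p, q)
width.sum().backward()  # gradients flow back through the power law (item 4)
```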

---

## Module Map

| Path | Role |
|---|---|
| `src/ddr/routing/mmc.py` | Core `MuskingumCunge` engine — sparse matrix solve, trapezoid velocity, parameter denormalization |
| `src/ddr/routing/torch_mc.py` | PyTorch `nn.Module` wrapper (`dmc` class) |
| `src/ddr/nn/kan.py` | KAN neural network for spatial parameter prediction |
| `src/ddr/io/readers.py` | `StreamflowReader` for loading lateral inflows |
| `src/ddr/io/functions.py` | Utility functions (downsampling, etc.) |
| `src/ddr/validation/configs.py` | Pydantic config models (`Config`, `DataSources`, `Params`, `Kan`, `ExperimentConfig`) |
| `src/ddr/validation/enums.py` | `GeoDataset` and `Mode` enums |
| `src/ddr/validation/metrics.py` | Evaluation metrics |
| `src/ddr/validation/plots.py` | Plotting utilities |
| `src/ddr/geodatazoo/` | Dataset abstraction layer — `BaseGeoDataset`, `Merit`, `LynkerHydrofabric` |
| `src/ddr/scripts_utils.py` | Shared helpers used by the scripts below |

## Public API (`src/ddr/__init__.py`)

```python
from .routing.torch_mc import dmc # Differentiable routing model
from .nn import kan # KAN neural network
from .io.readers import StreamflowReader as streamflow # Data reader
from .io import functions as ddr_functions # Utilities
from . import validation # Config, Metrics, plotting
```
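
The top-level names are thin aliases, so downstream code can import from either place. A quick, runnable check of that equivalence:

```python
import ddr
from ddr.routing.torch_mc import dmc

assert ddr.dmc is dmc  # same object, re-exported at the package root
```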

---

## Config Flow

```
Hydra YAML (config/) → OmegaConf DictConfig → validate_config() → Pydantic Config
```

- Hydra parses YAML from the `config/` directory.
- `validate_config()` in `src/ddr/validation/configs.py` converts the
`DictConfig` to a typed Pydantic model.
- Config supports `${oc.env:VAR_NAME,default}` interpolation for portable
paths across machines.
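
A sketch of an entry point that follows this flow — `hydra.main` and `OmegaConf.resolve` are real APIs, while the decorator arguments below are illustrative:

```python
import hydra
from omegaconf import DictConfig, OmegaConf

from ddr.validation.configs import validate_config  # see the module map above

@hydra.main(version_base=None, config_path="../config", config_name="example_config")
def main(cfg: DictConfig) -> None:
    OmegaConf.resolve(cfg)  # resolves ${oc.env:VAR_NAME,default} interpolations
    config = validate_config(cfg)  # DictConfig -> typed Pydantic Config
    print(config)

if __name__ == "__main__":
    main()
```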

---

## Scripts

| Script | Purpose |
|---|---|
| `scripts/train.py` | Training loop (`python scripts/train.py --config-name=<config>`) |
| `scripts/test.py` | Evaluation |
| `scripts/train_and_test.py` | Combined train + test |
| `scripts/router.py` | Forward routing with a trained model |
| `scripts/summed_q_prime.py` | Baseline — unrouted sum of lateral inflows |

---

## Downstream Call-Site Checklist

When modifying `src/ddr/` interfaces (constructor signatures, `forward()`
return types, config fields), **always check and update these downstream
consumers**:

1. **`examples/`** — Example notebooks that instantiate `kan()`, `dmc()`, and
load configs.
2. **`benchmarks/scripts/benchmark.py`** and
**`benchmarks/src/ddr_benchmarks/benchmark.py`** — Own `kan()`/`dmc()`
instantiation and evaluation loops that must stay in sync with the core
scripts.
3. **`scripts/`** — All training/testing scripts.
4. **`config/`** — YAML files may reference field names that changed.

Quick grep to find all `kan()` constructor call sites:

```bash
grep -r "kan(" examples/ benchmarks/ scripts/
```

---

## Testing

```bash
uv run pytest # Unit tests (no data dependencies)
uv run pytest -m integration # Integration tests (requires HPC data)
```

- Unit tests live in `tests/`.
- Integration tests are marked with `@pytest.mark.integration` and deselected
by default (`addopts = "-m 'not integration'"`).
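
For reference, what the marker split looks like in a test file (the test bodies are hypothetical; the marker and deselect behavior come from the pytest config above):

```python
import pytest

def test_unit_example() -> None:
    """Runs in a plain `uv run pytest` invocation."""
    assert 1 + 1 == 2

@pytest.mark.integration
def test_integration_example() -> None:
    """Deselected by default; opt in with `uv run pytest -m integration`."""
    assert True
```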

---

## Code Quality

| Tool | Config |
|---|---|
| **Linter** | ruff — rules: F, E, W, I, D, B, Q, TID, C4, BLE, UP, RUF100 |
| **Formatter** | ruff format |
| **Type checker** | mypy (strict: `disallow_untyped_defs = true`) |
| **Docstrings** | NumPy convention (`tool.ruff.lint.pydocstyle`) |
| **Line length** | 110 |
| **Pre-commit** | ruff check+format, mypy, nbstripout, trailing-whitespace, end-of-file-fixer, check-yaml |

All config lives in `pyproject.toml`. Pre-commit hooks are defined in
`.pre-commit-config.yaml`.
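
For reference, the NumPy docstring shape the pydocstyle rules expect — an illustrative function, not taken from the codebase:

```python
def denormalize(value: float, bounds: tuple[float, float]) -> float:
    """Map a normalized value back to its physical range.

    Parameters
    ----------
    value : float
        Normalized value in [0, 1].
    bounds : tuple[float, float]
        Lower and upper physical bounds.

    Returns
    -------
    float
        The denormalized value.
    """
    lower, upper = bounds
    return lower + value * (upper - lower)
```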

---

## Datasets (GeoDataset enum)

Two supported geodatasets (see `src/ddr/validation/enums.py`):

| Enum value | Dataset | Attributes | Geometry |
|---|---|---|---|
| `merit` | MERIT-Hydro global river network | `.nc` | `.shp` |
| `lynker_hydrofabric` | Lynker Hydrofabric v2.2 (CONUS) | icechunk store | `.gpkg` |

Key attribute-name difference: MERIT exposes upstream area as `log10_uparea`,
while Lynker uses `log_uparea`.
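
A hedged sketch of dispatching on the enum — `get_dataset_class()` is referenced in CONTRIBUTING.md, but whether it is a member method or a classmethod is an assumption here:

```python
from ddr.validation.enums import GeoDataset

kind = GeoDataset("merit")  # or GeoDataset("lynker_hydrofabric")
dataset_cls = kind.get_dataset_class()  # assumed to return Merit / LynkerHydrofabric
print(kind, dataset_cls)
```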

---

## Workspace (monorepo)

Three packages managed by `uv`:

| Package | Directory | Description |
|---|---|---|
| `ddr` | `.` (root) | Core routing library |
| `ddr-engine` | `engine/` | Geospatial data preparation |
| `ddr-benchmarks` | `benchmarks/` | Comparison framework (vs DiffRoute, etc.) |

Install everything:

```bash
uv sync --all-packages
```

---

## Key Conventions

- Python **>=3.11, <3.14**.
- PyTorch with CUDA 13.0 index by default (configurable in `pyproject.toml`).
- Sparse CSR tensors are used for the routing matrix solve — expect beta
warnings from PyTorch (already suppressed in pytest config).
- `__init__.py` files use `F401` ignore so re-exports don't trigger unused-import
lint errors.
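
For orientation, a toy sparse CSR system of the kind the routing solve manipulates — the matrix content is made up and unrelated to the real routing matrix:

```python
import torch

# 3x3 lower-bidiagonal system in CSR form: row pointers, column indices, values.
crow_indices = torch.tensor([0, 1, 3, 5])
col_indices = torch.tensor([0, 0, 1, 1, 2])
values = torch.tensor([1.0, -0.5, 1.0, -0.5, 1.0])
A = torch.sparse_csr_tensor(crow_indices, col_indices, values, size=(3, 3))

# Constructing/using CSR tensors emits PyTorch's "beta state" UserWarning,
# which the pytest config suppresses.
x = torch.tensor([[1.0], [2.0], [3.0]])
print(A @ x)  # sparse-dense matmul is supported for CSR
```
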
66 changes: 66 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,66 @@
# Contributing to DDR

## Development Setup

```bash
git clone https://github.com/DeepGroundwater/ddr.git
cd ddr
uv sync --all-packages
pre-commit install
```

## Code Style

- **Formatter/Linter:** ruff (line length 110)
- **Type checking:** mypy (strict mode)
- **Docstrings:** NumPy convention
- Pre-commit hooks enforce all of the above on every commit.

## Running Tests

```bash
uv run pytest # Unit tests
uv run pytest -m integration # Integration tests (requires local data)
uv run pytest tests/routing/ -v # Run specific test directory
```

## Interface Change Checklist

When modifying `src/ddr/` interfaces (constructor signatures, `forward()` return types, config fields), check these downstream consumers:

1. **`scripts/`** — `train.py`, `test.py`, `train_and_test.py`, `router.py` instantiate `kan()` and `dmc()`
2. **`examples/`** — Notebooks load configs and instantiate models
3. **`benchmarks/`** — `benchmarks/scripts/benchmark.py` and `benchmarks/src/ddr_benchmarks/benchmark.py` have their own model instantiation
4. **`config/`** — YAML files may reference renamed or removed fields

Quick check: `grep -r "kan(" examples/ benchmarks/ scripts/` to find all call sites.

## Pull Request Process

1. Create a feature branch from `master`
2. Make your changes with tests
3. Ensure CI passes: `uv run pytest && uv run ruff check . && uv run mypy src/`
4. Open a PR with a clear description of what and why

## Adding a New GeoDataset

1. Add enum value to `src/ddr/validation/enums.py` (`GeoDataset`)
2. Create dataset class in `src/ddr/geodatazoo/` extending `BaseGeoDataset` — see the sketch after this list
3. Register in `GeoDataset.get_dataset_class()`
4. Add config template in `config/templates/`
5. Add tests in `tests/geodatazoo/`
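
A skeleton for steps 1–3 — the import path and hook names are placeholders, since `BaseGeoDataset`'s abstract interface isn't documented here:

```python
from ddr.geodatazoo import BaseGeoDataset  # import path assumed from the module map

class MyDataset(BaseGeoDataset):
    """Hypothetical dataset backing a new `my_dataset` enum value."""

    def load_attributes(self):  # placeholder hook — check BaseGeoDataset for real names
        raise NotImplementedError

    def load_geometry(self):  # placeholder hook
        raise NotImplementedError
```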

## Project Structure

```
ddr/
├── src/ddr/ # Core library
├── engine/ # Data preparation (ddr-engine package)
├── benchmarks/ # Comparison framework (ddr-benchmarks package)
├── scripts/ # Training/testing entry points
├── config/ # Hydra YAML configs
│ └── templates/ # Portable config templates (version controlled)
├── examples/ # Example notebooks + trained weights
├── tests/ # Test suite
└── docs/ # Zensical documentation
```
63 changes: 42 additions & 21 deletions README.md
@@ -8,9 +8,9 @@ An implementation of differentiable river routing methods for the NextGen Framew

[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

-### Dependencies
+### Installation

-The following commands will allow you to install all required dependencies for DDR
+DDR uses [uv](https://docs.astral.sh/uv/) for dependency management. From the repo root:

```sh
# Full workspace — CPU (includes ddr, ddr-engine, and ddr-benchmarks)
@@ -26,39 +26,60 @@
uv sync --all-packages --group cuda12
uv sync --all-packages --group cuda13
```

This installs the `ddr` CLI entry point. Verify with `ddr --help`.

### Quick Start

1. **Copy a config template** from `config/templates/` and customize paths:
```sh
cp config/templates/merit_training.yaml config/my_training.yaml
# Edit config/my_training.yaml — set data_sources paths to your data
```

Templates use `${oc.env:DDR_DATA_DIR,./data}` so you can set the
`DDR_DATA_DIR` environment variable or edit the paths directly.

2. **Train a model** using the `ddr` CLI:
```sh
ddr train --config-name=my_training
```

3. **Evaluate** the trained model:
```sh
ddr test --config-name=my_testing
```

Available subcommands: `train`, `test`, `route`, `train-and-test`, `summed-q-prime`.

You can also call scripts directly if preferred:
```sh
uv run python scripts/train.py --config-name=my_training
```

### Data Engine

-Next, you need to create the necessary data files for running a routing across your domain.
-- The example below is for the NOAA-OWP Hydrofabric v2.2 (Dataset is not included in the repo)
-- This requires the `ddr-engine` local package to be installed (which is done automatically through the above `uv sync`)
+Before training, you need to create adjacency matrices for your domain:
+- Requires the `ddr-engine` local package (installed automatically by `uv sync --all-packages`)
+- The gauges.csv can be found [here](https://github.com/DeepGroundwater/references/tree/master/mhpi/dHBV2.0UH)

```sh
uv run python engine/scripts/build_hydrofabric_v2.2_matrices.py <PATH/TO/conus_nextgen.gpkg> data/ --gages references/mhpi/dHBV2.0UH/training_gauges.csv
```

-This will create two files used for routing
-- `hydrofabric_v2.2_conus_adjacency.zarr`
-  - a sparse COO matrix containing the whole river network for Hydrofabric v2.2 across CONUS
-- `hydrofabric_v2.2_gages_conus_adjacency.zarr`
-  - a zarr.Group of sparse coo matrices for river networks upstream of USGS Gauges
+This creates two files used for routing:
+- `hydrofabric_v2.2_conus_adjacency.zarr` — sparse COO matrix of the full CONUS river network
+- `hydrofabric_v2.2_gages_conus_adjacency.zarr` — zarr.Group of sparse COO matrices for networks upstream of USGS gauges
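
To sanity-check the outputs you can open the stores with `zarr` — the internal layout (e.g., one subgroup of COO indices/values per gauge) is an assumption, not documented here:

```python
import zarr

gauges = zarr.open_group("data/hydrofabric_v2.2_gages_conus_adjacency.zarr", mode="r")
print(list(gauges.keys()))  # hypothetical layout: one subgroup per USGS gauge
```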

-### Model Train
+### Pre-trained Examples

-All that's left is to train a routing model
-```sh
-# Train a model using the MHPI S3 defaults
-python scripts/train.py --config-name example_config.yaml
-
-#Test the model
-python scripts/test.py --config-name example_test_config.yaml
-```
+The `examples/` directory contains pre-trained weights and notebooks for both
+MERIT and Lynker Hydrofabric datasets. See [`examples/README.md`](examples/README.md).
-### How to build docs locally
+### Documentation

-The zensical documentation can be built/verified locally through installing the optional `docs` dependencies and serving through localhost:
+Build and serve the docs locally:

```sh
uv pip install -e ".[docs]"
uv run zensical serve
```
