Merged
32 changes: 0 additions & 32 deletions docs/benchmarks/diffroute.md
@@ -124,38 +124,6 @@ uv run python scripts/benchmark.py diffroute.enabled=false

The benchmark produces:

### Metrics (logged)

```
=== DDR Metrics ===
----------------------------------------
Metric | Mean | Median
----------------------------------------
NSE | 0.7234 | 0.7891
RMSE | 12.3456 | 8.7654
KGE | 0.6543 | 0.7012
----------------------------------------

=== DiffRoute Metrics ===
----------------------------------------
Metric | Mean | Median
----------------------------------------
NSE | 0.6891 | 0.7456
...

=== Summed Q' Metrics ===
...
```

### Mass Balance (logged)

```
=== Mass Balance Accumulation Comparison ===
DDR vs Obs — Mean rel. error: 0.1234, Median: 0.0567
DiffRoute vs Obs — Mean rel. error: 0.2345, Median: 0.1234
DDR vs summed Q' — Mean rel. error: 0.0456, Median: 0.0234
```

### Plots (saved to `output/<run>/plots/`)

| File | Description |
18 changes: 3 additions & 15 deletions docs/benchmarks/index.md
@@ -14,7 +14,7 @@ The `ddr-benchmarks` package provides tools for comparing DDR against other rout
Benchmarking routing models requires:

1. **Identical input data** - Same lateral inflows (Q'), network topology, and time period
2. **Consistent evaluation** - Same metrics (NSE, KGE, RMSE) computed on same observations
2. **Consistent evaluation** - Same evaluation criteria applied to the same observations
3. **Fair comparison** - Account for differences in model formulations and parameters

The benchmarks package addresses all three by reusing DDR's existing data infrastructure while providing adapters for other routing models.
@@ -88,17 +88,6 @@ The benchmark produces publication-quality plots and console diagnostics:
| `gauge_map_sqp_NSE.png` | Map of gauges colored by summed Q' NSE (if enabled) |
| `hydrographs/*.png` | Per-gage time series with all models overlaid |

### Console Output

Mass balance accumulation comparison is logged for each model:

```
=== Mass Balance Accumulation Comparison ===
DDR vs Obs — Mean rel. error: 0.1234, Median: 0.0567
DiffRoute vs Obs — Mean rel. error: 0.2345, Median: 0.1234
DDR vs summed Q' — Mean rel. error: 0.0456, Median: 0.0234
```

### Results (saved to `output/<run>/benchmark_results.zarr`)

```python
@@ -146,9 +135,8 @@ The main benchmark script follows the same pattern as `scripts/test.py`:
3. **Phase 1**: Run DDR on time-batched DataLoader, accumulate predictions
4. **Phase 2**: Run DiffRoute per-gage using zarr subgroup graphs
5. Optionally load summed Q' predictions for baseline comparison
6. Compute metrics using DDR's `Metrics` class
7. Log mass balance accumulation comparison
8. Generate comparison plots (CDF, boxplots, gauge maps, hydrographs)
6. Evaluate predictions using DDR's `Metrics` class
7. Generate comparison plots (CDF, boxplots, gauge maps, hydrographs)
8. Save results to zarr
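The Phase 1 accumulation step above can be sketched with toy data. The names and shapes here are illustrative only, not DDR's actual API: real batches come from a time-batched DataLoader and `dmc_output["runoff"]` tensors.

```python
# Toy sketch of Phase 1: fill a full (gages, timesteps) prediction array
# batch by batch, where each batch covers a contiguous time slice.
n_gages, n_timesteps = 2, 6
predictions = [[None] * n_timesteps for _ in range(n_gages)]

# Each hypothetical "batch" pairs the time indices it covers with a
# (gages, batch_len) block of routed runoff values.
batches = [
    (range(0, 3), [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]),
    (range(3, 6), [[3.0, 3.0, 3.0], [4.0, 4.0, 4.0]]),
]

for indices, runoff in batches:
    for g in range(n_gages):
        for j, t in enumerate(indices):
            predictions[g][t] = runoff[g][j]  # analogous to predictions[:, indices] = ...

print(predictions[0])  # [1.0, 1.0, 1.0, 3.0, 3.0, 3.0]
```

Once every batch has been written, no `None` gaps remain and the array is ready for evaluation against observations.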

### DiffRoute Adapter (`diffroute_adapter.py`)
8 changes: 1 addition & 7 deletions docs/startup.md
@@ -240,13 +240,7 @@ __NOTE:__ Please change the config to match what mode/geodataset/method you need

### Monitoring

DDR logs progress including:

- Loss values per epoch and mini-batch
- NSE, RMSE, and KGE metrics
- Parameter statistics

Model checkpoints are saved to the `params.save_path` directory.
Training progress is logged to the output directory. Model checkpoints are saved to the `params.save_path` directory.

### Expected Model outputs

2 changes: 1 addition & 1 deletion docs/usage/examples.md
@@ -57,4 +57,4 @@ Each `example_config.yaml` uses `${oc.env:DDR_DATA_DIR,./../../data}` so paths r

## Model Evaluation

The `examples/eval/evaluate.ipynb` notebook demonstrates how to evaluate the performance of a trained model and compare routed predictions against the summed Q' baseline.
The `examples/eval/evaluate.ipynb` notebook demonstrates how to compare routed predictions against observations and the summed Q' baseline.
2 changes: 1 addition & 1 deletion docs/usage/routing.md
@@ -159,4 +159,4 @@ data_sources:
## Next Steps

- [Benchmarks](../benchmarks/index.md): Compare routing results against other models
- [Model Testing](test.md): Evaluate model performance with observations
- [Model Testing](test.md): Compare predictions against observations
20 changes: 1 addition & 19 deletions docs/usage/summed_q_prime.md
@@ -10,11 +10,7 @@ The summed lateral flow (Summed Q') baseline computes streamflow at gauge locati

Routing redistributes flow in time — it delays and attenuates flood waves as they travel downstream. The Summed Q' baseline skips this step entirely, giving you a direct measure of how much your unit catchment predictions (from dHBV, NWM, or any lumped model) contribute to the total signal vs. how much routing improves it.

Comparing DDR against Summed Q' tells you:

- **How well your lateral inflows capture total volume** (bias, FLV)
- **How much timing improvement routing adds** (NSE, KGE, correlation)
- **Whether routing is worth the compute cost** for your application
Comparing DDR against Summed Q' quantifies the effect of routing on the predicted hydrograph relative to a simple summation baseline.

## Quick Start

@@ -68,20 +64,6 @@ output/<run_name>/
└── detailed_metrics_<timestamp>.csv # Per-gauge metrics
```

### Metrics

The script reports the following metrics for all valid gauges:

| Metric | Description | Ideal |
|--------|-------------|-------|
| **NSE** | Nash-Sutcliffe Efficiency | 1.0 |
| **KGE** | Kling-Gupta Efficiency | 1.0 |
| **Bias** | Mean bias ratio | 1.0 |
| **FLV** | Low flow volume error (%) | 0.0 |
| **FHV** | High flow volume error (%) | 0.0 |
| **MAE** | Mean Absolute Error | 0.0 |
| **RMSE** | Root Mean Square Error | 0.0 |

### Loading Results

```python
38 changes: 8 additions & 30 deletions docs/usage/test.md
@@ -12,7 +12,7 @@ Model testing evaluates a trained DDR model on a different time period than trai

1. Load trained model checkpoint
2. Run forward pass on test period data
3. Compute metrics (NSE, KGE, RMSE) against observations
3. Compare predictions against observations
4. Generate evaluation outputs

## Quick Start
@@ -33,7 +33,7 @@ experiment:
batch_size: 64
start_time: 1995/10/01 # Test period start
end_time: 2010/09/30 # Test period end
warmup: 3 # Warmup days excluded from metrics
warmup: 3 # Days excluded from evaluation during spin-up
checkpoint: /path/to/trained_model.pt # Required!
```

@@ -64,23 +64,18 @@ with torch.no_grad():
predictions[:, indices] = dmc_output["runoff"].cpu().numpy()
```

### 3. Compute Metrics
### 3. Evaluate Predictions

DDR computes standard hydrologic metrics:

| Metric | Description | Ideal Value |
|--------|-------------|-------------|
| **NSE** | Nash-Sutcliffe Efficiency | 1.0 |
| **KGE** | Kling-Gupta Efficiency | 1.0 |
| **RMSE** | Root Mean Square Error | 0.0 |
Use the `Metrics` class from `ddr.validation` to compare predictions against observations:

```python
from ddr.validation.metrics import Metrics

metrics = Metrics(pred=daily_runoff[:, warmup:], target=observations[:, warmup:])
print(f"NSE: {metrics.nse.mean():.4f}")
print(f"KGE: {metrics.kge.mean():.4f}")
print(f"RMSE: {metrics.rmse.mean():.4f}")
```

See the `Metrics` class for available evaluation attributes.
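For intuition about what an efficiency score measures, NSE can be sketched by hand. The function below is an illustrative toy, not DDR's `Metrics` implementation; use `ddr.validation.metrics.Metrics` in real evaluations.

```python
def nse(pred: list[float], obs: list[float]) -> float:
    """Toy Nash-Sutcliffe Efficiency: 1 - SSE / variance-of-observations."""
    mean_obs = sum(obs) / len(obs)
    ss_err = sum((p - o) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_err / ss_tot

obs = [1.0, 2.0, 3.0, 4.0]
print(nse([1.0, 2.0, 3.0, 4.0], obs))  # perfect match -> 1.0
print(nse([2.5, 2.5, 2.5, 2.5], obs))  # predicting the mean -> 0.0
```

A model that only predicts the observed mean scores 0, so positive values indicate skill beyond that trivial baseline.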

## Output

Test results are saved to:
@@ -106,23 +106,6 @@ print(ds)
# observations (gage_ids, time) float64
```

## Interpreting Results

### NSE Guidelines

| NSE Range | Interpretation |
|-----------|----------------|
| > 0.75 | Very good |
| 0.65 - 0.75 | Good |
| 0.50 - 0.65 | Satisfactory |
| < 0.50 | Unsatisfactory |

### Common Issues

1. **Poor performance on large basins**: May need more training data or different architecture
2. **Negative NSE**: Model predictions worse than mean - check data alignment
3. **Good NSE but poor KGE**: Timing/bias issues - inspect hydrographs

## Next Steps

- [Routing](routing.md): Run inference on new domains
7 changes: 1 addition & 6 deletions docs/usage/train.md
@@ -136,12 +136,7 @@ The training will resume from the saved epoch and mini-batch.

## Monitoring

Training logs include:

- Loss values per mini-batch
- NSE, RMSE, KGE metrics periodically
- Learning rate changes
- Parameter statistics
Training progress is logged to the output directory. See the log file for details on loss values, learning rate changes, and parameter statistics.

## Tips

2 changes: 2 additions & 0 deletions pyproject.toml
@@ -27,6 +27,7 @@ maintainers = [
]

dependencies = [
"bmipy>=2.0",
"botocore>=1.42.5",
"colormaps",
"cubed",
@@ -170,6 +171,7 @@ convention = "numpy"
"mkdocs.yml" = ["I"]
"tests/*" = ["D"]
"*/__init__.py" = ["F401"]
"src/ddr/bmi/ddr_bmi.py" = ["D102"] # BMI interface methods are self-documenting (bmipy.Bmi)
"*.ipynb" = ["E402", "E501", "F401", "F811", "F841", "T201"] # Common notebook exceptions

[tool.ruff.format]
7 changes: 6 additions & 1 deletion src/ddr/__init__.py
@@ -5,4 +5,9 @@
from .nn import kan
from .routing.torch_mc import dmc

__all__ = ["__version__", "dmc", "streamflow", "ddr_functions", "kan", "validation"]
try:
from . import bmi
except ImportError:
bmi = None # type: ignore[assignment]

__all__ = ["__version__", "dmc", "streamflow", "ddr_functions", "kan", "validation", "bmi"]
9 changes: 9 additions & 0 deletions src/ddr/bmi/__init__.py
@@ -0,0 +1,9 @@
"""BMI wrapper for DDR differentiable Muskingum-Cunge routing.

Provides a BMI v2.0 (CSDMS) interface for integration with the NGWPC/ngen
NextGen Water Resources Modeling Framework as a drop-in replacement for t-route.
"""

from .ddr_bmi import DdrBmi

__all__ = ["DdrBmi"]
50 changes: 50 additions & 0 deletions src/ddr/bmi/config.py
@@ -0,0 +1,50 @@
"""BMI initialization config schema.

Defines the YAML config format for DDR's BMI wrapper. This config points to
the full DDR Hydra config and trained KAN checkpoint, keeping BMI-specific
settings separate from DDR's internal configuration.
"""

from pathlib import Path
from typing import Literal

from pydantic import BaseModel, Field


class BmiInitConfig(BaseModel):
"""Schema for the BMI initialization YAML config file.

Parameters
----------
ddr_config : Path
Path to DDR's Hydra YAML config file.
kan_checkpoint : Path
Path to trained KAN .pt checkpoint file.
hydrofabric_gpkg : Path or None
Override hydrofabric GeoPackage path from ddr_config.
conus_adjacency : Path or None
Override adjacency matrix path from ddr_config.
device : str
Compute device ("cpu", "cuda", "cuda:0", etc.).
timestep_seconds : float
Internal MC routing timestep in seconds. Can be smaller than
ngen's coupling interval for sub-stepping (e.g., 900s routing
with 3600s ngen_dt gives 4 sub-steps per coupling).
interpolation : {"constant", "linear"}
Lateral inflow interpolation between ngen coupling intervals
when sub-stepping. "constant" holds inflows fixed (zeroth-order);
"linear" interpolates from previous to current inflows across
sub-steps. See ``data/diagrams/bmi_testing_guide.txt`` for
mass conservation implications.
"""

ddr_config: Path = Field(description="Path to DDR Hydra YAML config")
kan_checkpoint: Path = Field(description="Path to trained KAN checkpoint")
hydrofabric_gpkg: Path | None = Field(default=None, description="Override hydrofabric GeoPackage path")
conus_adjacency: Path | None = Field(default=None, description="Override adjacency matrix path")
device: str = Field(default="cpu", description="Compute device")
timestep_seconds: float = Field(default=3600.0, description="Internal MC routing timestep in seconds")
interpolation: Literal["constant", "linear"] = Field(
default="constant",
description="Lateral inflow interpolation for sub-stepping: 'constant' or 'linear'",
)
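An initialization YAML matching this schema might look like the following sketch. Every path below is a placeholder for illustration, not a shipped default:

```yaml
# Hypothetical BMI init config for DdrBmi.initialize(); paths are placeholders.
ddr_config: /path/to/ddr_hydra_config.yaml
kan_checkpoint: /path/to/trained_kan.pt
device: cpu
timestep_seconds: 900.0   # 900 s routing with a 3600 s ngen_dt gives 4 sub-steps
interpolation: linear     # interpolate lateral inflows across sub-steps
```

Fields omitted here (`hydrofabric_gpkg`, `conus_adjacency`) fall back to their defaults of `None`, so the paths in `ddr_config` are used.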