Read raster composites in spatial blocks to bound materialize memory#674

Open
rpw1134 wants to merge 1 commit into master from ryan/chunked-materialize-read

rpw1134 commented Jun 23, 2026

Contributor

Motivation

read_raster_window_from_tiles reads each item's whole intersection in one tile_store.read_raster call, allocating a full-scene source array on top of the full-scene dst accumulator. On large scenes (~25000×20000 Sentinel-1, 2 bands f32) that transient dominates peak memory and OOM-kills materialize. This PR reads and composites the intersection in spatial blocks, so the source transient is bounded to roughly one block.
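
As a rough back-of-the-envelope check of the array sizes involved, the sketch below compares one dense full-scene array to one block. It assumes 4 bytes per f32 sample with no padding, and takes 2048 as the block edge (the `_READ_BLOCK_SIZE` value in this diff). The helper name `array_gib` is made up for illustration. It only computes raw array sizes; it does not model warp buffers or reproduce the measured peaks.

```python
def array_gib(height, width, bands, bytes_per_sample=4):
    """Size in GiB of a dense (bands, height, width) array."""
    return height * width * bands * bytes_per_sample / 2**30


# One full-scene array (source or dst) for a ~25000x20000, 2-band f32 scene.
full_scene_gib = array_gib(20000, 25000, 2)

# One 2048x2048 block with the same band count and dtype.
one_block_gib = array_gib(2048, 2048, 2)

print(f"full scene: {full_scene_gib:.2f} GiB")        # about 3.73 GiB
print(f"one block:  {one_block_gib * 1024:.0f} MiB")  # 32 MiB
```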

Change

  • New TileStore.read_raster_blocked generator. Base default yields the whole bounds as a single block — non-overriding (e.g. remote/API-backed) stores keep their exact one-read behavior. DefaultTileStore overrides it to open the COG + WarpedVRT once and read each block as a window of that shared VRT (GeotiffRasterFormat.decode_raster_blocked), avoiding per-block reopen.
  • read_raster_window_from_tiles now loops over read_raster_blocked; composite math unchanged.
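
The shape of the two behaviors described above can be sketched roughly as follows. This is not the PR's code: `BaseStore`, `BlockedStore`, `iter_block_bounds`, the `(x_min, y_min, x_max, y_max)` bounds convention, and the placeholder `read_raster` are all hypothetical stand-ins, and the real methods take more arguments. The sketch only shows a base default that yields one block and an override that tiles the request.

```python
from typing import Iterator, Tuple

# Hypothetical pixel-bounds convention: (x_min, y_min, x_max, y_max).
Bounds = Tuple[int, int, int, int]


def iter_block_bounds(bounds: Bounds, block_size: int) -> Iterator[Bounds]:
    """Tile `bounds` into blocks of at most block_size x block_size pixels."""
    x_min, y_min, x_max, y_max = bounds
    for y0 in range(y_min, y_max, block_size):
        for x0 in range(x_min, x_max, block_size):
            yield (x0, y0, min(x0 + block_size, x_max), min(y0 + block_size, y_max))


class BaseStore:
    """Hypothetical stand-in for the base tile store."""

    def read_raster(self, bounds: Bounds):
        # Placeholder "read": echo the bounds so the sketch runs without I/O.
        return {"bounds": bounds}

    def read_raster_blocked(self, bounds: Bounds):
        # Default: a single block covering the whole request, i.e. one read.
        yield bounds, self.read_raster(bounds)


class BlockedStore(BaseStore):
    """Hypothetical override that reads block by block."""

    def __init__(self, block_size: int = 2048):
        self.block_size = block_size

    def read_raster_blocked(self, bounds: Bounds):
        # A real store would open the dataset once here and reuse it per block.
        for block in iter_block_bounds(bounds, self.block_size):
            yield block, self.read_raster(block)


blocks = [b for b, _ in BlockedStore(2048).read_raster_blocked((0, 0, 5000, 3000))]
small = [b for b, _ in BlockedStore(2048).read_raster_blocked((10, 20, 74, 84))]
```

In this sketch a 5000×3000 request is split into a 3×2 grid of blocks, while a 64×64 request yields exactly one block equal to the request, so small windows degenerate to a single read.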

Correctness

Bit-identical to the single-read path (the warp is per-output-pixel, independent of read window). Covered by new parametrized multi-block regression tests (nearest/bilinear/cubic, T>1, nonzero offsets, first-valid nodata) and verified end-to-end on a Sentinel-1 vessel scene (114/114 detections unchanged).
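
To illustrate why a per-pixel compositing rule is unaffected by how the window is partitioned, here is a simplified NumPy sketch. It is not the regression test from this PR: the function names, the `NODATA = 0` value, and the per-element (rather than per-pixel-across-bands) validity mask are assumptions, and no warping is modeled. It only checks that first-valid compositing applied block by block matches the same rule applied to whole arrays.

```python
import numpy as np

NODATA = 0  # assumed nodata value for this sketch


def composite_first_valid(dst, src):
    """Fill elements of dst that are still nodata with values from src, in place."""
    mask = dst == NODATA
    dst[mask] = src[mask]


def composite_whole(items, shape):
    dst = np.full(shape, NODATA, dtype=np.float32)
    for src in items:
        composite_first_valid(dst, src)
    return dst


def composite_blocked(items, shape, block):
    dst = np.full(shape, NODATA, dtype=np.float32)
    height, width = shape[-2:]
    for src in items:
        for y0 in range(0, height, block):
            for x0 in range(0, width, block):
                window = (..., slice(y0, y0 + block), slice(x0, x0 + block))
                # Basic slicing returns a view, so the in-place update lands in dst.
                composite_first_valid(dst[window], src[window])
    return dst


rng = np.random.default_rng(0)
shape = (2, 50, 70)  # (bands, height, width), not a multiple of the block size
items = [rng.integers(0, 3, size=shape).astype(np.float32) for _ in range(3)]

whole = composite_whole(items, shape)
blocked = composite_blocked(items, shape, block=16)
assert np.array_equal(whole, blocked)
```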

Measured (Sentinel-1, ~25k×20k, 2 historicals)

Materialize anon peak drops from ~26.7 GiB to ~14–16 GiB. Blocked reads add one knob, the choice of VRT scope: the shared full-scene WarpedVRT peaks at ~15.96 GiB and is ~12% faster on materialize; per-block VRTs peak at ~14.46 GiB and are slower. Both are well under the prior single-read peak.

🤖 Generated with Claude Code

read_raster_window_from_tiles read each item's whole intersection in one
tile_store.read_raster call, allocating a full-scene source array on top of
the full-scene dst accumulator. On large scenes (e.g. ~25000x20000 Sentinel-1,
2 bands f32) that transient dominates peak memory and OOM-kills materialize.

Read and composite the intersection in spatial blocks instead, so the source
transient is bounded to one block. New tile-store method read_raster_blocked
yields the window block-by-block; DefaultTileStore opens the COG + WarpedVRT
once and reads each block as a window of that shared VRT
(GeotiffRasterFormat.decode_raster_blocked), avoiding per-block reopen cost.
The base-class default yields the whole bounds as a single block, so
non-overriding (e.g. remote/API-backed) stores keep their original one-read
behavior.

Output is bit-identical to the single-read path (the warp is computed per
output pixel independent of the read window): verified by parametrized
multi-block regression tests over nearest/bilinear/cubic resampling, T>1,
nonzero intersection offsets, and first-valid nodata, plus end-to-end on a
Sentinel-1 vessel-detection scene (114/114 detections unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rpw1134 marked this pull request as ready for review June 23, 2026 19:57
rpw1134 requested a review from favyen2 June 23, 2026 19:58
favyen2 commented Jun 23, 2026

Collaborator

This approach looks reasonable to me -- only add blocked reading to GeotiffRasterFormat since it can be used directly in DefaultTileStore, and the default implementation in TileStore reuses read_raster (one read). I think that design is good.

I still like the alternative of having the downstream user create smaller windows though, since it feels like we have converged on the window being the unit of parallelization/processing. I think the recall for Sentinel-1 vessel detection was degraded with that approach, but if that's an issue then it seems like overlapping windows would solve it, although I thought that would be unnecessarily complex; a 5% drop is higher than what I expected though.

Also one limitation is it's really only reducing memory usage by up to 1/2 ish because dst is still the size of the window.

I'm curious what others think, @APatrickJ do you have any thoughts about it? Otherwise I think it's fine as long as it is validated that there isn't a slowdown for windows smaller than the block size, e.g. when materializing thousands of 64x64 windows.

# warp buffers, which dominate peak memory on large scenes; reading in blocks bounds
# the transient warp allocation to roughly one block. 2048 keeps that transient small;
# larger blocks raise the per-block warp buffer with no offsetting benefit.
_READ_BLOCK_SIZE = 2048

Collaborator

This seems to be duplicated from raster_format.DEFAULT_READ_BLOCK_SIZE

3 participants