Read raster composites in spatial blocks to bound materialize memory by rpw1134 · Pull Request #674 · allenai/rslearn

rpw1134 · 2026-06-23T19:56:35Z

Motivation

read_raster_window_from_tiles reads each item's whole intersection in one tile_store.read_raster call, allocating a full-scene source array on top of the full-scene dst accumulator. On large scenes (~25000×20000 Sentinel-1, 2 bands f32) that transient dominates peak memory and OOM-kills materialize. This reads/composites the intersection in spatial blocks so the source transient is bounded to ~one block.
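A back-of-the-envelope sketch of the sizes involved, using only the figures quoted in this PR (scene dimensions, 2 bands of f32, and the 2048 block edge from the review diff). The two-array peak model is a simplifying assumption: it ignores warp buffers, multiple historicals, and every other allocation, so it does not reproduce the measured peaks.

```python
# Figures quoted in the PR text; the orientation of "~25000x20000" is assumed.
HEIGHT, WIDTH = 20000, 25000
BANDS = 2
F32_BYTES = 4   # bytes per float32 sample
BLOCK = 2048    # block edge, taken from _READ_BLOCK_SIZE in the review diff


def array_bytes(bands: int, height: int, width: int) -> int:
    """Bytes needed for a dense (bands, height, width) float32 array."""
    return bands * height * width * F32_BYTES


full = array_bytes(BANDS, HEIGHT, WIDTH)
block = array_bytes(BANDS, BLOCK, BLOCK)

# Single read: the dst accumulator and a full-scene source array are live together.
peak_single = full + full
# Blocked read: the dst accumulator plus roughly one block of source.
peak_blocked = full + block

GIB = 1024 ** 3
MIB = 1024 ** 2
print(f"full-scene array : {full / GIB:.2f} GiB")
print(f"one block        : {block / MIB:.0f} MiB")
print(f"blocked / single : {peak_blocked / peak_single:.3f}")
```

Under this model a full-scene array is about 3.73 GiB, one block is 32 MiB, and the blocked peak is about 0.504 of the single-read peak, since dst stays at full size.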

Change

New TileStore.read_raster_blocked generator. Base default yields the whole bounds as a single block — non-overriding (e.g. remote/API-backed) stores keep their exact one-read behavior. DefaultTileStore overrides it to open the COG + WarpedVRT once and read each block as a window of that shared VRT (GeotiffRasterFormat.decode_raster_blocked), avoiding per-block reopen.
read_raster_window_from_tiles now loops over read_raster_blocked; composite math unchanged.
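The shape of the change can be sketched with a toy in-memory store. This is an illustration under stated assumptions, not the rslearn implementation: `ToyTileStore`, `ToyBlockedTileStore`, `iter_blocks`, `read_window`, and the `(x_min, y_min, x_max, y_max)` bounds layout are all hypothetical, and plain array slicing stands in for the COG + WarpedVRT read.

```python
from typing import Iterator, Tuple

import numpy as np

Bounds = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels; assumed layout


def iter_blocks(bounds: Bounds, block_size: int) -> Iterator[Bounds]:
    """Yield sub-bounds that tile `bounds` in row-major order."""
    x0, y0, x1, y1 = bounds
    for by in range(y0, y1, block_size):
        for bx in range(x0, x1, block_size):
            yield (bx, by, min(bx + block_size, x1), min(by + block_size, y1))


class ToyTileStore:
    """Hypothetical stand-in for a tile store holding one (bands, H, W) raster."""

    def __init__(self, raster: np.ndarray) -> None:
        self.raster = raster

    def read_raster(self, bounds: Bounds) -> np.ndarray:
        x0, y0, x1, y1 = bounds
        return self.raster[:, y0:y1, x0:x1].copy()

    def read_raster_blocked(self, bounds: Bounds) -> Iterator[Tuple[Bounds, np.ndarray]]:
        # Base default: a single block covering the whole bounds, i.e. one read.
        yield bounds, self.read_raster(bounds)


class ToyBlockedTileStore(ToyTileStore):
    """Override that yields the same pixels in fixed-size blocks."""

    def __init__(self, raster: np.ndarray, block_size: int = 2048) -> None:
        super().__init__(raster)
        self.block_size = block_size

    def read_raster_blocked(self, bounds: Bounds) -> Iterator[Tuple[Bounds, np.ndarray]]:
        # A real override would open the dataset once here and read windows of it.
        for sub in iter_blocks(bounds, self.block_size):
            yield sub, self.read_raster(sub)


def read_window(store: ToyTileStore, bounds: Bounds) -> np.ndarray:
    """Fill a window-sized dst block by block; only one source block is live at a time."""
    x0, y0, x1, y1 = bounds
    bands = store.raster.shape[0]
    dst = np.zeros((bands, y1 - y0, x1 - x0), dtype=store.raster.dtype)
    for (bx0, by0, bx1, by1), src in store.read_raster_blocked(bounds):
        dst[:, by0 - y0:by1 - y0, bx0 - x0:bx1 - x0] = src
    return dst


raster = np.arange(2 * 10 * 12, dtype=np.float32).reshape(2, 10, 12)
bounds = (1, 2, 11, 9)
single = read_window(ToyTileStore(raster), bounds)
blocked = read_window(ToyBlockedTileStore(raster, block_size=4), bounds)
print(np.array_equal(single, blocked))
```

The caller's loop is the same for both stores; a store that does not override `read_raster_blocked` simply runs it once.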

Correctness

Bit-identical to the single-read path (the warp is per-output-pixel, independent of read window). Covered by new parametrized multi-block regression tests (nearest/bilinear/cubic, T>1, nonzero offsets, first-valid nodata) and verified end-to-end on a Sentinel-1 vessel scene (114/114 detections unchanged).
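A minimal sketch of the kind of equivalence check described above, restricted to first-valid nodata compositing. The helper names and the nodata value of 0 are assumptions for illustration. It uses plain slicing with no resampling, so it exercises only the block bookkeeping, not the claim that the warp is independent of the read window.

```python
import numpy as np

NODATA = 0  # assumed nodata value for this sketch


def composite_first_valid(dst: np.ndarray, src: np.ndarray) -> None:
    """Fill pixels of dst that are still NODATA with values from src, in place."""
    mask = dst == NODATA
    dst[mask] = src[mask]


def composite_single(items, height, width):
    """Composite each item over the whole window in one step."""
    dst = np.full((height, width), NODATA, dtype=items[0].dtype)
    for item in items:
        composite_first_valid(dst, item)
    return dst


def composite_blocked(items, height, width, block):
    """Composite each item block by block into the same full-size dst."""
    dst = np.full((height, width), NODATA, dtype=items[0].dtype)
    for item in items:
        for y in range(0, height, block):
            for x in range(0, width, block):
                # Basic slices are views, so the fill writes straight into dst.
                composite_first_valid(
                    dst[y:y + block, x:x + block],
                    item[y:y + block, x:x + block],
                )
    return dst


rng = np.random.default_rng(0)
height, width = 37, 53  # deliberately not multiples of the block sizes
items = [rng.integers(0, 5, size=(height, width)).astype(np.float32) for _ in range(3)]

expected = composite_single(items, height, width)
results = {b: composite_blocked(items, height, width, b) for b in (1, 8, 16, 64)}
print(all(np.array_equal(expected, r) for r in results.values()))
```

Because first-valid compositing is a per-pixel operation, the order in which blocks are visited within an item does not affect the result; only the order of items does.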

Measured (Sentinel-1, ~25k×20k, 2 historicals)

Materialize peak anonymous memory: ~26.7 → ~14–16 GiB. Blocked reads add one tuning choice, namely how the WarpedVRT is scoped: the shared full-scene WarpedVRT peaks at ~15.96 GiB and is ~12% faster on materialize; per-block VRTs peak at ~14.46 GiB and are slower. Both are well under the prior single-read peak.

🤖 Generated with Claude Code

read_raster_window_from_tiles read each item's whole intersection in one tile_store.read_raster call, allocating a full-scene source array on top of the full-scene dst accumulator. On large scenes (e.g. ~25000x20000 Sentinel-1, 2 bands f32) that transient dominates peak memory and OOM-kills materialize.

Read and composite the intersection in spatial blocks instead, so the source transient is bounded to one block. New tile-store method read_raster_blocked yields the window block-by-block; DefaultTileStore opens the COG + WarpedVRT once and reads each block as a window of that shared VRT (GeotiffRasterFormat.decode_raster_blocked), avoiding per-block reopen cost. The base-class default yields the whole bounds as a single block, so non-overriding (e.g. remote/API-backed) stores keep their original one-read behavior.

Output is bit-identical to the single-read path (the warp is computed per output pixel independent of the read window): verified by parametrized multi-block regression tests over nearest/bilinear/cubic resampling, T>1, nonzero intersection offsets, and first-valid nodata, plus end-to-end on a Sentinel-1 vessel-detection scene (114/114 detections unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

favyen2 · 2026-06-23T20:15:06Z

This approach looks reasonable to me -- only add blocked reading to GeotiffRasterFormat since it can be used directly in DefaultTileStore, and the default implementation in TileStore reuses read_raster (one read). I think that design is good.

I still like the alternative of having the downstream user create smaller windows, though, since it feels like we have converged on the window being the unit of parallelization/processing. I think the recall for Sentinel-1 vessel detection was degraded, but if that's an issue then it seems like overlapping windows would solve it, although I thought that would be unnecessarily complex; a 5% drop is higher than I expected, though.

Also one limitation is it's really only reducing memory usage by up to 1/2 ish because dst is still the size of the window.

I'm curious what others think, @APatrickJ do you have any thoughts about it? Otherwise I think it's fine as long as it is validated that there isn't a slowdown for windows smaller than the block size, e.g. when materializing thousands of 64x64 windows.
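One way to frame the small-window check is sketched below. It assumes a 2048 block edge (the value in the review diff) and uses toy in-memory reads, so the helper names are hypothetical and the timings say nothing about real COG or VRT overhead; they only show the shape of such a comparison.

```python
import math
import timeit

import numpy as np

BLOCK = 2048  # assumed block edge, from _READ_BLOCK_SIZE in the review diff


def num_blocks(width: int, height: int, block: int = BLOCK) -> int:
    """Number of blocks needed to tile a width x height window."""
    return math.ceil(width / block) * math.ceil(height / block)


def read_direct(raster: np.ndarray, size: int) -> np.ndarray:
    """Toy single read of a size x size window."""
    return raster[:size, :size].copy()


def read_via_blocks(raster: np.ndarray, size: int, block: int = BLOCK) -> np.ndarray:
    """Toy blocked read of the same window into a preallocated dst."""
    dst = np.empty((size, size), dtype=raster.dtype)
    for y in range(0, size, block):
        for x in range(0, size, block):
            y1, x1 = min(y + block, size), min(x + block, size)
            dst[y:y1, x:x1] = raster[y:y1, x:x1]
    return dst


raster = np.zeros((64, 64), dtype=np.float32)
print("blocks for 64x64       :", num_blocks(64, 64))
print("blocks for 25000x20000 :", num_blocks(25000, 20000))

# Thousands of small windows: compare total time of the two toy paths.
t_direct = timeit.timeit(lambda: read_direct(raster, 64), number=2000)
t_blocked = timeit.timeit(lambda: read_via_blocks(raster, 64), number=2000)
print(f"direct {t_direct:.4f}s, blocked {t_blocked:.4f}s")
```

A 64x64 window fits in a single block, so the blocked path degenerates to one read, while the full scene needs 130 blocks. Any remaining per-window cost would come from fixed overhead such as the generator and dataset setup, which has to be measured against the real store.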

APatrickJ · 2026-06-25T16:01:15Z

+# warp buffers, which dominate peak memory on large scenes; reading in blocks bounds
+# the transient warp allocation to roughly one block. 2048 keeps that transient small;
+# larger blocks raise the per-block warp buffer with no offsetting benefit.
+_READ_BLOCK_SIZE = 2048


This seems to be duplicated from raster_format.DEFAULT_READ_BLOCK_SIZE

rpw1134 marked this pull request as ready for review June 23, 2026 19:57

rpw1134 requested a review from favyen2 June 23, 2026 19:58

APatrickJ reviewed Jun 25, 2026

rpw1134 wants to merge 1 commit into master from ryan/chunked-materialize-read
