Read raster composites in spatial blocks to bound materialize memory #674
rpw1134 wants to merge 1 commit
read_raster_window_from_tiles read each item's whole intersection in one tile_store.read_raster call, allocating a full-scene source array on top of the full-scene dst accumulator. On large scenes (e.g. ~25000x20000 Sentinel-1, 2 bands f32) that transient dominates peak memory and OOM-kills materialize.

Read and composite the intersection in spatial blocks instead, so the source transient is bounded to one block. New tile-store method read_raster_blocked yields the window block-by-block; DefaultTileStore opens the COG + WarpedVRT once and reads each block as a window of that shared VRT (GeotiffRasterFormat.decode_raster_blocked), avoiding per-block reopen cost. The base-class default yields the whole bounds as a single block, so non-overriding (e.g. remote/API-backed) stores keep their original one-read behavior.

Output is bit-identical to the single-read path (the warp is computed per output pixel independent of the read window): verified by parametrized multi-block regression tests over nearest/bilinear/cubic resampling, T>1, nonzero intersection offsets, and first-valid nodata, plus end-to-end on a Sentinel-1 vessel-detection scene (114/114 detections unchanged).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This approach looks reasonable to me -- only add blocked reading to

I still like the alternative of having the downstream user create smaller windows though, since it feels like we have converged on the window being the unit of parallelization/processing. I think the recall for Sentinel-1 vessel detection was degraded, but if that's an issue then it seems like overlapping windows would solve it, although I thought that would be unnecessarily complex; a 5% drop is higher than what I expected though.

Also, one limitation is that it's really only reducing memory usage by up to 1/2 ish because

I'm curious what others think; @APatrickJ, do you have any thoughts about it? Otherwise I think it's fine as long as it is validated that there isn't a slowdown for windows smaller than the block size, e.g. when materializing thousands of 64x64 windows.
# warp buffers, which dominate peak memory on large scenes; reading in blocks bounds
# the transient warp allocation to roughly one block. 2048 keeps that transient small;
# larger blocks raise the per-block warp buffer with no offsetting benefit.
_READ_BLOCK_SIZE = 2048
This seems to be duplicated from raster_format.DEFAULT_READ_BLOCK_SIZE
Motivation
read_raster_window_from_tiles reads each item's whole intersection in one tile_store.read_raster call, allocating a full-scene source array on top of the full-scene dst accumulator. On large scenes (~25000×20000 Sentinel-1, 2 bands f32) that transient dominates peak memory and OOM-kills materialize. This reads/composites the intersection in spatial blocks so the source transient is bounded to ~one block.

Change
TileStore.read_raster_blocked generator. Base default yields the whole bounds as a single block, so non-overriding (e.g. remote/API-backed) stores keep their exact one-read behavior.
DefaultTileStore overrides it to open the COG + WarpedVRT once and read each block as a window of that shared VRT (GeotiffRasterFormat.decode_raster_blocked), avoiding per-block reopen.
read_raster_window_from_tiles now loops over read_raster_blocked; composite math unchanged.

Correctness
Bit-identical to the single-read path (the warp is per-output-pixel, independent of read window). Covered by new parametrized multi-block regression tests (nearest/bilinear/cubic, T>1, nonzero offsets, first-valid nodata) and verified end-to-end on a Sentinel-1 vessel scene (114/114 detections unchanged).
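The blocked read-and-composite loop described above can be sketched in pure Python. This is only an illustration under stated assumptions, not the PR's code: `iter_blocks`, `read_window`, `composite_first_valid`, and the `NODATA` value are hypothetical names, the "tile store" is an in-memory list of lists, and no warping happens, so it shows the control flow and the first-valid rule rather than the WarpedVRT behavior.

```python
from typing import Iterator, List, Optional, Tuple

Bounds = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels, x1/y1 exclusive
NODATA = 0  # assumed nodata value for this sketch


def iter_blocks(bounds: Bounds, block_size: int) -> Iterator[Bounds]:
    """Yield sub-bounds that tile `bounds` in row-major order."""
    x0, y0, x1, y1 = bounds
    for by in range(y0, y1, block_size):
        for bx in range(x0, x1, block_size):
            yield (bx, by, min(bx + block_size, x1), min(by + block_size, y1))


def read_window(src: List[List[int]], bounds: Bounds) -> List[List[int]]:
    """Stand-in for a tile-store read: copy `bounds` out of an in-memory raster."""
    x0, y0, x1, y1 = bounds
    return [row[x0:x1] for row in src[y0:y1]]


def composite_first_valid(
    dst: List[List[int]],
    src: List[List[int]],
    bounds: Bounds,
    block_size: Optional[int],
) -> List[List[int]]:
    """Fill nodata pixels of `dst` from `src`, one block at a time.

    block_size=None mimics a base-class default that yields the whole
    bounds as a single block.
    """
    blocks = [bounds] if block_size is None else iter_blocks(bounds, block_size)
    for bx0, by0, bx1, by1 in blocks:
        # The block is the only source-side allocation alive at this point.
        block = read_window(src, (bx0, by0, bx1, by1))
        for r, row in enumerate(block):
            for c, value in enumerate(row):
                if dst[by0 + r][bx0 + c] == NODATA and value != NODATA:
                    dst[by0 + r][bx0 + c] = value
    return dst


# Compare the single-read path with a multi-block path on a nonzero offset.
src = [[(x + y) % 3 for x in range(7)] for y in range(5)]
bounds = (1, 1, 7, 5)
single = composite_first_valid([[NODATA] * 7 for _ in range(5)], src, bounds, None)
blocked = composite_first_valid([[NODATA] * 7 for _ in range(5)], src, bounds, 2)
assert single == blocked
```

In this sketch a window smaller than the block size yields exactly one block, so it goes through the same single read as before.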
Measured (Sentinel-1, ~25k×20k, 2 historicals)
Materialize anon peak ~26.7 → ~14–16 GiB. Blocked reads add one extra knob: the shared full-scene WarpedVRT peaks at ~15.96 GiB and is ~12% faster on materialize; per-block VRTs peak at ~14.46 GiB and are slower. Both are well under the prior single-read peak.

🤖 Generated with Claude Code
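For a rough sense of scale, the arithmetic below compares the size of one full-scene source array with one 2048×2048 block, using the approximate scene dimensions quoted in this PR. It is a back-of-the-envelope sketch: `raster_bytes` is a hypothetical helper, and the figures count only the decoded array, not warp buffers, the dst accumulator, or any time dimension, so they will not add up to the measured peaks.

```python
def raster_bytes(width: int, height: int, bands: int, bytes_per_sample: int) -> int:
    """Size in bytes of a dense raster array with the given shape."""
    return width * height * bands * bytes_per_sample


GIB = 1024 ** 3
MIB = 1024 ** 2

# ~25000x20000 scene, 2 bands, float32 (4 bytes per sample).
full_scene = raster_bytes(25_000, 20_000, 2, 4)
# One 2048x2048 read block with the same bands and dtype.
one_block = raster_bytes(2048, 2048, 2, 4)

print(f"full-scene source array: {full_scene / GIB:.2f} GiB")  # 3.73 GiB
print(f"one 2048x2048 block:     {one_block / MIB:.0f} MiB")   # 32 MiB
```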