[New feature] Add GeoParquet support for Sentinel2 from PC #669

Draft
robmarkcole wants to merge 3 commits into master from use-pc-geoparquet


@robmarkcole robmarkcole commented Jun 15, 2026

Collaborator

Adds a GeoParquet metadata backend to the Planetary Computer data sources. Previously, scene discovery during dataset prepare always used the Planetary Computer STAC API, making one search request per window, which can hit API rate limits on large jobs.

Retrying after catching error in retry loop: 503 Server Error: Operations per second is over the account limit.

The new backend queries the collection-level GeoParquet item table (hosted on Azure Blob Storage) using DuckDB. It downloads only the relevant date-range partitions, runs a spatial and temporal filter locally, and returns matching items in a single bulk operation per get_items call. Because rslearn already batches all windows into one get_items call during prepare, this effectively replaces N per-window STAC requests with one bulk query.
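The batching idea can be illustrated with a minimal pure-Python sketch. It is not the code in this PR: `Window`, `Item`, `bulk_filter` and `match_items` are hypothetical names, and an in-memory list stands in for the DuckDB query over the GeoParquet partitions. It only shows the shape of the approach: one coarse pass over the combined extent and time range of all windows, followed by per-window matching done locally.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass(frozen=True)
class Window:
    """Hypothetical stand-in for a window: bounds plus a time range."""
    name: str
    bbox: BBox
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in for one row of the item table."""
    name: str
    bbox: BBox
    time: datetime


def bboxes_intersect(a: BBox, b: BBox) -> bool:
    """Return True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def bulk_filter(rows: List[Item], windows: List[Window]) -> List[Item]:
    """Single coarse pass: keep rows inside the union extent and overall time range."""
    union = (
        min(w.bbox[0] for w in windows),
        min(w.bbox[1] for w in windows),
        max(w.bbox[2] for w in windows),
        max(w.bbox[3] for w in windows),
    )
    start = min(w.start for w in windows)
    end = max(w.end for w in windows)
    return [r for r in rows if bboxes_intersect(r.bbox, union) and start <= r.time <= end]


def match_items(rows: List[Item], windows: List[Window]) -> Dict[str, List[str]]:
    """Assign candidates from the single bulk pass to each window locally."""
    candidates = bulk_filter(rows, windows)
    return {
        w.name: [
            r.name
            for r in candidates
            if bboxes_intersect(r.bbox, w.bbox) and w.start <= r.time <= w.end
        ]
        for w in windows
    }
```

In this sketch the expensive step (`bulk_filter`) runs once regardless of the number of windows, which is the property that replaces N per-window requests with one query.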

The feature is opt-in via a new metadata_backend parameter (default "stac", new option "geoparquet") on the PlanetaryComputer base class and all its subclasses including Sentinel2. An optional metadata_cache_dir can be set to cache the partition file list and query results between runs. DuckDB is required and is included in the existing extra optional dependencies.
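As a rough sketch of the opt-in behaviour described above, the two options could be modelled as follows. The `MetadataOptions` class and its `cache_path` helper are hypothetical and exist only for illustration; only the parameter names `metadata_backend` and `metadata_cache_dir`, the default `"stac"`, and the option `"geoparquet"` come from this PR.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# The two backend names mentioned in this PR.
VALID_BACKENDS = ("stac", "geoparquet")


@dataclass
class MetadataOptions:
    """Hypothetical container for the two new options (not the PR's actual class)."""

    metadata_backend: str = "stac"  # existing behaviour stays the default
    metadata_cache_dir: Optional[str] = None  # caching is optional

    def __post_init__(self) -> None:
        # Reject unknown backends early rather than failing during prepare.
        if self.metadata_backend not in VALID_BACKENDS:
            raise ValueError(
                f"metadata_backend must be one of {VALID_BACKENDS}, "
                f"got {self.metadata_backend!r}"
            )

    def cache_path(self, filename: str) -> Optional[Path]:
        """Return where a cached file would live, or None if caching is off."""
        if self.metadata_cache_dir is None:
            return None
        return Path(self.metadata_cache_dir) / filename
```

With this shape, existing configurations that do not set `metadata_backend` keep using the STAC API, and caching stays off unless a directory is given.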

  • Implemented functions to create mock GeoParquet rows for Sentinel2.
  • Added tests for batch preparation of window metadata and item retrieval by name.
  • Updated uv.lock to include duckdb version 1.5.3 and incremented revision number.
  • Added duckdb as an extra dependency for the project.

Note: I still need to run this on a large-scale dataset to confirm whether there is a significant speedup.

@robmarkcole robmarkcole marked this pull request as draft June 15, 2026 12:51
@robmarkcole robmarkcole changed the title from "Add GeoParquet support for Sentinel2 data source and update dependencies" to "[New feature] Add GeoParquet support for Sentinel2 from PC" Jun 15, 2026