Merged
1 change: 1 addition & 0 deletions .github/instructions/scripts.instructions.md
@@ -7,6 +7,7 @@ applyTo: "scripts/**"
See CLAUDE.md "Development Workflow" for usage. All scripts require the Docker compose environment.

- `runinpypgstac` is the foundation — most scripts delegate to it
- `loadsampledata` has a host wrapper at `scripts/loadsampledata`; prefer that wrapper over calling `runinpypgstac` directly
- `runinpypgstac` uses the published-package path by default; set `PGPKG_LOCAL_REPO_DIR` to mount a local `pgpkg` checkout at `/pgpkg` when you need an override
- `scripts/container-scripts/` contains the in-container script payload copied into the pypgstac image; keep host wrappers in `scripts/`
- `stageversion` modifies version files AND generates migrations — see CLAUDE.md "Migration Process"
1 change: 1 addition & 0 deletions .github/workflows/continuous-integration.yml
@@ -184,6 +184,7 @@ jobs:
run: scripts/container-scripts/test
rust-crate:
name: Test rust crate
if: ${{ false }} # FIXME: turn back on before v0.10 release
runs-on: ubuntu-latest
needs:
- changes
7 changes: 7 additions & 0 deletions .gitignore
@@ -14,7 +14,14 @@ src/pypgstac/python/pypgstac/*.so
.venv
.pytest_cache
.plans/
.compound-engineering/
docs/plans/
STRATEGY.md
.benchmarks-local/
.env
.explorations/
benchmarks/results/
scripts/benchmarkv0910
src/pgstacrust/target/
src/pgstac-migrate/dist/
src/pgstac-migrate/src/pgstac_migrate/migrations.tar.zst
1 change: 1 addition & 0 deletions AGENTS.md
@@ -52,4 +52,5 @@ Specialist in pypgstac bulk loading (`src/pypgstac/src/pypgstac/load.py`). See C
- **Retry safety**: `item.pop("partition", None)` with `None` default; `before_sleep` sets `partition.requires_update = True` on `CheckViolation`
- **Retry scope**: `CheckViolation`, `DeadlockDetected`, `SerializationFailure`, `LockNotAvailable`, `ObjectInUse`
- **Load modes**: `insert`, `ignore`/`insert_ignore`, `upsert`, `delsert`
- **Sample data load**: `scripts/loadsampledata`
- Test: `scripts/runinpypgstac --build test --pypgstac`
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -10,8 +10,11 @@ and this project adheres to [Semantic Versioning](http://semver.org/).

### Added

- New [Promoted Fields](https://stac-utils.github.io/pgstac/promoted-fields/) reference document listing every STAC property that pgstac promotes to a native `items` column, with spec source, extension version, SQL type, and a machine-readable YAML registry for AI-assisted updates.
- Align promoted fields with current STAC extension specs: add `proj:geometry`, `view:moon_azimuth`, `view:moon_elevation`, `sat:platform_international_designator`, `sat:anx_datetime`; replace `proj:epsg` (int) with `proj:code` (text); move `eo:bands` to core `bands` (STAC 1.1); remove `file:values_regex`.
- Add deterministic SHA-256 `content_hash` to STAC items to track data changes across migrations.
- Add `pgstac_updated_at` column to items table as part of separating STAC property updates from database metadata updates.
- Deterministic Planetary Computer benchmark fixture manifest + fetch tooling for `naip`, `sentinel-2-l2a`, and `landsat-c2-l2` (1000 items per collection), plus CI/manual benchmark workflows that emit JSON/CSV/Markdown artifacts and branch comparison reports.
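
As an illustration of the "deterministic SHA-256 `content_hash`" entry above, here is a minimal Python sketch of one way such a hash could be computed. It assumes the item is serialized to canonical JSON (sorted keys, fixed separators) before hashing. The function name `content_hash` and the canonicalization rules are assumptions made for this example, and the implementation in this PR may differ.

```python
import hashlib
import json


def content_hash(item: dict) -> str:
    """Return the SHA-256 hex digest of a canonical JSON encoding of item.

    Hypothetical helper for illustration; not taken from pgstac itself.
    """
    # Sorting keys and fixing separators removes any dependence on dict
    # insertion order or whitespace, so equal content yields equal bytes.
    canonical = json.dumps(
        item, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two dicts with the same content but different key order hash identically.
a = {"id": "item-1", "properties": {"datetime": "2024-01-01T00:00:00Z", "gsd": 10}}
b = {"properties": {"gsd": 10, "datetime": "2024-01-01T00:00:00Z"}, "id": "item-1"}
assert content_hash(a) == content_hash(b)
```

Because the digest depends only on content, comparing stored and recomputed hashes is a cheap way to detect whether an item changed.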

### Changed

2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -212,5 +212,5 @@ ON CONFLICT DO NOTHING;
### Loading test data

```bash
-scripts/runinpypgstac --build loadsampledata
+scripts/loadsampledata
```
10 changes: 8 additions & 2 deletions docker/pgstac/Dockerfile
@@ -20,6 +20,8 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
postgresql-contrib-$PG_MAJOR \
postgresql-$PG_MAJOR-pgtap \
postgresql-$PG_MAJOR-plpgsql-check \
postgresql-$PG_MAJOR-plprofiler \
plprofiler \
postgresql-$PG_MAJOR-partman \
postgresql-server-dev-$PG_MAJOR \
build-essential \
@@ -33,8 +35,9 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
&& make -C /tmp/pg_tle \
&& make -C /tmp/pg_tle install \
&& rm -rf /tmp/pg_tle \
-&& sed -i "s/^#shared_preload_libraries = .*/shared_preload_libraries = 'pg_tle,pg_stat_statements,pg_cron'/" /usr/share/postgresql/$PG_MAJOR/postgresql.conf.sample \
-&& sed -i "s/^#shared_preload_libraries = .*/shared_preload_libraries = 'pg_tle,pg_stat_statements,pg_cron'/" /usr/share/postgresql/postgresql.conf.sample \
+&& sed -i 's/\.readfp(/.read_file(/' /usr/lib/python3/dist-packages/plprofiler/plprofiler_tool.py \
+&& sed -i "s/^#shared_preload_libraries = .*/shared_preload_libraries = 'pg_tle,pg_stat_statements,pg_cron,plprofiler'/" /usr/share/postgresql/$PG_MAJOR/postgresql.conf.sample \
+&& sed -i "s/^#shared_preload_libraries = .*/shared_preload_libraries = 'pg_tle,pg_stat_statements,pg_cron,plprofiler'/" /usr/share/postgresql/postgresql.conf.sample \
&& apt-get purge -y --auto-remove \
postgresql-server-dev-$PG_MAJOR \
build-essential \
@@ -46,6 +49,9 @@ RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
&& apt-get clean && apt-get -y autoremove \
&& rm -rf /var/lib/apt/lists/*

ENV EDITOR=/bin/true
ENV VISUAL=/bin/true

# The pgstacbase image with latest version of pgstac installed
FROM pgstacbase AS pgstac
WORKDIR /docker-entrypoint-initdb.d
1 change: 1 addition & 0 deletions docs/mkdocs.yml
@@ -19,6 +19,7 @@ extra:
nav:
- Home: "index.md"
- PgSTAC: "pgstac.md"
- Promoted Fields: "promoted-fields.md"
- pyPgSTAC: "pypgstac.md"
- Performance:
- item_size_analysis.ipynb
8 changes: 3 additions & 5 deletions docs/src/pgstac.md
@@ -85,12 +85,10 @@ Note that when pgstac.readonly is set to TRUE that pgstac is unable to use a cac

Runtime configuration of variables can be made with search by passing in configuration in the search json "conf" item.

-Runtime configuration is available for **context**, **context_estimated_count**, **context_estimated_cost**, **context_stats_ttl**, and **nohydrate**.
+Runtime configuration is available for **context**, **context_estimated_count**, **context_estimated_cost**, and **context_stats_ttl**.

-The nohydrate conf item returns an unhydrated item bypassing the CPU intensive step of rehydrating data with data from the collection metadata. When using the nohydrate conf, the only fields that are respected in the fields extension are geometry and bbox.
-```sql
-SELECT search('{"conf":{"nohydrate"=true}}');
-```
+The legacy `conf.nohydrate` flag is still accepted in the request JSON for backward
+compatibility, but split-storage search always returns hydrated items.
Comment on lines +90 to +91

Member

Am I reading this correctly, that hydration is always at the database layer now? If so, do we need to exercise this through stac-fastapi-pgstac to make sure this won't be a performance regression?

Collaborator Author

No. Rather than offering a flag that returns an item collection that then has to be blown apart and reconstructed, the new rustac pgstac functions (and their Python wrapper) will run a query that fetches the raw rows, using functions coming in the next PR. Those tools, which stac-fastapi-pgstac will need to be updated to use, will have a much faster and more memory-efficient path. We still maintain the `search` function, which returns the full item collection and no longer has that option, so anything that uses that function might see a performance regression. That said, the new hydration flow is considerably faster than the old one.
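
As background for this thread: "hydration" here means rebuilding a full item by merging what is stored for the item with values shared across its collection (the removed docs text above describes it as rehydrating with data from the collection metadata). The sketch below is a simplified illustration that assumes a plain recursive dictionary merge. The names `hydrate`, `base_item`, and the sample data are hypothetical, and pgstac's real hydration logic handles more cases than this.

```python
from copy import deepcopy


def hydrate(base_item: dict, item: dict) -> dict:
    """Fill keys missing from item with values from base_item.

    Hypothetical, simplified merge: values already present on the item win,
    and nested dicts are merged recursively.
    """
    result = deepcopy(item)
    for key, base_value in base_item.items():
        if key not in result:
            # The item did not store this key, so take the shared value.
            result[key] = deepcopy(base_value)
        elif isinstance(result[key], dict) and isinstance(base_value, dict):
            # Both sides are objects: merge them key by key.
            result[key] = hydrate(base_value, result[key])
    return result


# Shared values for a collection (assumed shape, for illustration only).
base = {
    "type": "Feature",
    "collection": "naip",
    "assets": {"image": {"type": "image/tiff", "roles": ["data"]}},
}
# A stored item that keeps only what differs from the shared values.
stored = {"id": "a", "assets": {"image": {"href": "s3://bucket/a.tif"}}}

hydrated = hydrate(base, stored)
assert hydrated["assets"]["image"]["href"] == "s3://bucket/a.tif"
assert hydrated["assets"]["image"]["type"] == "image/tiff"
```

A client that reads raw rows, as described in the reply, would perform a merge like this itself instead of asking the database for a fully assembled item collection.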


#### PgSTAC Partitioning
By default PgSTAC partitions data by collection (note: this is a change starting with version 0.5.0). Each collection can further be partitioned by either year or month. **Partitioning must be set up prior to loading any data!** Partitioning can be configured by setting the partition_trunc flag on a collection in the database.