Postgres storage: bucket_parameters DISTINCT ON lookup falls back to seq scan/sort under many parameterized buckets

## Summary

We're seeing a repeatable initial checkpoint delay on self-hosted PowerSync with Postgres storage and many parameterized bucket definitions.

The hot query shape in the storage database is a `SELECT DISTINCT ON (...)` against `powersync.bucket_parameters`. On our staging deployment, the default planner path fell back to a large scan + sort, and a local composite index changed the plan from ~384ms to ~33ms for the same representative lookup.

This did not eliminate all startup latency on its own, but it materially improved the specific `bucket_parameters` lookup path and looks like something PowerSync may want to handle upstream in the storage schema or query shape.

## Environment

- PowerSync service: `powersync-js/1.48.0` clients hitting self-hosted `powersync-service`
- Storage backend: Postgres (`powersync_storage`)
- Active sync rule: 1
- Bucket definitions: 89 total, 78 parameterized

## Observed runtime pattern

On the service side, the repeated timing pattern was:

- `Sync stream started`
- ~5.5s to ~6.5s later: `New checkpoint`

The active checkpoint log included:

- `buckets: 988`
- `param_results: 975`

That suggested bucket-parameter fan-out, so we inspected the storage DB.

## Dominant storage query

From `pg_stat_statements` on the storage DB, the dominant lookup was:

```sql
SELECT DISTINCT ON (lookup, source_table, source_key)
  lookup, source_table, source_key, id, bucket_parameters
FROM bucket_parameters
WHERE group_id = $1
  AND lookup = ANY (...)
  AND id <= $3
ORDER BY lookup, source_table, source_key, id DESC
```

Representative aggregated stats before the workaround:

- calls: `47073`
- mean_exec_time: `56.1ms`
- max_exec_time: `1206.0ms`

Because the active rule had 89 bucket definitions (78 parameterized), that fan-out alone was already enough to explain several seconds of checkpoint setup time.

## Representative EXPLAIN before index

For one real live lookup key, `EXPLAIN (ANALYZE, BUFFERS)` showed:

```text
Unique  (actual time=275.878..382.311 rows=480 loops=1)
  ->  Gather Merge
        ->  Sort
              Sort Method: external merge  Disk: ~7MB per worker
              ->  Parallel Seq Scan on bucket_parameters
Execution Time: 384.211 ms
```

Additional shape details:

- raw rows matched: `102894`
- distinct `(source_table, source_key)` pairs returned: `480`
- temp spill occurred

Existing indexes on `powersync.bucket_parameters` were:

- `(group_id, lookup, id DESC)`
- `(group_id, source_table, source_key)`

Even with `enable_seqscan = off`, Postgres still chose a broad bitmap path + sort instead of a path that matched the full dedup/order shape.

## Local workaround index

We added this index locally as an operational workaround:

```sql
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_bucket_parameters_lookup_source_latest
  ON powersync.bucket_parameters (group_id, lookup, source_table, source_key, id DESC);
```

## Representative EXPLAIN after index

The same representative lookup changed to:

```text
Result  (actual time=0.032..32.928 rows=480 loops=1)
  ->  Unique
        ->  Index Only Scan using idx_bucket_parameters_lookup_source_latest
Execution Time: 32.999 ms
```

Notable differences:

- execution time: `384ms -> 33ms`
- no external merge sort
- no parallel seq scan
- index-only scan chosen

## Important caveat

This did **not** eliminate all initial checkpoint latency for us. There is still residual startup work after this lookup path is improved.

So this report is not claiming "this one index fixes all startup delay".

What it does show is:

1. `bucket_parameters` lookups can dominate startup wall clock through fan-out even when no single query looks catastrophically slow in isolation.
2. The current default index shape on `bucket_parameters` does not support this `DISTINCT ON ... ORDER BY ... id DESC` access pattern well enough on our dataset.
3. A composite index on `(group_id, lookup, source_table, source_key, id DESC)` materially improves that hot path.

## Question

Would PowerSync accept either:

- an upstream storage-schema index along these lines, or
- a query/storage change that avoids the large scan + sort path for `bucket_parameters` lookups under many parameterized buckets?

Happy to provide more detail if useful.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Postgres storage: bucket_parameters DISTINCT ON lookup falls back to seq scan/sort under many parameterized buckets #613

Summary

Environment

Observed runtime pattern

Dominant storage query

Representative EXPLAIN before index

Local workaround index

Representative EXPLAIN after index

Important caveat

Question

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Postgres storage: bucket_parameters DISTINCT ON lookup falls back to seq scan/sort under many parameterized buckets #613

Description

Summary

Environment

Observed runtime pattern

Dominant storage query

Representative EXPLAIN before index

Local workaround index

Representative EXPLAIN after index

Important caveat

Question

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions