
feat: PostgreSQL partitioning support for multi-tenant data isolation#235

Merged
intel352 merged 6 commits into main from copilot/add-postgresql-list-partitioning
Mar 3, 2026

Contributor

Copilot AI commented Mar 3, 2026

  • Initial implementation of PostgreSQL LIST partitioning
  • Fix appendTenantFilter to handle ORDER BY / LIMIT / GROUP BY / HAVING clauses properly
  • Reject tenantKey for INSERT statements in step.db_exec
  • Remove "postgresql" from isSupportedPartitionDriver for consistency
  • Fix schema tenantKey placeholders to use "steps." prefix consistently
  • Add configurable partitionNameFormat for dynamic table name resolution
  • Add partitionType config to distinguish LIST vs RANGE partitioning
  • Add auto-partition creation based on source table (sourceTable/sourceColumn config)
  • Add step.db_sync_partitions for syncing partitions from source table
  • Handle RANGE partition edge cases and lifecycle scenarios
  • Update tests for all changes
  • Resolve merge conflicts with main branch (allow_dynamic_sql, auth.m2m custom claims)
Original prompt

This section details the original issue you should resolve

<issue_title>Feature: PostgreSQL LIST partitioning support for multi-tenant data isolation</issue_title>
<issue_description>## Problem

In multi-tenant applications, PostgreSQL LIST partitioning by tenant/affiliate ID is a common pattern for data isolation:

CREATE TABLE forms (
    id UUID PRIMARY KEY,
    tenant_id TEXT NOT NULL,
    slug TEXT NOT NULL,
    ...
) PARTITION BY LIST (tenant_id);

CREATE TABLE forms_org_alpha PARTITION OF forms FOR VALUES IN ('org-alpha');
CREATE TABLE forms_org_beta  PARTITION OF forms FOR VALUES IN ('org-beta');

This provides strong data isolation guarantees (tenant A's queries physically cannot touch tenant B's partition) and enables independent per-tenant maintenance operations.

Currently, there is no engine-level support for this pattern. Applications must handle partition management themselves, and pipeline steps like step.db_query / step.db_exec require manual WHERE tenant_id = $X clauses in every query.

Proposed Solution

1. Partition-aware module type: database.partitioned

A module that automatically manages LIST partitions when new tenants are onboarded:

- name: db
  type: database.partitioned
  config:
    driver: pgx
    dsn: "${DB_DSN}"
    partition_key: tenant_id
    tables:
      - forms
      - form_submissions
      - follow_ups

When a new tenant is registered, the module creates partitions for all configured tables.
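The onboarding behaviour described above boils down to emitting one `CREATE TABLE ... PARTITION OF` statement per configured table. The following is a minimal Go sketch, not the module's actual code: `onboardTenantDDL` and `quoteLiteral` are hypothetical helpers, and the `<table>_<tenant>` naming mirrors the `forms_org_alpha` example. `IF NOT EXISTS` keeps a repeated onboarding run for the same tenant harmless.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	// identRe is a conservative identifier check (assumption: ASCII letters,
	// digits and underscores, not starting with a digit).
	identRe = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)
	// nonIdentRe matches characters that cannot appear in an unquoted identifier.
	nonIdentRe = regexp.MustCompile(`[^A-Za-z0-9_]`)
)

// quoteLiteral escapes a value for a SQL string literal by doubling single quotes.
func quoteLiteral(v string) string {
	return "'" + strings.ReplaceAll(v, "'", "''") + "'"
}

// onboardTenantDDL returns one idempotent CREATE TABLE ... PARTITION OF
// statement per configured parent table for a newly registered tenant,
// naming each partition "<table>_<tenant>" with unsafe characters replaced.
func onboardTenantDDL(tables []string, tenant string) ([]string, error) {
	suffix := nonIdentRe.ReplaceAllString(tenant, "_")
	stmts := make([]string, 0, len(tables))
	for _, t := range tables {
		if !identRe.MatchString(t) {
			return nil, fmt.Errorf("invalid table name %q", t)
		}
		stmts = append(stmts, fmt.Sprintf(
			"CREATE TABLE IF NOT EXISTS %s_%s PARTITION OF %s FOR VALUES IN (%s)",
			t, suffix, t, quoteLiteral(tenant)))
	}
	return stmts, nil
}

func main() {
	stmts, err := onboardTenantDDL([]string{"forms", "form_submissions", "follow_ups"}, "org-alpha")
	if err != nil {
		panic(err)
	}
	for _, s := range stmts {
		fmt.Println(s)
	}
}
```

Table names are validated because identifiers cannot be passed as bind parameters. The tenant value only ever appears inside a quoted literal.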

2. Automatic tenant scoping in step.db_query / step.db_exec

An optional tenant_key config that automatically appends AND tenant_id = $N to queries:

- name: fetch
  type: step.db_query
  config:
    database: db
    tenant_key: auth.tenant_id   # from JWT claim or pipeline context
    query: "SELECT * FROM forms WHERE active = true ORDER BY created_at DESC"

This would reduce boilerplate and eliminate the risk of forgetting the tenant filter on a query.
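Before a filter can be appended, the `tenant_key` path has to be resolved against the pipeline context and bound as the next positional parameter. This sketch assumes the context is a nested `map[string]any`. `resolvePath` and `bindTenant` are hypothetical names. A missing value is treated as an error rather than as a reason to skip the filter silently.

```go
package main

import (
	"fmt"
	"strings"
)

// resolvePath walks a dotted path such as "auth.tenant_id" through nested
// map[string]any values, returning false if a segment is missing or an
// intermediate value is not a map.
func resolvePath(pc map[string]any, path string) (any, bool) {
	var cur any = pc
	for _, seg := range strings.Split(path, ".") {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = m[seg]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// bindTenant resolves the tenant value, appends it to the positional
// parameters and returns the placeholder number the tenant filter should use.
func bindTenant(pc map[string]any, tenantKey string, params []any) ([]any, int, error) {
	v, ok := resolvePath(pc, tenantKey)
	if !ok || v == nil {
		return nil, 0, fmt.Errorf("tenantKey %q resolved to nil", tenantKey)
	}
	params = append(params, fmt.Sprint(v))
	return params, len(params), nil
}

func main() {
	pc := map[string]any{"auth": map[string]any{"tenant_id": "org-alpha"}}
	params, n, err := bindTenant(pc, "auth.tenant_id", []any{true})
	if err != nil {
		panic(err)
	}
	fmt.Printf("AND tenant_id = $%d with params %v\n", n, params)
}
```

Failing closed matters here. If a query ran without the tenant predicate, it would read across every partition.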

Use Case

  1. Multi-tenant SaaS where each organization's data must be physically isolated
  2. New tenants are onboarded dynamically — partitions must be created at runtime
  3. Every data query must be scoped to the requesting tenant's partition
  4. Currently requires custom step types for partition management and manual WHERE clauses in every pipeline query

Benefits

  • Security: Automatic tenant scoping eliminates accidental cross-tenant data leaks from missing WHERE clauses
  • Performance: PostgreSQL partition pruning ensures queries only scan the relevant tenant's data
  • Maintenance: Per-partition operations (VACUUM, index rebuilds) don't lock other tenants
  • Simplicity: Application developers don't need to remember tenant filtering on every query</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits March 3, 2026 04:05
…isolation

- Add `database.partitioned` module type with PartitionedDatabase, PartitionKeyProvider
  and PartitionManager interfaces for managing PostgreSQL LIST partitions
- Add `tenantKey` config to `step.db_query` and `step.db_exec` for automatic
  tenant scoping via PartitionKeyProvider
- Add `step.db_create_partition` step for idempotent runtime partition creation
- Add `appendTenantFilter` helper to sql_placeholders.go
- Register all new types in storage plugin, pipelinesteps plugin, schema registry,
  coreModuleTypes/coreStepTypes, and type_registry
- Add tests for new functionality

Co-authored-by: intel352 <77607+intel352@users.noreply.github.com>
…teps

Co-authored-by: intel352 <77607+intel352@users.noreply.github.com>
Copilot AI changed the title [WIP] Add PostgreSQL LIST partitioning support for multi-tenant data isolation feat: PostgreSQL LIST partitioning support for multi-tenant data isolation Mar 3, 2026
@intel352 intel352 marked this pull request as ready for review March 3, 2026 04:22
Copilot AI review requested due to automatic review settings March 3, 2026 04:22
Contributor

Copilot AI left a comment


Pull request overview

Adds first-class PostgreSQL LIST partitioning support to the workflow engine to enable multi-tenant isolation, including runtime partition creation and optional automatic tenant scoping in DB pipeline steps.

Changes:

  • Introduces database.partitioned module with PartitionKeyProvider / PartitionManager and idempotent EnsurePartition.
  • Adds step.db_create_partition and enhances step.db_query / step.db_exec with optional tenantKey scoping.
  • Registers new module/step types across schema + plugins + wfctl type registry and adds/updates tests.

Reviewed changes

Copilot reviewed 16 out of 21 changed files in this pull request and generated 7 comments.

| File | Description |
|---|---|
| schema/snippets_export.go | Formatting-only alignment adjustments. |
| schema/schema.go | Adds new core module/step types to schema type lists. |
| schema/module_schema.go | Registers schema for database.partitioned, adds tenantKey fields, adds step.db_create_partition schema. |
| plugins/storage/plugin_test.go | Updates expected module type/schema counts for storage plugin. |
| plugins/storage/plugin.go | Registers database.partitioned module type + factory + module schema in storage plugin. |
| plugins/pipelinesteps/plugin_test.go | Adds step.db_create_partition to expected steps list. |
| plugins/pipelinesteps/plugin.go | Registers step.db_create_partition in step types + factories. |
| module/sql_placeholders.go | Adds appendTenantFilter helper used for tenant scoping. |
| module/pipeline_step_token_revoke_test.go | Removes trailing blank line (formatting). |
| module/pipeline_step_sandbox_exec.go | Formatting-only alignment changes. |
| module/pipeline_step_db_tenant_test.go | New tests for tenantKey scoping and appendTenantFilter. |
| module/pipeline_step_db_query_test.go | Formatting-only alignment change. |
| module/pipeline_step_db_query.go | Adds tenantKey support and applies tenant filter + extra param. |
| module/pipeline_step_db_exec.go | Adds tenantKey support and applies tenant filter + extra param. |
| module/pipeline_step_db_create_partition.go | New step.db_create_partition implementation. |
| module/kafka_broker.go | Import ordering + formatting alignment. |
| module/http_server.go | Formatting-only alignment changes. |
| module/database_partitioned_test.go | New tests for partitioned database behaviors + create-partition step. |
| module/database_partitioned.go | New database.partitioned module implementation. |
| module/cache_redis.go | Formatting-only alignment change. |
| cmd/wfctl/type_registry.go | Registers new module/step types + config keys in CLI registry. |

Comment on lines +180 to +187
// isSupportedPartitionDriver returns true for PostgreSQL-compatible drivers.
func isSupportedPartitionDriver(driver string) bool {
	switch driver {
	case "pgx", "pgx/v5", "postgres", "postgresql":
		return true
	}
	return false
}

Copilot AI Mar 3, 2026


isSupportedPartitionDriver treats "postgresql" as supported, but elsewhere the codebase (e.g. database.workflow schema/options and sql placeholder normalization) only recognizes "postgres"/"pgx"/"pgx/v5" as Postgres driver names. Unless there is a real sql driver registered under "postgresql" in this repo, consider removing it here (and in the test) or adding it consistently across driver detection + schema options.

Copilot uses AI. Check for mistakes.
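The reviewer's consistency point can be addressed by keeping one list of PostgreSQL driver names and having every check consult it. The sketch below is hypothetical and not the repository's code: the `postgresDrivers` map and `isPostgresDriver` function are illustrative names, and the map contents reflect the names the review says the codebase recognizes elsewhere.

```go
package main

import "fmt"

// postgresDrivers is a single source of truth for driver names treated as
// PostgreSQL; partition support, placeholder rewriting and schema options
// would all consult it so they cannot disagree.
var postgresDrivers = map[string]bool{
	"pgx":      true,
	"pgx/v5":   true,
	"postgres": true,
}

// isPostgresDriver reports whether driver names a PostgreSQL driver.
func isPostgresDriver(driver string) bool {
	return postgresDrivers[driver]
}

func main() {
	for _, d := range []string{"pgx/v5", "postgres", "postgresql", "sqlite3"} {
		fmt.Printf("%s: %v\n", d, isPostgresDriver(d))
	}
}
```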
Comment on lines +106 to +125
// Apply automatic tenant scoping when tenantKey is configured.
query := s.query
if s.tenantKey != "" {
	pkp, ok := svc.(PartitionKeyProvider)
	if !ok {
		return nil, fmt.Errorf("db_exec step %q: tenantKey requires database %q to implement PartitionKeyProvider (use database.partitioned)", s.name, s.database)
	}
	partKey := pkp.PartitionKey()
	if err := validateIdentifier(partKey); err != nil {
		return nil, fmt.Errorf("db_exec step %q: invalid partition key %q: %w", s.name, partKey, err)
	}
	tenantVal := resolveBodyFrom(s.tenantKey, pc)
	if tenantVal == nil {
		return nil, fmt.Errorf("db_exec step %q: tenantKey %q resolved to nil in pipeline context", s.name, s.tenantKey)
	}
	tenantStr := fmt.Sprintf("%v", tenantVal)
	nextParam := len(resolvedParams) + 1
	query = appendTenantFilter(query, partKey, nextParam)
	resolvedParams = append(resolvedParams, tenantStr)
}

Copilot AI Mar 3, 2026


tenantKey scoping is applied to all db_exec statements by blindly appending WHERE/AND. This will generate invalid SQL for INSERT statements (and for any statement types that don't support WHERE). Consider either rejecting tenantKey for non-UPDATE/DELETE statements (with a clear error) or implementing statement-aware handling (e.g., require the tenant column be part of INSERT values instead of appending a filter).
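The reviewer's first option, rejecting statements that cannot take a WHERE clause, needs only a look at the leading keyword. The sketch below follows that allowlist approach and is stricter than the merged change, which rejects INSERT specifically. `leadingKeyword` and `checkTenantScopable` are hypothetical names. As the comments note, CTEs and block comments are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// leadingKeyword returns the first SQL keyword in upper case, skipping
// leading whitespace and "--" line comments. It does not handle block
// comments or CTEs ("WITH ... INSERT"), which a real check would need.
func leadingKeyword(query string) string {
	q := strings.TrimSpace(query)
	for strings.HasPrefix(q, "--") {
		nl := strings.IndexByte(q, '\n')
		if nl < 0 {
			return ""
		}
		q = strings.TrimSpace(q[nl+1:])
	}
	fields := strings.Fields(q)
	if len(fields) == 0 {
		return ""
	}
	return strings.ToUpper(fields[0])
}

// checkTenantScopable allows tenant scoping only for db_exec statements
// that accept a WHERE clause.
func checkTenantScopable(query string) error {
	switch kw := leadingKeyword(query); kw {
	case "UPDATE", "DELETE":
		return nil
	default:
		return fmt.Errorf("tenantKey is not supported for %q statements; set the tenant column explicitly", kw)
	}
}

func main() {
	for _, q := range []string{
		"UPDATE forms SET active = false WHERE id = $1",
		"INSERT INTO forms (id, tenant_id) VALUES ($1, $2)",
	} {
		fmt.Println(q, "=>", checkTenantScopable(q))
	}
}
```

For INSERT, the tenant column has to be part of the VALUES list. That way the row is routed to the correct partition by PostgreSQL itself.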

Comment on lines +178 to +185
func TestAppendTenantFilter_TrailingWhitespace(t *testing.T) {
	query := "SELECT * FROM forms ORDER BY created_at "
	result := appendTenantFilter(query, "tenant_id", 1)
	expected := "SELECT * FROM forms ORDER BY created_at WHERE tenant_id = $1"
	if result != expected {
		t.Errorf("expected %q, got %q", expected, result)
	}
}

Copilot AI Mar 3, 2026


This test currently codifies behavior that generates invalid SQL (placing WHERE after ORDER BY). Once tenant scoping is supported for typical SELECTs with ORDER BY/LIMIT, appendTenantFilter should insert the tenant predicate before ORDER BY and this expectation should be updated accordingly.
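A clause-aware `appendTenantFilter` can place the predicate before the first top-level trailing clause. The Go sketch below is not the merged implementation. It scans words outside parentheses and quotes, so subqueries and string literals containing `ORDER BY` are left alone. It also wraps an existing WHERE condition in parentheses, so that `WHERE a OR b` cannot bypass the tenant check. Dollar-quoted strings and SQL comments are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// trailingClauses end a WHERE predicate at the top level of a statement.
var trailingClauses = map[string]bool{
	"GROUP": true, "HAVING": true, "WINDOW": true, "ORDER": true,
	"LIMIT": true, "OFFSET": true, "FOR": true, "RETURNING": true,
}

type word struct {
	text  string // upper-cased word
	start int    // byte offset in the query
}

func isWordByte(c byte) bool {
	return c == '_' || ('0' <= c && c <= '9') || ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
}

// topLevelWords lists words outside parentheses, '...' strings and "..." identifiers.
func topLevelWords(q string) []word {
	var out []word
	depth := 0
	for i := 0; i < len(q); {
		switch c := q[i]; {
		case c == '\'' || c == '"':
			j := i + 1 // skip to the closing quote; doubled quotes are escapes
			for j < len(q) {
				if q[j] == c {
					if j+1 < len(q) && q[j+1] == c {
						j += 2
						continue
					}
					break
				}
				j++
			}
			i = j + 1
		case c == '(':
			depth++
			i++
		case c == ')':
			depth--
			i++
		case isWordByte(c):
			j := i
			for j < len(q) && isWordByte(q[j]) {
				j++
			}
			if depth == 0 {
				out = append(out, word{strings.ToUpper(q[i:j]), i})
			}
			i = j
		default:
			i++
		}
	}
	return out
}

// appendTenantFilter adds "<column> = $n" to the top-level WHERE clause,
// before GROUP BY/ORDER BY/LIMIT and similar trailing clauses.
func appendTenantFilter(query, column string, n int) string {
	q := strings.TrimRight(query, " \t\r\n;")
	pred := fmt.Sprintf("%s = $%d", column, n)
	whereAt, end := -1, len(q)
	for _, w := range topLevelWords(q) {
		if w.text == "WHERE" && whereAt < 0 {
			whereAt = w.start
		} else if trailingClauses[w.text] {
			end = w.start
			break
		}
	}
	var out string
	if whereAt < 0 {
		out = strings.TrimRight(q[:end], " \t\r\n") + " WHERE " + pred
	} else {
		// Parenthesize the existing predicate so an OR cannot bypass the tenant check.
		cond := strings.TrimSpace(q[whereAt+len("WHERE") : end])
		out = q[:whereAt] + "WHERE (" + cond + ") AND " + pred
	}
	if end < len(q) {
		out += " " + q[end:]
	}
	return out
}

func main() {
	fmt.Println(appendTenantFilter("SELECT * FROM forms ORDER BY created_at ", "tenant_id", 1))
}
```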

@intel352
Contributor

intel352 commented Mar 3, 2026

@copilot apply changes based on the comments in this thread

  1. When using table partitions, it would be good to optionally support a dynamic table name format. Ideally this option would be set once in the workflow config to generate table names with a predetermined format. So we'd define something like {tableName}_{tenantId} or {tenantId}_{tableName} and call a helper method that would resolve the table's tenant partition at query time. This could be an alternative to auto-appending an AND condition to the WHERE clause, since individual table partitions inherently have an AND built in for the partition id.

  2. We need the ability to generate partitions automatically, based on a value existing in the partition source table. For example, if we add a new tenant to the tenants table, then every partitioned table should have a migration step occur to introduce the new partition.

  3. Think about whether there are any edge cases or other lifecycle scenarios for table partitions that we need to consider. The scenarios I've mentioned above are specific to LIST partitions, but postgres also supports RANGE partitions; do we need to be mindful of that? If we add any automagic functionality specific to LIST partitions, do we have to distinguish between LIST and RANGE partitioned tables to ensure they're treated differently? Go ahead and handle any edge scenarios that you come up with.

- Fix appendTenantFilter to insert tenant predicate before ORDER BY/
  LIMIT/GROUP BY/HAVING/OFFSET clauses instead of blindly appending
- Reject tenantKey for INSERT statements in step.db_exec with clear error
- Remove "postgresql" from isSupportedPartitionDriver for consistency
- Fix schema tenantKey placeholders to use "steps." prefix consistently
- Add partitionType config (list/range) with RANGE partition DDL support
- Add partitionNameFormat config ({table}_{tenant}, {tenant}_{table}, etc.)
- Add PartitionTableName method to PartitionKeyProvider interface
- Add sourceTable/sourceColumn config for auto-partition sync
- Add SyncPartitionsFromSource method to PartitionManager interface
- Add step.db_sync_partitions for triggering partition sync from source
- Add comprehensive tests for all new functionality

Co-authored-by: intel352 <77607+intel352@users.noreply.github.com>
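The LIST/RANGE distinction in the partitionType change mostly comes down to the partition bound clause. The Go sketch below uses hypothetical names (`boundClause`, `PartitionType`). Unlike the per-tenant RANGE bound described later in this thread, it takes explicit lower and upper bounds for RANGE, which is the typical shape for date ranges. PostgreSQL treats a RANGE lower bound as inclusive and the upper bound as exclusive.

```go
package main

import (
	"fmt"
	"strings"
)

// PartitionType mirrors the assumed "list"/"range" config values.
type PartitionType string

const (
	PartitionList  PartitionType = "list"
	PartitionRange PartitionType = "range"
)

// lit escapes a value for a SQL string literal by doubling single quotes.
func lit(v string) string { return "'" + strings.ReplaceAll(v, "'", "''") + "'" }

// boundClause returns the FOR VALUES clause for a new partition. LIST uses
// only from; RANGE uses [from, to) with an exclusive upper bound.
func boundClause(pt PartitionType, from, to string) (string, error) {
	switch pt {
	case PartitionList:
		return fmt.Sprintf("FOR VALUES IN (%s)", lit(from)), nil
	case PartitionRange:
		// Byte-wise sanity check only; the column's collation or type
		// ordering in the database has the final say.
		if to == "" || to <= from {
			return "", fmt.Errorf("range partition needs an upper bound greater than %q", from)
		}
		return fmt.Sprintf("FOR VALUES FROM (%s) TO (%s)", lit(from), lit(to)), nil
	default:
		return "", fmt.Errorf("unsupported partitionType %q", pt)
	}
}

func main() {
	c, _ := boundClause(PartitionList, "org-alpha", "")
	fmt.Println(c)
	c, _ = boundClause(PartitionRange, "2026-03-01", "2026-04-01")
	fmt.Println(c)
}
```

Rejecting unknown types, rather than defaulting to LIST, keeps automatic tenant partitioning from running against a table that uses a different scheme.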
Contributor Author

Copilot AI commented Mar 3, 2026

@copilot apply changes based on the comments in this thread

  1. When using table partitions, it would...

All three requests addressed in commit d440278:

1. Dynamic table name format — Added partitionNameFormat config with {table} and {tenant} placeholders (default: {table}_{tenant}). Can be set to {tenant}_{table} or any other combination. The PartitionKeyProvider interface now includes PartitionTableName(parentTable, tenantValue) for runtime resolution. Example:

config:
  partitionNameFormat: "{tenant}_{table}"

2. Auto-partition from source table — Added sourceTable and sourceColumn config. SyncPartitionsFromSource() queries SELECT DISTINCT <column> FROM <sourceTable> and calls EnsurePartition for each value. New step.db_sync_partitions triggers this from a pipeline:

config:
  sourceTable: tenants
  sourceColumn: id

3. RANGE partition support — Added partitionType config (list or range). LIST uses FOR VALUES IN ('value'), RANGE uses FOR VALUES FROM ('value') TO ('value\x00'). "postgresql" removed from supported drivers for consistency. INSERT statements with tenantKey are now rejected with a clear error. appendTenantFilter now correctly inserts predicates before ORDER BY/LIMIT/GROUP BY/HAVING clauses.

Copilot AI changed the title feat: PostgreSQL LIST partitioning support for multi-tenant data isolation feat: PostgreSQL partitioning support for multi-tenant data isolation Mar 3, 2026
@intel352
Contributor

intel352 commented Mar 3, 2026

@copilot Address the merge conflict on the branch

Merge features from main (allow_dynamic_sql, auth.m2m custom claims)
with our partitioning features (tenantKey, database.partitioned,
step.db_create_partition, step.db_sync_partitions).

Both db_query and db_exec now support both allow_dynamic_sql and
tenantKey simultaneously. Schema includes all fields from both
feature branches.

Co-authored-by: intel352 <77607+intel352@users.noreply.github.com>
Contributor Author

Copilot AI commented Mar 3, 2026

@copilot Address the merge conflict on the branch

In commit 6a2a32d, I've incorporated all changes from main into our branch files — allow_dynamic_sql (PR #234), auth.m2m custom claims (PR #236), and the new pipeline_step_db_dynamic.go helper. Both db_query and db_exec now support allowDynamicSQL and tenantKey together. All tests pass.

However, I'm unable to resolve the Git-level merge conflict directly because I cannot run git merge + git commit or git rebase + git push --force from this environment. You'll need to merge main into this branch from the GitHub UI — the conflicts should be trivially resolvable since both sides now have matching content (just accept "ours" for each conflict).

@intel352 intel352 merged commit b5beb8c into main Mar 3, 2026
14 checks passed
@intel352 intel352 deleted the copilot/add-postgresql-list-partitioning branch March 3, 2026 08:11


Development

Successfully merging this pull request may close these issues.

Feature: PostgreSQL LIST partitioning support for multi-tenant data isolation

3 participants