Skip to content

Performance Critical: create_namespace Latency (>30s) - Schema & Query Optimization Required #3685

Description

@pjanuario

Describe the bug

1. Issue Summary

We are experiencing a critical performance degradation where the create_namespace API operation consistently times out (>30s), causing cascading failures in our downstream workload infrastructure (Traefik 504 Gateway Timeouts and pyiceberg have an unexpected behaviour).

2. Remediation Status

To restore production stability, we are manually applying the following schema changes:

  • Applied: idx_entities_catalog_id_id on entities. (Result: Latency dropped from ~36s to ~17s).
  • Applying Now: idx_grants_realm_grantee and idx_grants_realm_securable on grant_records to resolve 100% CPU usage during permission checks and GC.

3. Root Cause Analysis (Database Layer)

Our investigation confirmed that the default schema lacks secondary indexes required for the application's actual query patterns.

A. The "Grant Records" Bottleneck (High Frequency)
The application performs thousands of permission checks per operation.

  • Observation: 100% Sequential Scans.
  • Metrics: 65,000+ executions of SELECT ... WHERE grantee_id = ....
  • Impact: Without an index starting with grantee_id or realm_id, the database reads 13.7 Billion rows to serve these requests.
  • Fix: We are adding indexes to support lookups by grantee_id and securable_id.

B. The "Entities" Bottleneck (Bulk Fetch)

  • Observation: Namespace create operation triggers a query with a massive IN clause containing ~7,800 tuples: WHERE (catalog_id, id) IN ((?,?), (?,?)...).
  • Impact: The query planner could not use the PK (starting with realm_id) efficiently, resulting in 17s+ execution time.
  • Fix: We added idx_entities_catalog_id_id to optimize this lookup.

4. Request for Engineering: Query Optimization

While the indexes mitigate the immediate outage, the underlying query patterns appear inefficient and likely need refactoring in the Polaris codebase.

1. Review "N+1" Permission Checks
We observed 65,415 executions of the permission check query during a short window.

  • Question: Is the application iterating through a list and checking permissions one-by-one instead of performing a bulk permission check? This volume suggests an "N+1" query issue that indexes can only partially band-aid.

2. Review Bulk Entity Fetch Strategy
The query SELECT ... FROM entities WHERE (catalog_id, id) IN (...) is passing ~15,000 bind parameters (7,800 pairs).

  • Recommendation: This pattern scales poorly. Please consider paginating this operation or refactoring the logic to avoid passing thousands of IDs in a single SQL statement.

5. SQL for Upstream Merge

We strongly recommend including these indexes in the next Polaris release to prevent similar degradation for other users:

-- 1. Optimizes High-Volume Permission Checks (Fixes API CPU load)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_grants_realm_grantee 
ON polaris_schema.grant_records (realm_id, grantee_id);

-- 2. Optimizes Garbage Collection (Fixes background job hangs)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_grants_realm_securable 
ON polaris_schema.grant_records (realm_id, securable_id);

-- 3. Optimizes Bulk Entity Lookups (Fixes Namespace listing latency)
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_entities_catalog_id_id 
ON polaris_schema.entities (catalog_id, id);

To Reproduce

  1. Environment: Deploy Apache Polaris backed by PostgreSQL with a moderate dataset (~170k grant records, ~160k entities).
  2. Action: Execute the create_namespace API operation on a catalog with ~22k namespaces.
  3. Conditions: Ensure standard concurrency; the issue manifests even without much load due to table scans.
  4. Observation: Monitor API response time and database activity via pg_stat_statements.

Actual Behavior

  • Performance: The operation consistently takes 33s - 36s to complete.
  • Database Execution: Permission checks trigger a Sequential Scan of the entire grant_records table for every single lookup. We observed queries reading ~174,000 rows per execution (100% of the table) thousands of times per minute.
  • Client Experience: The API request times out. Proxies (e.g., Traefik) return 504 Gateway Timeout at the 30s mark, and client applications (crawlers) hang or fail silently.

Expected Behavior

  • Performance: The create_namespace operation should complete in sub-second time to ~2 seconds.
  • Database Execution: Permission checks filtering by grantee_id should utilize an Index Scan, reading only the specific rows relevant to the user (typically <100 rows).
  • Client Experience: The request should return 200 OK well within standard proxy/client timeout windows (30s).

Additional context

All this information was very consistent along the research we did, after applying the first index idx_entities_catalog_id_id we seen a consistent drop on the response time from ~31s into ~16s.
After the index was applied the stats were reset and and pg statements cleaned, we kept the system running for +-24h and the data we see in the ticket is from this period after the resets.

System information

OS: Linux
Polaris Catalog Version: Docker image 1.3.0 (initial symptoms were present in 1.0.1 and we upgraded to validate if this was caused by the version, the problem persisted)
Object storage & setup: AWS S3

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Fields

No fields configured for issues without a type.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions