
Add aggregation flags to reduce cardinality in high-database deployments#29

Merged
tom-pang merged 2 commits into main from maxeng-disable-metrics
Apr 14, 2026

Conversation

@maxenglander
Collaborator

Problem

postgres_exporter is OOMing on high-cardinality Postgres clusters with many databases:

  • Incident: OOMKilled at 128Mi memory limit on a cluster with 692 databases
  • Root cause: Collecting 22,144 time series but only using ~15 in production
  • Current workaround: Increased memory limit to 256Mi (temporary fix)

Cardinality Breakdown (692 databases)

| Collector | Metrics per DB | Total Series |
|---|---|---|
| pg_locks | 9 lock modes | 6,228 |
| pg_stat_database | 18 metrics | 12,456 |
| Other collectors | — | ~3,460 |
| **Total** | | **22,144** |

Solution

Add two new flags to allow aggregation/filtering at collection time:

1. --collector.locks.per_database (default: false)

  • When false (default): Aggregates lock counts across all databases
    • Reduces pg_locks_count from 6,228 series → 9 series (99.9% reduction)
    • Aggregates by mode only (removes datname label)
  • When true: Original behavior (per-database, per-mode metrics)

Rationale: Our recording rules already aggregate away the datname label:

```
sum by (cluster, database_branch_id, mode, ...) (pg_locks_count)
```

So per-database collection was unnecessary cardinality.
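The effect of the aggregated mode can be sketched in Go (the `LockRow` shape and `aggregateByMode` helper are illustrative, not the exporter's actual types; the real change happens in SQL):

```go
package main

import "fmt"

// LockRow models one row of the per-database query: a lock count per
// (datname, mode) pair. The type is illustrative, not taken from
// postgres_exporter.
type LockRow struct {
	Datname string
	Mode    string
	Count   int64
}

// aggregateByMode collapses per-database rows into one count per lock
// mode, mirroring what the aggregated SQL query does when datname is
// dropped from the SELECT and GROUP BY.
func aggregateByMode(rows []LockRow) map[string]int64 {
	byMode := make(map[string]int64)
	for _, r := range rows {
		byMode[r.Mode] += r.Count
	}
	return byMode
}

func main() {
	rows := []LockRow{
		{Datname: "db1", Mode: "AccessShareLock", Count: 4},
		{Datname: "db2", Mode: "AccessShareLock", Count: 7},
		{Datname: "db1", Mode: "RowExclusiveLock", Count: 2},
	}
	// Three (datname, mode) series collapse to two mode-only series.
	fmt.Println(aggregateByMode(rows))
}
```

With 692 databases and 9 lock modes, the same collapse takes 6,228 series down to at most 9.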

2. --collector.stat_database.detailed (default: false)

  • When false (default): Only collects xact_commit metric
    • Reduces from 12,456 series → 692 series (94.4% reduction)
    • Preserves datid and datname labels (needed by recording rules)
  • When true: Original behavior (all 18 metrics)

Rationale: Only xact_commit is used in production metrics. Other metrics (blks_read, tup_fetched, etc.) are unused but consuming memory.
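The flag-driven column selection can be sketched as follows. The `statDatabaseQuery` function is illustrative (not the exporter's actual code), and only a few of the 18 detailed columns are shown; the column names themselves are real `pg_stat_database` columns:

```go
package main

import (
	"fmt"
	"strings"
)

// statDatabaseQuery sketches the conditional column selection: with
// detailed=false only xact_commit is collected, while datid and datname
// stay in the SELECT so the label set needed by recording rules is
// preserved.
func statDatabaseQuery(detailed bool) string {
	cols := []string{"datid", "datname", "xact_commit"}
	if detailed {
		cols = append(cols,
			"xact_rollback", "blks_read", "blks_hit",
			"tup_returned", "tup_fetched", // ... remaining columns
		)
	}
	return "SELECT " + strings.Join(cols, ", ") + " FROM pg_stat_database"
}

func main() {
	fmt.Println(statDatabaseQuery(false))
	fmt.Println(statDatabaseQuery(true))
}
```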

Implementation Details

Changes

  • collector/pg_locks.go:

    • Added flag definition
    • Split Update() into updatePerDatabase() and updateAggregated()
    • New aggregated query removes datname from SELECT and JOIN
  • collector/pg_stat_database.go:

    • Added flag definition
    • Conditional column selection based on flag
    • Split Update() into updateDetailed() and updateMinimal()
  • Tests: Added test coverage for both modes of each collector

  • README: Documented new flags
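The `Update()` split described above can be sketched as a simple dispatch on the flag. The struct and method names follow the PR description, but the code is a stub: real code would take a DB handle and a metric channel, and strings stand in for metric emission here:

```go
package main

import "fmt"

// lockCollector is a stub standing in for the exporter's pg_locks
// collector; perDatabase holds the value of
// --collector.locks.per_database.
type lockCollector struct {
	perDatabase bool
}

// Update dispatches to one of the two query paths, as described in the
// implementation notes.
func (c *lockCollector) Update() string {
	if c.perDatabase {
		return c.updatePerDatabase()
	}
	return c.updateAggregated()
}

func (c *lockCollector) updatePerDatabase() string { return "per-database query (datname, mode)" }
func (c *lockCollector) updateAggregated() string  { return "aggregated query (mode only)" }

func main() {
	fmt.Println((&lockCollector{}).Update()) // default: aggregated
}
```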

Backward Compatibility

Fully backward compatible - users can restore original behavior:

```
--collector.locks.per_database=true \
--collector.stat_database.detailed=true
```

Expected Impact

Cardinality Reduction (692 databases)

| Metric | Before | After (defaults) | Reduction |
|---|---|---|---|
| pg_locks_count | 6,228 | 9 | -99.9% |
| pg_stat_database_* | 12,456 | 692 | -94.4% |
| **Total** | 22,144 | 701 | -96.8% |
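The table's arithmetic can be checked directly, assuming 9 lock modes and 18 pg_stat_database metrics per database as stated above (the ~3,460 series from other collectors are carried over unchanged from the cardinality breakdown):

```go
package main

import "fmt"

func main() {
	const databases = 692
	const lockModes = 9
	const statMetrics = 18

	lockBefore := databases * lockModes   // per-database lock series
	statBefore := databases * statMetrics // pg_stat_database series
	lockAfter := lockModes                // one series per mode after aggregation
	statAfter := databases                // xact_commit only, still per database

	totalBefore := lockBefore + statBefore + 3460 // plus ~3,460 from other collectors
	totalAfter := lockAfter + statAfter

	fmt.Println(lockBefore, statBefore) // 6228 12456
	fmt.Println(totalBefore)            // 22144
	fmt.Println(totalAfter)             // 701
	fmt.Printf("%.1f%%\n", 100*(1-float64(totalAfter)/float64(totalBefore))) // 96.8%
}
```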

Memory Savings

  • Before: 256Mi (with override to prevent OOM)
  • Expected after: <64Mi
  • Savings: ~75% reduction in memory usage

Recording Rules

No changes needed - recording rules will continue to work:

  • pg_locks: Already aggregates away datname
  • pg_stat_database: Preserves datname for downstream use

Testing

  • ✅ All unit tests pass
  • ✅ New tests added for both aggregated and detailed modes
  • ✅ Code compiles successfully
  • 🔲 Integration testing in staging environment (next step)

Deployment Plan

  1. Phase 1: Deploy to dev/staging turtle with defaults
  2. Phase 2: Monitor memory usage and recording rules for 48 hours
  3. Phase 3: Gradual rollout to production turtles
  4. Phase 4: Remove 256Mi memory override if stable

Related

  • Addresses memory issues in multi-tenant Postgres deployments
  • Aligns collection with actual usage (don't collect unused metrics)
  • Sets precedent for future cardinality reduction efforts

Add two new flags to significantly reduce memory usage and time series
cardinality in high-database-count deployments:

- `--collector.locks.per_database` (default: false): When false,
  aggregates lock counts across all databases instead of per-database
- `--collector.stat_database.detailed` (default: false): When false,
  only collects xact_commit instead of all 18 metrics

These changes reduce time series from 22,144 to 701 (96.8% reduction)
on a 692-database cluster, resolving OOM issues.

Backward compatibility maintained: set both flags to true to restore
original behavior.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Max Englander <max@planetscale.com>
@maxenglander maxenglander marked this pull request as ready for review April 3, 2026 22:21
@maxenglander maxenglander requested a review from a team as a code owner April 3, 2026 22:21

@tom-pang tom-pang left a comment


pgLocksDesc is being allocated via prometheus.NewDesc() inside updatePerDatabase() and updateAggregated(), which means it runs on every scrape. Every other collector in the codebase defines descriptors as package-level vars.

Since the two modes need different label sets (["datname", "mode"] vs ["mode"]), two separate descriptors make sense — they just need to be hoisted to package scope:

```go
var (
    pgLocksDescPerDB = prometheus.NewDesc(
        prometheus.BuildFQName(namespace, locksSubsystem, "count"),
        "Number of locks per database",
        []string{"datname", "mode"}, nil,
    )
    pgLocksDescAggregated = prometheus.NewDesc(
        prometheus.BuildFQName(namespace, locksSubsystem, "count"),
        "Number of locks across all databases",
        []string{"mode"}, nil,
    )
)
```

Signed-off-by: Max Englander <max@planetscale.com>
@tom-pang tom-pang merged commit 3e8c45c into main Apr 14, 2026
8 checks passed
@tom-pang tom-pang deleted the maxeng-disable-metrics branch April 14, 2026 16:35