Skip to content

feat(analytics): make retention windows configurable#719

Merged
kriszyp merged 2 commits into
mainfrom
kris/peaceful-hodgkin-65230d
May 23, 2026
Merged

feat(analytics): make retention windows configurable#719
kriszyp merged 2 commits into
mainfrom
kris/peaceful-hodgkin-65230d

Conversation

@kriszyp
Copy link
Copy Markdown
Member

@kriszyp kriszyp commented May 22, 2026

Summary

  • Add analytics.rawRetentionMs and analytics.aggregateRetentionMs config params
  • Register them in CONFIG_PARAMS (hdbTerms.ts) and document in config-root.schema.json
  • startScheduledTasks() in resources/analytics/write.ts reads configured values (with existing constants as defaults)
  • Defaults unchanged — no behavior change on upgrade

Purpose

Closes #566 / Jira CORE-3074. A customer's system.mdb reached ~5 GB of analytics data (31M rows / 1 year of aggregates). Operators previously couldn't shorten the window without a code change.

Notable design decision

Raw retention is clamped to Math.max(configured, AGGREGATE_PERIOD) at startup. If the configured value is shorter than the aggregation cadence, raw records could be deleted before aggregation() rolls them up — a silent data loss path flagged in cross-model review. The schema description notes this behavior.

🤖 Generated by Claude on behalf of Kris.

Add analytics.rawRetentionMs and analytics.aggregateRetentionMs config params
to let operators age out hdb_analytics data faster than the hardcoded defaults
(1h raw / 1y aggregate). Defaults are unchanged so existing deployments are
unaffected.

Raw retention is clamped to at least the aggregation period to prevent raw
records from being deleted before they can be rolled up (avoids silent data loss
when the value is set below the aggregation cadence).

Closes: #566
Jira: CORE-3074

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kriszyp kriszyp requested review from heskew and kylebernhardy May 22, 2026 05:02
// Clamp raw retention to at least one full aggregation period so raw records
// are never deleted before they can be rolled up.
const rawRetentionMs = Math.max(envGet(CONFIG_PARAMS.ANALYTICS_RAWRETENTIONMS) ?? RAW_EXPIRATION, AGGREGATE_PERIOD);
const aggregateRetentionMs = envGet(CONFIG_PARAMS.ANALYTICS_AGGREGATERETENTIONMS) ?? AGGREGATE_EXPIRATION;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregateRetentionMs: 0 silently deletes all aggregate data.

cleanup() computes end = Date.now() - expiration. If aggregateRetentionMs is 0, every cleanup cycle removes every aggregate record (end = Date.now() → all keys qualify). The schema allows minimum: 0 without documenting this semantics, so an operator who sets 0 expecting "disabled" instead loses all historical aggregates.

rawRetentionMs is protected by the Math.max(…, AGGREGATE_PERIOD) clamp, but aggregateRetentionMs has no guard.

Suggested fix — skip cleanup when the configured value is zero:

Suggested change
const aggregateRetentionMs = envGet(CONFIG_PARAMS.ANALYTICS_AGGREGATERETENTIONMS) ?? AGGREGATE_EXPIRATION;
const aggregateRetentionMs = envGet(CONFIG_PARAMS.ANALYTICS_AGGREGATERETENTIONMS) ?? AGGREGATE_EXPIRATION;
if (!aggregateRetentionMs) return; // 0 = disabled

Or, if full deletion is the intended "purge" semantics for 0, document it explicitly in the schema description and raise the minimum to a meaningful floor (e.g., 1).

@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 22, 2026

Reviewed; no blockers found. Prior blocker (aggregateRetentionMs=0 mass-delete) addressed by commit fe8f8b0.

…lete

Setting aggregateRetentionMs to 0 would pass Date.now()-0=Date.now() to
cleanup(), deleting every aggregate record. Treat 0 as "keep forever"
to match the storageInterval: 0 = disabled convention.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@kriszyp
Copy link
Copy Markdown
Member Author

kriszyp commented May 22, 2026

Fixed the blocker in the follow-up commit: aggregateRetentionMs: 0 now skips aggregate cleanup entirely (keep forever), matching the storageInterval: 0 = disabled convention already in the codebase. Schema description updated accordingly.

The Windows Integration Tests 1/4 failure (transaction-logs.test.mjs / "Error initializing log rotator") is pre-existing and reproduces on main — not related to this change.

🤖 — Claude

@kriszyp kriszyp marked this pull request as ready for review May 22, 2026 05:22
@kriszyp kriszyp requested review from cb1kenobi and removed request for kylebernhardy May 22, 2026 11:28
@kriszyp kriszyp merged commit 6b33994 into main May 23, 2026
38 checks passed
@kriszyp kriszyp deleted the kris/peaceful-hodgkin-65230d branch May 23, 2026 03:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make hdb_analytics retention windows configurable

2 participants