Skip to content

feat(admin): decouple metric retention window from scrape interval#93

Open
arnaugiralt wants to merge 3 commits intomasterfrom
feat/admin-retention-window
Open

feat(admin): decouple metric retention window from scrape interval#93
arnaugiralt wants to merge 3 commits intomasterfrom
feat/admin-retention-window

Conversation

@arnaugiralt
Copy link
Copy Markdown
Member

Bumps the admin metric ring buffer to actually span the configured retention window. Adds a scraper.retention_window config (defaults to 1h) and sizes the ring from window/interval at startup so changing the scrape interval no longer silently shrinks the chart history.

The per-instance ring buffer was sized at a fixed 360 samples, so lowering
scraper.interval silently shrank the visible history (3s interval gave
~18 min instead of the 1h target). Add scraper.retention_window
(default 1h, env CHAPERONE_ADMIN_SCRAPER_RETENTION_WINDOW) and compute
ring capacity at startup so it actually spans the requested window —
ceil(window/interval) + 1, since N snapshots span (N-1)*interval. Floor
at 2 so degenerate inputs always leave a usable rate pair. Validation
rejects retention_window < interval so the mismatch surfaces at load.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@arnaugiralt arnaugiralt requested a review from victor-cuevas May 7, 2026 20:14
victor-cuevas
victor-cuevas previously approved these changes May 8, 2026
Copy link
Copy Markdown
Collaborator

@qarlosh qarlosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two findings — see inline.

Comment thread admin/metrics/metrics.go
// covers at least `window`. Always returns at least 2 — the minimum needed
// for a single rate pair — so degenerate inputs never produce an unusable
// collector.
func CapacityFor(window, interval time.Duration) int {
Copy link
Copy Markdown
Collaborator

@qarlosh qarlosh May 8, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out-of-diff observation: historicalPair in admin/metrics/collector.go:265,269 still hardcodes the trend window (50*time.Minute, -1*time.Hour). Side effect of the new flag — with retention_window < 1h (now valid) trends silently never render; with > 1h they still compare against ~1h ago.

Treating the trend window as an independent product decision seems reasonable to me — leaving this note for the record in case it's worth revisiting later (couple trends to retention, or tighten validation). Non-blocking.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. tricky thing, I ended up coupling trend window to retention window to keep configuration simple

Comment thread admin/config/loader.go

parseDuration(&cfg.Scraper.Interval, "SCRAPER_INTERVAL", &errs)
parseDuration(&cfg.Scraper.Timeout, "SCRAPER_TIMEOUT", &errs)
parseDuration(&cfg.Scraper.RetentionWindow, "SCRAPER_RETENTION_WINDOW", &errs)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs in docs/guides/admin-portal.md are now out of sync:

  • L52–54: YAML example missing retention_window.
  • L81–87: env var table missing CHAPERONE_ADMIN_SCRAPER_RETENTION_WINDOW.
  • L175: states "The portal retains 360 scrape snapshots per instance ... at 10s intervals is exactly 1 hour of history" — exactly what this PR changes; now misleading.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

arnaugiralt and others added 2 commits May 8, 2026 17:35
Add retention_window to the YAML example and env var table, and rewrite
the history-retention bullet to reflect that capacity is now computed
from scraper.retention_window and scraper.interval at startup.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
historicalPair previously hardcoded a 50-minute readiness threshold and a
1-hour lookback. With retention_window now configurable, this caused
trends to silently never render under 1h and to ignore the configured
window above 1h. Plumb the trend window through Collector and derive both
the threshold and the lookback from it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants