Skip to content

feat: add converge package for push-based metric stabilization#25

Merged
davidbirdsong merged 4 commits into
developfrom
davidbirdsong/net-146-switch-cloudflare-exporter-from-pull-based-scraping-to-push
Jun 5, 2026
Merged

feat: add converge package for push-based metric stabilization#25
davidbirdsong merged 4 commits into
developfrom
davidbirdsong/net-146-switch-cloudflare-exporter-from-pull-based-scraping-to-push

Conversation

@davidbirdsong

Copy link
Copy Markdown

Pull Request Template

Description

Adds converge/, a pure state machine engine for push-based metric stabilization. It watches a stream of timestamped observations, detects when values stop changing, and emits push-ready samples with correct source timestamps.

This is the core building block for migrating zone traffic metrics from Prometheus scrape (with 4+ minute scrape_delay) to Victoria Metrics push with accurate timestamps.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Code refactoring
  • Other (please describe):

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Code Quality

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation

Additional Notes

The package separates into a functional core and an imperative shell:

  • Engine (pure, no I/O): Ingest([]Observation) → []Sample, Expire(now) → []Sample, Flush() → []Sample
  • Runner (thin loop): wires a Fetcher and Sink to the engine on a tick, handles live/backfill priority and graceful shutdown
  Fetcher ──► Runner ──► Engine ──► Runner ──► Sink
  (CF API)    (loop)     (state)    (loop)     (VM push)

Internally the engine manages windows (one per source time bucket), each containing trackers (one per metric series). A tracker counts consecutive identical uint64 values and signals when the configured threshold is crossed.

Key decisions

  • Eager push, correct later. Threshold=1 pushes on first observation. Threshold=3 waits for convergence. Config knob, not a code change.
  • Backfill rate limiting. Backfill is capped at N Fetch calls per tick (default: 1). Live lane always runs first with no cap. Bounds API consumption without starving freshness.
  • No special backfill code path. Old data flows through the same Ingest path. It stabilizes faster because the source has already finished aggregating.
  • WindowTTL forced flush. If a window exceeds its TTL without stabilizing, the engine pushes the best known values so nothing is silently dropped.

What's NOT included

  • Fetcher implementation (CF GraphQL adapter)
  • Sink implementation (VM push client)
  • Integration with main.go / mode flag

@davidbirdsong

davidbirdsong commented Jun 4, 2026

Copy link
Copy Markdown
Author

This change is part of the following stack:

Change managed by git-spice.

@davidbirdsong davidbirdsong force-pushed the davidbirdsong/net-146-switch-cloudflare-exporter-from-pull-based-scraping-to-push branch 2 times, most recently from bda4e5c to 1b94432 Compare June 4, 2026 23:23
@davidbirdsong davidbirdsong force-pushed the davidbirdsong/net-146-switch-cloudflare-exporter-from-pull-based-scraping-to-push branch from 1b94432 to d1a111e Compare June 5, 2026 00:05
@jose-ledesma jose-ledesma requested a review from Copilot June 5, 2026 13:01

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new converge/ Go package that implements a push-oriented metric stabilization engine (with a thin runner loop) intended to support migrating time-bucketed Cloudflare analytics metrics from scrape-based Prometheus collection to timestamp-correct push-based ingestion.

Changes:

  • Introduces a pure(ish) state-machine-style Engine that ingests Observations and emits stabilized Samples, with TTL-based window expiry and flush semantics.
  • Adds a Run() convenience loop to drive the engine with a Fetcher and Sink, supporting live polling plus rate-limited backfill.
  • Updates module dependencies (go.mod / go.sum) and ignore rules for .gograph/.

Reviewed changes

Copilot reviewed 10 out of 12 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
converge/engine.go New core engine + config/defaults + ingest/expire/flush/stats.
converge/window.go Per-bucket window state and forced-flush sampling.
converge/tracker.go Per-series stabilization tracker (consecutive-identical threshold).
converge/runner.go Loop wiring FetcherEngineSink with live/backfill lanes.
converge/types.go Public types and interfaces (Observation/Sample/Fetcher/Sink/Stats).
converge/engine_test.go Unit tests for engine behavior (stabilization, TTL expiry, flush).
converge/tracker_test.go Unit tests for tracker state transitions and edge cases.
go.mod Adds new indirect requirements (notably github.com/prometheus/prometheus).
go.sum Corresponding checksum additions for new/updated transitive modules.
.gitignore Ignores .gograph/ at repo root.
converge/.gitignore Ignores .gograph/ under converge/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread converge/engine.go
Comment on lines +69 to +74
for _, o := range obs {
w := e.windows[o.Bucket]
if w == nil {
w = newWindow(o.Bucket, time.Now())
e.windows[o.Bucket] = w
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix this up in the next PR

Comment thread converge/runner.go
Comment thread go.mod
Comment on lines 30 to +32
github.com/prometheus/common v0.53.0 // indirect
github.com/prometheus/procfs v0.14.0 // indirect
github.com/prometheus/prometheus v0.50.0 // indirect
Comment thread converge/engine.go Outdated
davidbirdsong and others added 2 commits June 5, 2026 10:14
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@davidbirdsong davidbirdsong merged commit e0053fa into develop Jun 5, 2026
5 checks passed
@davidbirdsong davidbirdsong deleted the davidbirdsong/net-146-switch-cloudflare-exporter-from-pull-based-scraping-to-push branch June 5, 2026 20:42
Comment thread converge/README.md
Comment on lines +50 to +55
// Feed observations. Returns samples for any tracker that just stabilized.
samples := eng.Ingest(observations)

// Check TTLs. Returns forced-push samples from expiring windows.
// Removes closed windows.
samples = eng.Expire(time.Now())

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are these two different calls? any downsides from doing both in a single call so we only ever need to process a single set of samples?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The separation make sense semantically to me.

  • expire check runs even when fetch fails, otherwise we'd have to pass in a nil set of samples or something
  • testability is easier since expire can be tested without having to construct observations so we can make targeted assertions with less setup

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants