feat: add per-module lifetime counters for connections and bytes#36
feat: add per-module lifetime counters for connections and bytes#36yaoge123 wants to merge 2 commits into
Conversation
Introduce three new Prometheus counters tracked at relay completion,
labeled by module and upstream:
- rsync_proxy_module_completed_connections_total{module,upstream}
- rsync_proxy_module_sent_bytes_total{module,upstream}
- rsync_proxy_module_received_bytes_total{module,upstream}
Until now, lifetime totals (completed connections, sent/received bytes)
were only exposed without labels. The per-connection gauges
(rsync_proxy_connection_sent_bytes etc.) cannot be used to compute
historical traffic per module because the series disappear when the
connection ends.
The counters are increased at the same point as the existing global
totals, so they only count successful relays (listing-all-modules,
unknown-module, queue-full and dial-error paths are intentionally
excluded, matching the existing rsync_proxy_completed_connections_total
semantics). 'accepted' is not split per module because the module name
is not yet known at accept time.
Storage uses sync.Map keyed by (module, upstream); entries are
created lazily on first finished relay. Output is sorted by
(module, upstream) to keep scrapes stable.
The existing TestMetricsIncludesLifetimeCounters has been extended to
assert the new counters.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR adds new Prometheus lifetime counters scoped to each (module, upstream) pair, and updates the server to track and expose those counters.
Changes:
- Track per-(module, upstream) lifetime counters (completed connections, sent bytes, received bytes) in
Server. - Emit the new counters in Prometheus text format with stable ordering.
- Extend metrics tests to validate the new exported series.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pkg/server/server.go | Adds storage and updates for per-(module, upstream) lifetime counters during relay completion. |
| pkg/server/metrics.go | Exposes the new per-(module, upstream) counters in Prometheus output with deterministic ordering. |
| pkg/server/server_test.go | Adds assertions ensuring the new metrics are present in the scrape output. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| func (s *Server) getModuleCounters(module, upstream string) *moduleCounters { | ||
| key := moduleUpstreamKey{module: module, upstream: upstream} | ||
| if v, ok := s.moduleCounters.Load(key); ok { | ||
| return v.(*moduleCounters) | ||
| } | ||
| v, _ := s.moduleCounters.LoadOrStore(key, &moduleCounters{}) | ||
| return v.(*moduleCounters) | ||
| } |
There was a problem hiding this comment.
Good catch, fixed in 3d3b8a7. getModuleCounters now applies prometheusLabelValueOrUnknown at key creation, so the internal sync.Map key matches the rendered Prometheus label set; metrics.go now only escapes the (already normalized) key. Added TestModuleCountersNormalizeEmptyKeyToUnknown as a regression test.
| // Per-(module, upstream) counters tracked when a relay finishes | ||
| // successfully. Lazy-initialized via getModuleCounters. | ||
| // map key is moduleUpstreamKey. Value is *moduleCounters. | ||
| moduleCounters sync.Map |
There was a problem hiding this comment.
In practice this is bounded. target.Upstream is set by the proxy itself to upstreamConfig.Name (always a configured value), and the module value at the increment site only reaches the lifetime accumulator after the request has resolved against s.modules — the listing-all-modules / unknown-module / queue-full / dial-error paths return earlier without touching getModuleCounters. So the upper bound is Σ(len(upstream.Modules)), identical in cardinality to the existing per-connection gauges. Happy to add an explicit doc comment about that bound if helpful.
| func (s *Server) getModuleCounters(module, upstream string) *moduleCounters { | ||
| key := moduleUpstreamKey{module: module, upstream: upstream} | ||
| if v, ok := s.moduleCounters.Load(key); ok { | ||
| return v.(*moduleCounters) | ||
| } | ||
| v, _ := s.moduleCounters.LoadOrStore(key, &moduleCounters{}) | ||
| return v.(*moduleCounters) | ||
| } |
There was a problem hiding this comment.
Sticking with the load-then-LoadOrStore pattern: it matches the same shape used in getUpstreamCounters immediately above, and the Load fast-path avoids the LoadOrStore allocation/CAS on the steady-state hot path (every relay completion). Concurrency semantics are identical either way.
| // Per-(module, upstream) lifetime counters. | ||
| assert.Contains(t, text, "# HELP rsync_proxy_module_completed_connections_total") | ||
| assert.Contains(t, text, "# TYPE rsync_proxy_module_completed_connections_total counter") | ||
| assert.Contains(t, text, "rsync_proxy_module_completed_connections_total{module=\"fake\",upstream=\"u1\"} 1\n") | ||
| assert.Contains(t, text, "# HELP rsync_proxy_module_sent_bytes_total") | ||
| assert.Contains(t, text, "# TYPE rsync_proxy_module_sent_bytes_total counter") | ||
| assert.Contains(t, text, fmt.Sprintf("rsync_proxy_module_sent_bytes_total{module=\"fake\",upstream=\"u1\"} %d\n", len(payload))) | ||
| assert.Contains(t, text, "# HELP rsync_proxy_module_received_bytes_total") | ||
| assert.Contains(t, text, "# TYPE rsync_proxy_module_received_bytes_total counter") | ||
| assert.Contains(t, text, "rsync_proxy_module_received_bytes_total{module=\"fake\",upstream=\"u1\"} 0\n") |
There was a problem hiding this comment.
Empty-vs-unknown collision is now covered by TestModuleCountersNormalizeEmptyKeyToUnknown (3d3b8a7). Escape-character coverage already exists at the helper level in TestPrometheusEscapeLabelValue (quotes, backslashes, newlines), and exercising the full render path with an exotic module name would mostly re-test that helper — leaving as-is to keep the test surface focused.
The module/upstream pair stored in moduleCounters was using the raw
string, while metrics.go applied prometheusLabelValueOrUnknown at scrape
time. An empty string and the literal "unknown" therefore became two
distinct map entries that rendered to the same Prometheus label set,
producing duplicate output lines (rejected by Prometheus as ambiguous).
Normalize at key creation in getModuleCounters so the internal key
matches what is rendered. metrics.go now only escapes the (already
normalized) key.
Adds a regression test that checks getModuleCounters("","") and
getModuleCounters("unknown","unknown") return the same counter, and that
the rendered output emits exactly one line for that label set.
Background
Today the proxy already exposes lifetime counters
(
rsync_proxy_completed_connections_total,rsync_proxy_sent_bytes_total,rsync_proxy_received_bytes_total)without any labels, plus per-connection gauges with
moduleandupstreamlabels.That makes some dashboards awkward:
traffic over time.
rsync_proxy_connection_sent_bytesetc.)cannot be used to compute historical traffic per module — the series
disappear when the connection ends, so
rate()/increase()givebogus results.
Operators end up wanting "Top modules by traffic over the last 24h" but
the metrics surface doesn't really support that.
Change
Introduce three new Prometheus counters incremented at relay
completion, labeled by
moduleandupstream:rsync_proxy_module_completed_connections_total{module,upstream}rsync_proxy_module_sent_bytes_total{module,upstream}rsync_proxy_module_received_bytes_total{module,upstream}The counters are increased at the same point as the existing global
totals, so they only count successful relays — the listing-all-modules,
unknown-module, queue-full and dial-error paths are intentionally
excluded, matching the existing
rsync_proxy_completed_connections_totalsemantics.Accepted is not split per module because the module name isn't
known yet at accept time.
Implementation
sync.Mapkeyed by(module, upstream). Entries are createdlazily on the first finished relay for that pair.
(module, upstream)so scrapes are stable.target.Upstreamis the upstream's config Name.Tests
go vet ./... && go build ./... && go test -race ./pkg/...— green.TestMetricsIncludesLifetimeCountersextended to assert the newcounters with their labels.
Operational note
These are pure additions — no existing series are renamed or removed.