Skip to content

feat: add per-module lifetime counters for connections and bytes#36

Open
yaoge123 wants to merge 2 commits into
ustclug:masterfrom
yaoge123:upstream-pr/per-module-counters
Open

feat: add per-module lifetime counters for connections and bytes#36
yaoge123 wants to merge 2 commits into
ustclug:masterfrom
yaoge123:upstream-pr/per-module-counters

Conversation

@yaoge123
Copy link
Copy Markdown
Contributor

@yaoge123 yaoge123 commented Jun 2, 2026

Background

Today the proxy already exposes lifetime counters
(rsync_proxy_completed_connections_total,
rsync_proxy_sent_bytes_total, rsync_proxy_received_bytes_total)
without any labels, plus per-connection gauges with module and
upstream labels.

That makes some dashboards awkward:

  • The unlabeled lifetime counters can't tell which module is generating
    traffic over time.
  • The per-connection gauges (rsync_proxy_connection_sent_bytes etc.)
    cannot be used to compute historical traffic per module — the series
    disappear when the connection ends, so rate() / increase() give
    bogus results.

Operators end up wanting "Top modules by traffic over the last 24h" but
the metrics surface doesn't really support that.

Change

Introduce three new Prometheus counters incremented at relay
completion, labeled by module and upstream:

  • rsync_proxy_module_completed_connections_total{module,upstream}
  • rsync_proxy_module_sent_bytes_total{module,upstream}
  • rsync_proxy_module_received_bytes_total{module,upstream}

The counters are increased at the same point as the existing global
totals, so they only count successful relays — the listing-all-modules,
unknown-module, queue-full and dial-error paths are intentionally
excluded, matching the existing
rsync_proxy_completed_connections_total semantics.

Accepted is not split per module because the module name isn't
known yet at accept time.

Implementation

  • Storage: sync.Map keyed by (module, upstream). Entries are created
    lazily on the first finished relay for that pair.
  • Output is sorted by (module, upstream) so scrapes are stable.
  • target.Upstream is the upstream's config Name.

Tests

  • go vet ./... && go build ./... && go test -race ./pkg/... — green.
  • TestMetricsIncludesLifetimeCounters extended to assert the new
    counters with their labels.

Operational note

These are pure additions — no existing series are renamed or removed.

Introduce three new Prometheus counters tracked at relay completion,
labeled by module and upstream:

  - rsync_proxy_module_completed_connections_total{module,upstream}
  - rsync_proxy_module_sent_bytes_total{module,upstream}
  - rsync_proxy_module_received_bytes_total{module,upstream}

Until now, lifetime totals (completed connections, sent/received bytes)
were only exposed without labels. The per-connection gauges
(rsync_proxy_connection_sent_bytes etc.) cannot be used to compute
historical traffic per module because the series disappear when the
connection ends.

The counters are increased at the same point as the existing global
totals, so they only count successful relays (listing-all-modules,
unknown-module, queue-full and dial-error paths are intentionally
excluded, matching the existing rsync_proxy_completed_connections_total
semantics). 'accepted' is not split per module because the module name
is not yet known at accept time.

Storage uses sync.Map keyed by (module, upstream); entries are
created lazily on first finished relay. Output is sorted by
(module, upstream) to keep scrapes stable.

The existing TestMetricsIncludesLifetimeCounters has been extended to
assert the new counters.
Copilot AI review requested due to automatic review settings June 2, 2026 00:55
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR adds new Prometheus lifetime counters scoped to each (module, upstream) pair, and updates the server to track and expose those counters.

Changes:

  • Track per-(module, upstream) lifetime counters (completed connections, sent bytes, received bytes) in Server.
  • Emit the new counters in Prometheus text format with stable ordering.
  • Extend metrics tests to validate the new exported series.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
pkg/server/server.go Adds storage and updates for per-(module, upstream) lifetime counters during relay completion.
pkg/server/metrics.go Exposes the new per-(module, upstream) counters in Prometheus output with deterministic ordering.
pkg/server/server_test.go Adds assertions ensuring the new metrics are present in the scrape output.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pkg/server/server.go
Comment on lines +371 to +378
func (s *Server) getModuleCounters(module, upstream string) *moduleCounters {
key := moduleUpstreamKey{module: module, upstream: upstream}
if v, ok := s.moduleCounters.Load(key); ok {
return v.(*moduleCounters)
}
v, _ := s.moduleCounters.LoadOrStore(key, &moduleCounters{})
return v.(*moduleCounters)
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, fixed in 3d3b8a7. getModuleCounters now applies prometheusLabelValueOrUnknown at key creation, so the internal sync.Map key matches the rendered Prometheus label set; metrics.go now only escapes the (already normalized) key. Added TestModuleCountersNormalizeEmptyKeyToUnknown as a regression test.

Comment thread pkg/server/server.go
Comment on lines +184 to +187
// Per-(module, upstream) counters tracked when a relay finishes
// successfully. Lazy-initialized via getModuleCounters.
// map key is moduleUpstreamKey. Value is *moduleCounters.
moduleCounters sync.Map
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In practice this is bounded. target.Upstream is set by the proxy itself to upstreamConfig.Name (always a configured value), and the module value at the increment site only reaches the lifetime accumulator after the request has resolved against s.modules — the listing-all-modules / unknown-module / queue-full / dial-error paths return earlier without touching getModuleCounters. So the upper bound is Σ(len(upstream.Modules)), identical in cardinality to the existing per-connection gauges. Happy to add an explicit doc comment about that bound if helpful.

Comment thread pkg/server/server.go
Comment on lines +371 to +378
func (s *Server) getModuleCounters(module, upstream string) *moduleCounters {
key := moduleUpstreamKey{module: module, upstream: upstream}
if v, ok := s.moduleCounters.Load(key); ok {
return v.(*moduleCounters)
}
v, _ := s.moduleCounters.LoadOrStore(key, &moduleCounters{})
return v.(*moduleCounters)
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sticking with the load-then-LoadOrStore pattern: it matches the same shape used in getUpstreamCounters immediately above, and the Load fast-path avoids the LoadOrStore allocation/CAS on the steady-state hot path (every relay completion). Concurrency semantics are identical either way.

Comment thread pkg/server/server_test.go
Comment on lines +746 to +755
// Per-(module, upstream) lifetime counters.
assert.Contains(t, text, "# HELP rsync_proxy_module_completed_connections_total")
assert.Contains(t, text, "# TYPE rsync_proxy_module_completed_connections_total counter")
assert.Contains(t, text, "rsync_proxy_module_completed_connections_total{module=\"fake\",upstream=\"u1\"} 1\n")
assert.Contains(t, text, "# HELP rsync_proxy_module_sent_bytes_total")
assert.Contains(t, text, "# TYPE rsync_proxy_module_sent_bytes_total counter")
assert.Contains(t, text, fmt.Sprintf("rsync_proxy_module_sent_bytes_total{module=\"fake\",upstream=\"u1\"} %d\n", len(payload)))
assert.Contains(t, text, "# HELP rsync_proxy_module_received_bytes_total")
assert.Contains(t, text, "# TYPE rsync_proxy_module_received_bytes_total counter")
assert.Contains(t, text, "rsync_proxy_module_received_bytes_total{module=\"fake\",upstream=\"u1\"} 0\n")
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty-vs-unknown collision is now covered by TestModuleCountersNormalizeEmptyKeyToUnknown (3d3b8a7). Escape-character coverage already exists at the helper level in TestPrometheusEscapeLabelValue (quotes, backslashes, newlines), and exercising the full render path with an exotic module name would mostly re-test that helper — leaving as-is to keep the test surface focused.

The module/upstream pair stored in moduleCounters was using the raw
string, while metrics.go applied prometheusLabelValueOrUnknown at scrape
time. An empty string and the literal "unknown" therefore became two
distinct map entries that rendered to the same Prometheus label set,
producing duplicate output lines (rejected by Prometheus as ambiguous).

Normalize at key creation in getModuleCounters so the internal key
matches what is rendered. metrics.go now only escapes the (already
normalized) key.

Adds a regression test that checks getModuleCounters("","") and
getModuleCounters("unknown","unknown") return the same counter, and that
the rendered output emits exactly one line for that label set.
@taoky taoky requested a review from iBug June 2, 2026 19:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants