
Expand Prometheus metrics with request latency histograms and error counters #38


@mikewheeleer

Implement richer Prometheus metrics with latency and error counters

Description

GET /api/v1/metrics in src/index.ts exposes four gauges (services, API keys, outstanding usage, paused), each hand-built as a line of exposition text. There are no request counters, latency histograms, or error counters, even though the request-timer middleware already measures durationMs. This issue expands the metrics so operators can observe traffic, latency, and errors.
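To make the histogram requirement concrete, here is a minimal, dependency-free sketch of what a latency histogram looks like in the text exposition format when fed from durationMs. It is illustrative only: the class name LatencyHistogram and the bucket bounds are assumptions, the method/route labels are omitted for brevity, and prom-client remains the suggested route for production code. The key correctness points are that bucket counts are cumulative and that the +Inf bucket equals _count.

```typescript
/**
 * Hypothetical sketch of a Prometheus histogram fed by the timer
 * middleware's durationMs. Labels (method, route) are omitted for brevity.
 */
class LatencyHistogram {
  private readonly name: string;
  private readonly help: string;
  private readonly bounds: number[];
  private readonly perBucket: number[];
  private sum = 0;
  private count = 0;

  // Bucket bounds (in seconds) are an illustrative assumption.
  constructor(
    name: string,
    help: string,
    bounds: number[] = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
  ) {
    this.name = name;
    this.help = help;
    this.bounds = [...bounds].sort((a, b) => a - b);
    this.perBucket = new Array<number>(this.bounds.length).fill(0);
  }

  /** Records one request. Prometheus convention is seconds, so convert from ms. */
  observeMs(durationMs: number): void {
    const seconds = durationMs / 1000;
    this.sum += seconds;
    this.count += 1;
    // Count the observation once, in the smallest bucket whose bound covers it.
    // Observations above the largest bound only land in +Inf (via count).
    const i = this.bounds.findIndex((le) => seconds <= le);
    if (i !== -1) this.perBucket[i] += 1;
  }

  /** Renders cumulative buckets, the +Inf bucket, _sum and _count. */
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} histogram`];
    let cumulative = 0;
    this.bounds.forEach((le, i) => {
      cumulative += this.perBucket[i];
      lines.push(`${this.name}_bucket{le="${le}"} ${cumulative}`);
    });
    // The +Inf bucket must always equal the total observation count.
    lines.push(`${this.name}_bucket{le="+Inf"} ${this.count}`);
    lines.push(`${this.name}_sum ${this.sum}`);
    lines.push(`${this.name}_count ${this.count}`);
    return lines.join("\n") + "\n";
  }
}
```

In the timer middleware, the existing durationMs value would be passed to observeMs; with prom-client, a Histogram's observe method plays the same role and handles bucket bookkeeping for you.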

Requirements and context

  • Repository scope: Agentpay-Org/Agentpay-backend only.
  • Add an agentpay_http_requests_total{method,route,status} counter and an agentpay_http_request_duration_seconds histogram, both fed by the existing timer middleware.
  • Add an agentpay_http_errors_total counter incremented in the final error handler.
  • Keep the existing gauges and the "text/plain; version=0.0.4" exposition format. Consider using prom-client so that histogram buckets are rendered correctly.
  • Ensure route labels use the matched route pattern (e.g. /api/v1/usage/:agent/:serviceId), not the raw path, to bound cardinality.
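The route-label and request-counter requirements can be sketched without any dependencies. This assumes Express 4-style request fields: req.baseUrl holds the router mount path and req.route.path holds the matched pattern as a string (Express also allows RegExp or array paths, which this sketch does not handle). The names RoutedRequest, routeLabel, RequestCounter, and the "unmatched" fallback label are hypothetical, not existing code in the repo.

```typescript
// Hypothetical subset of the Express request fields this sketch relies on.
interface RoutedRequest {
  method: string;
  baseUrl: string;
  route?: { path: string };
}

/** Builds the `route` label from the matched pattern, never from the raw URL. */
function routeLabel(req: RoutedRequest): string {
  // "unmatched" is a hypothetical catch-all so all 404s share one series.
  return req.route ? `${req.baseUrl}${req.route.path}` : "unmatched";
}

/** Escapes a label value per the text exposition format (\\, \n, "). */
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/\n/g, "\\n").replace(/"/g, '\\"');
}

/** Minimal in-memory counter for agentpay_http_requests_total{method,route,status}. */
class RequestCounter {
  private readonly series = new Map<string, number>();

  inc(method: string, route: string, status: number): void {
    // One series per (method, route, status) tuple; the key is a JSON tuple.
    const key = JSON.stringify([method, route, String(status)]);
    this.series.set(key, (this.series.get(key) ?? 0) + 1);
  }

  render(): string {
    const name = "agentpay_http_requests_total";
    const lines = [`# HELP ${name} Total HTTP requests.`, `# TYPE ${name} counter`];
    for (const [key, value] of this.series) {
      const [method, route, status] = JSON.parse(key) as string[];
      lines.push(
        `${name}{method="${escapeLabelValue(method)}",route="${escapeLabelValue(route)}",status="${status}"} ${value}`,
      );
    }
    return lines.join("\n") + "\n";
  }
}
```

Because Express only sets req.route after a route matches, the timer middleware would call counter.inc(req.method, routeLabel(req), res.statusCode) from the response's "finish" handler rather than on entry. Whether req.baseUrl still holds the mount path at that point depends on how routers are mounted in src/index.ts, so this is worth confirming in a test.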

Suggested execution

  • Fork the repo and create a branch
  • git checkout -b feature/observability-08-prometheus-histograms
  • Implement changes
    • Write code in: the metrics endpoint and the timer/error middleware in src/index.ts, and optionally a new src/metrics.ts.
    • Write comprehensive tests in a new src/metrics.test.ts: counter increments, histogram presence, and format validity.
    • Add documentation: document the metric names in docs/metrics.md.
    • Add TSDoc on any metrics helpers.
    • Validate security assumptions: no high-cardinality labels (no raw agent ids in labels).
  • Test and commit
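One way to turn the "no high-cardinality labels" assumption into a test is to scan the rendered exposition for route label values that are not registered patterns. The helper below is a hypothetical sketch (findUnknownRouteLabels and the knownRoutes set are not part of the repo) and also illustrates the kind of TSDoc the metrics helpers could carry.

```typescript
/**
 * Returns the distinct `route` label values in a text exposition that are not
 * known route patterns. A non-empty result suggests a raw path, and possibly a
 * raw agent id, leaked into a label.
 *
 * @param exposition - Body returned by GET /api/v1/metrics.
 * @param knownRoutes - Route patterns registered on the app (hypothetical input).
 * @returns Offending label values in first-seen order.
 */
function findUnknownRouteLabels(exposition: string, knownRoutes: ReadonlySet<string>): string[] {
  const unknown = new Set<string>();
  // Match route="..." only as a whole label name (preceded by "{" or ",").
  // Values are compared in escaped form; route patterns contain no quotes or backslashes.
  const re = /[{,]route="((?:[^"\\]|\\.)*)"/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(exposition)) !== null) {
    if (!knownRoutes.has(m[1])) unknown.add(m[1]);
  }
  return Array.from(unknown);
}
```

In src/metrics.test.ts this could run against the metrics endpoint's body after a request to a parameterized route, asserting that the result is empty.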

Test and commit

  • Run npm run build, npm test, and npm run lint.
  • Cover edge cases: the error path increments the error counter, route labels are normalized to patterns, and the exposition output parses.
  • Include the full npm test output in the PR description.
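For the "exposition output parses" case, a strict line checker is one lightweight option. This is a hypothetical sketch, not a full implementation of the Prometheus text format: it assumes single-space separators (as typical client libraries emit), accepts HELP, TYPE, other comment lines, and sample lines with an optional timestamp, and does not check cross-line rules such as metric family grouping.

```typescript
// Regex fragments for the text exposition format (simplified, assumption-laden).
const NAME = /[a-zA-Z_:][a-zA-Z0-9_:]*/.source;
const LABEL = /[a-zA-Z_][a-zA-Z0-9_]*="(?:[^"\\\n]|\\.)*"/.source;
const VALUE = /[-+]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][-+]?\d+)?|[-+]?Inf|NaN/.source;

const HELP_LINE = new RegExp(`^# HELP ${NAME}(?: .*)?$`);
const TYPE_LINE = new RegExp(`^# TYPE ${NAME} (?:counter|gauge|histogram|summary|untyped)$`);
// name, optional {labels} with optional trailing comma, value, optional timestamp.
const SAMPLE_LINE = new RegExp(
  `^${NAME}(?:\\{(?:${LABEL}(?:,${LABEL})*,?)?\\})? (?:${VALUE})(?: -?\\d+)?$`,
);

/** Returns the 1-based line numbers that do not parse; empty means valid. */
function invalidExpositionLines(text: string): number[] {
  const bad: number[] = [];
  text.split("\n").forEach((line, i) => {
    if (line === "") return; // blank lines (including the trailing one) are ignored
    if (line.startsWith("# HELP ")) {
      if (!HELP_LINE.test(line)) bad.push(i + 1);
    } else if (line.startsWith("# TYPE ")) {
      if (!TYPE_LINE.test(line)) bad.push(i + 1);
    } else if (!line.startsWith("#") && !SAMPLE_LINE.test(line)) {
      bad.push(i + 1); // other "#" lines are plain comments and are allowed
    }
  });
  return bad;
}
```

A test would fetch GET /api/v1/metrics and assert that invalidExpositionLines(body) is empty, which covers the existing gauges as well as the new series.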

Example commit message

feat: expand prometheus metrics with latency histogram and error counters

Guidelines

  • Minimum 95 percent test coverage for impacted modules.
  • Clear, reviewer-focused documentation.
  • Timeframe: 96 hours.

Community & contribution rewards

  • 💬 Join the AgentPay community on Discord for questions, reviews, and faster merges: https://discord.gg/eXvRKkgcv
  • ⭐ This is a GrantFox OSS / Official Campaign task and may be rewarded. When your PR is merged, you'll be prompted to rate the project; if this issue and the maintainers helped you ship, we'd be grateful for a 5-star rating. Clear questions in Discord and tidy, well-tested PRs are the fastest path to a merge and a reward.

Metadata


Assignees

No one assigned

Fields

No fields configured for Feature.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
