feat(cache): add Prometheus metrics for cache hit/miss/error observability#158
feat(cache): add Prometheus metrics for cache hit/miss/error observability#158
Conversation
…ility Implements P5 from issue #157. Adds generic InstrumentedCache[T] wrapper that records Prometheus counters for all cache operations. Enables operators to monitor cache effectiveness, tune TTLs, and diagnose issues. All five cache instances (token, client, user, metrics, client_count) automatically instrumented when METRICS_ENABLED=true. Zero breaking changes, negligible performance impact (~100ns per operation). Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Adds Prometheus-based observability for the project’s generic cache layer by introducing an instrumented cache decorator and wiring it into cache initialization so all configured caches can emit hit/miss/error counters when metrics are enabled.
Changes:
- Added a generic
InstrumentedCache[T]decorator that records cache hit/miss/error Prometheus counters. - Added singleton Prometheus counter registration for cache metrics in
internal/cache. - Updated cache bootstrap initialization to wrap all cache instances with instrumentation when
METRICS_ENABLED=true, including the tokenNoopCachepath.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| internal/cache/metrics.go | Registers cache_hits_total, cache_misses_total, cache_errors_total counters via a singleton helper. |
| internal/cache/instrumented.go | Implements InstrumentedCache[T] wrapper to emit metrics around cache operations. |
| internal/cache/instrumented_test.go | Adds unit tests validating hit/miss/error counter behavior on the wrapper. |
| internal/bootstrap/cache.go | Wires instrumentation into cache initialization for token/client/user/metrics/client_count caches. |
| internal/bootstrap/cache_test.go | Adds bootstrap-level tests for cache initialization paths with metrics enabled/disabled. |
| go.mod | Adjusts module requirements (moves swag to indirect; adds an additional indirect dependency). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Pre-resolve WithLabelValues counters at construction time to avoid hash-map lookup on every cache operation (hot-path optimization) - Rename CacheMetrics → Metrics to fix revive stutter lint - Remove pointless NoopCache instrumentation (100% miss rate provides no actionable signal and pollutes dashboards) - Consolidate duplicate MetricsEnabled wrapping logic - Clean up tests: use pre-resolved counters, add type assertions, remove assertion-free smoke tests - Use operation constants (opGet, opSet, etc.) instead of string literals Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Use sync/atomic.Bool for fetchCalled in GetWithFetch to prevent data race when singleflight goroutine sets the flag while the caller returns early on context cancellation - Always record a miss when fetchFunc was called, even if it returned an error, so hit-rate calculations remain accurate - Return instrumentedCache.Close from bootstrap to maintain the invariant that closer corresponds to the returned cache instance Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Context cancellation or singleflight errors should not inflate hit counters. Record hit/miss only when err==nil; record errFetch on non-ErrCacheMiss errors. Add test for context-canceled path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
A cache miss occurred whenever fetchFunc was invoked, even if the fetch itself failed. Record missCounter whenever fetchCalled is true so cache_misses_total accurately reflects cache lookup failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
Completes the final item (P5) from issue #157: cache hit/miss/error Prometheus metrics for observability.
Closes #157
InstrumentedCache[T]decorator ininternal/cache/that records Prometheus countersWithLabelValuescounters at construction time to eliminate per-call map lookupsfetchCalledclosure wrapper inGetWithFetchto detect hit vs miss without doubleGet()token,client,user,metrics,client_count) whenMETRICS_ENABLED=trueopGet,opSet, etc.) to avoid stringly-typed labelsMetrics exposed
cache_hits_totalcache_namecache_misses_totalcache_namecache_errors_totalcache_name,operationDesign decisions
NoopCache— instrumenting it would only show 100% miss rate with no actionable signalWithLabelValueshash-map lookup on everyGet/Set/DeletecallfetchCalledstaysfalse(recorded as hit)Test plan
go test ./internal/cache/... -count=1— 29 tests pass (14 instrumented + existing)go test ./internal/bootstrap/... -count=1— 5 integration tests passgo build -o /dev/null .— project compilesmake lint— 0 issues🤖 Generated with Claude Code