Skip to content

fix(audit): populate missing actor and request metadata in audit logs#159

Merged
appleboy merged 12 commits intomainfrom
worktree-audit
Apr 9, 2026
Merged

fix(audit): populate missing actor and request metadata in audit logs#159
appleboy merged 12 commits intomainfrom
worktree-audit

Conversation

@appleboy
Copy link
Copy Markdown
Member

@appleboy appleboy commented Apr 8, 2026

Fix audit logging completeness, observability, and shutdown safety

Background

Investigation revealed that audit records written to the database were missing several fields that should have been populated. Further review uncovered four design flaws in the audit subsystem affecting completeness, reliability, and observability.

Issues Fixed

1. Missing ActorUsername (11 audit entries)

Problem: The /oauth/token endpoint is sessionless, so the request context carries no user. Token-related service methods set only ActorUserID from the token/record, leaving ActorUsername empty even though the user identity was knowable.

Fix: Centralized all ActorUsername enrichment in buildAuditLog (single source of truth):

  1. Try context first (zero cost) — only when entry.ActorUserID matches the context user ID (prevents misattribution)
  2. Fall back to a DB lookup when context did not provide a username
  3. Skip synthetic machine identities (client:<clientID> from client_credentials grant) by prefix check, avoiding wasted DB queries on hot introspection paths
  4. Noop service automatically opts outNoopAuditService.Log() is a no-op, so buildAuditLog is never called, so no DB query happens

ExchangeAuthorizationCode caches the user object it fetches for OIDC profile/email claims into request context (models.SetUserContext), so the audit layer's context-match check picks it up without a duplicate query. Result: still only one GetUserByID per auth-code exchange for openid+profile/email.

This eliminates the previous service-layer resolveUsername helper and 11 explicit ActorUsername: settings across token_exchange.go, token_refresh.go, and token_management.go. ActorUsername is now treated symmetrically with ActorIP, UserAgent, RequestPath, RequestMethod, and ActorUserID — all auto-enriched by buildAuditLog.

2. Missing request metadata (UserAgent, RequestPath, RequestMethod)

Problem: 41 of 44 audit log call sites left these fields empty. buildAuditLog had context fallback for ActorIP/ActorUsername/ActorUserID but not for request metadata.

Fix:

  • New util.SetRequestMetadataContext and three getter helpers in internal/util/context.go
  • New middleware.RequestContextMiddleware (replaces IPMiddleware) propagates IP + UA + path + method into request context
  • buildAuditLog enriches all three request metadata fields from context
  • Field truncation (UserAgent, RequestPath → 497 chars + ...; RequestMethod → hard 10 chars) to fit DB column limits

3. Audit event loss on shutdown

Problem: The worker goroutine was signaled by closing a separate shutdownCh, leaving any unconsumed events in logChan permanently lost.

Fix:

  • Replaced shutdownCh with closing logChan itself, so the worker's range-based receive guarantees draining all accepted events
  • Added an atomic.Bool stopped flag and an RWMutex sendMu:
    • Log() takes sendMu.RLock() and checks stopped before sending
    • Shutdown() sets stopped, then takes sendMu.Lock() to wait for in-flight sends, then closes logChan
  • Eliminates both the lost-events bug and the send-on-closed-channel race

4. Silent dropping with no observability

Problem: When the buffer was full, events were dropped via log.Printf only — no metric for monitoring/alerting.

Fix:

  • Added audit_events_dropped_total Prometheus counter
  • Counter registration is gated behind an explicit registerer set via SetAuditMetricsRegisterer, so deployments with metrics disabled do not leak collectors
  • Mutex-protected late initialization: counter can be created before metrics are configured (e.g. in tests) and registered later when a registerer becomes available
  • Aligned naming convention with internal/metrics (single Name field, no Namespace/Subsystem)

Architecture Changes

Area Before After
Request context IP only via IPMiddleware IP + UA + Path + Method via RequestContextMiddleware
ActorUsername enrichment Service layer + audit layer (split logic, leaky abstraction) Single source of truth in buildAuditLog (context match → DB fallback → skip machine identities)
Audit shutdown Worker exits early; events lost Drain via closed channel + send/close coordination
Buffer-full drops log.Printf only Prometheus counter audit_events_dropped_total
Service ↔ audit coupling Token services type-asserted against *NoopAuditService None — services just set ActorUserID and let audit handle the rest

Files Changed

File Change
internal/services/audit.go Worker shutdown coordination, request metadata enrichment, gated counter, ActorUsername enrichment (context match + DB fallback + machine identity skip), field truncation
internal/services/audit_test.go New tests for context enrichment, ActorUsername preservation, DB fallback, machine identity skip, deterministic shutdown drain
internal/services/token_exchange.go Lazy ID-token user fetch; cache user in context for audit reuse; removed explicit ActorUsername settings
internal/services/token_management.go Removed explicit ActorUsername settings
internal/services/token_refresh.go Removed explicit ActorUsername settings
internal/util/context.go Request metadata context helpers (UserAgent, Path, Method)
internal/util/context_test.go Tests for the new context helpers
internal/middleware/context.go RequestContextMiddleware replaces IPMiddleware
internal/bootstrap/bootstrap.go Wire SetAuditMetricsRegisterer when metrics enabled
internal/bootstrap/router.go Use RequestContextMiddleware
internal/middleware/auth.go Update stale comment reference
internal/handlers/auth.go Update stale comment reference
internal/handlers/oauth_handler.go Update stale comment reference

Total: 13 files changed, 440 insertions, 54 deletions

Test Plan

  • make generate passes
  • make lint — 0 issues
  • make test — all 14 packages pass, zero failures
  • New unit tests for context metadata helpers (3 tests)
  • New unit tests for buildAuditLog enrichment behavior:
    • Request metadata enrichment from context
    • Explicit values not overridden
    • User context enrichment when ActorUserID is empty
    • DB fallback when context has no user
    • Machine identity (client:<id>) skips DB lookup
  • Deterministic test for shutdown drain (verifies persisted DB rows)
  • Copilot code review iterated to clean state (0 outstanding comments)

Compatibility Notes

  • IPMiddleware removed: callers must use RequestContextMiddleware. Internal package, no external consumers.
  • audit_events_dropped_total metric: New counter; only registered when metrics are enabled (MetricsEnabled=true).
  • Audit row format: UserAgent, RequestPath, RequestMethod columns will now be populated for previously empty rows. ActorUsername is now reliably populated for all token operations. No schema changes.

🤖 Generated with Claude Code

- Resolve ActorUsername via DB lookup for 11 audit entries in token
  exchange, refresh, and management paths where sessionless API context
  left the field empty
- Add RequestContextMiddleware to propagate UserAgent, RequestPath, and
  RequestMethod through request context for all downstream audit logging
- Enrich buildAuditLog with context fallback for request metadata fields
- Drain logChan on shutdown to prevent silent audit event loss
- Add Prometheus counter for audit events dropped due to buffer overflow
- Consolidate duplicate GetUserByID call in ExchangeAuthorizationCode
- Remove dead IPMiddleware wrapper and update stale references

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 8, 2026 14:45
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves audit log completeness and observability by enriching audit entries with missing actor/request metadata, tightening shutdown behavior to avoid audit data loss, and adding dropped-event monitoring.

Changes:

  • Add request metadata (User-Agent, path, method) propagation via context middleware + context helpers, and enrich audit logs from context.
  • Populate missing ActorUsername for token exchange/refresh/management audit events via store lookups, and reduce duplicate user fetch in auth-code exchange.
  • Drain the audit log channel during shutdown and add a Prometheus counter for dropped audit events.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
internal/util/context.go Adds context helpers to store/extract HTTP request metadata.
internal/util/context_test.go Unit tests for request metadata context helpers and chaining.
internal/middleware/context.go Replaces IP-only middleware with request context middleware (IP + UA/path/method).
internal/bootstrap/router.go Wires the new request context middleware into the router.
internal/services/audit.go Enriches audit logs from context, drains channel on shutdown, adds dropped-events counter.
internal/services/audit_test.go Adds tests for enrichment and shutdown draining behavior.
internal/services/token.go Adds helper to resolve usernames for audit entries.
internal/services/token_refresh.go Populates ActorUsername in refresh-path audit logs.
internal/services/token_management.go Populates ActorUsername in token management audit logs.
internal/services/token_exchange.go Populates ActorUsername in exchange-path audit logs; consolidates user lookup for ID token claims + audit.
internal/middleware/auth.go Updates stale comment reference to renamed middleware.
internal/handlers/oauth_handler.go Updates stale comment reference to renamed middleware.
internal/handlers/auth.go Updates stale comment reference to renamed middleware.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…metric registration

- Make resolveUsername context-aware: check request context before
  falling back to DB lookup, avoiding unnecessary queries on
  authenticated routes
- Lazily fetch user profile in ExchangeAuthorizationCode only when
  openid profile/email claims are needed
- Register audit events_dropped_total counter via sync.Once singleton
  to prevent duplicate-registration panics in tests
- Make TestShutdown_DrainsLogChan deterministic by constructing the
  service without auto-starting the worker and verifying persisted rows

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Reuse the user object fetched for ID-token profile/email claims to
populate actorUsername, so the sessionless /oauth/token path never
issues two GetUserByID calls for the same user.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- resolveUsername now checks that the context user ID matches the
  requested userID before using the context username, preventing
  misattribution when IDs differ
- Shutdown drain uses a length snapshot instead of an unbounded loop,
  so late-arriving producers cannot prevent the worker from exiting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

- Truncate UserAgent, RequestPath, RequestMethod to match model varchar
  limits before persisting, preventing write failures from long values
- Gate audit dropped-events Prometheus counter behind an explicit
  registerer so deployments with metrics disabled do not register
  collectors from the services layer
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

… enqueue

- Subtract 3 from varchar limits passed to TruncateString so the
  appended "..." does not exceed column size constraints
- Add atomic stopped flag to reject events in Log() after Shutdown()
  begins, preventing events from being stuck in logChan
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…down, and fix method truncation

- Resolve actorUsername via DB fallback before the ID token audit entry
  so openid-only requests (without profile/email) are not left empty
- Replace len()-snapshot drain with non-blocking receive loop in
  shutdown worker to flush entries enqueued during the race window
- Hard-truncate RequestMethod to varchar(10) limit without ellipsis
  so standard HTTP methods like PROPFIND (8 chars) are preserved
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…down race

- Replace shutdownCh with closing logChan to signal the worker, so
  range-based drain is guaranteed to see all accepted events
- Guard Log() sends with sendMu.RLock; Shutdown() takes sendMu.Lock
  after setting stopped, ensuring all in-flight sends complete before
  the channel is closed
- Worker uses ok-check on receive to detect closed channel and flush
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Rename TestGetRequestMetadataFromContext_Empty to
TestRequestMetadataContext_Empty since it tests individual getters,
not a single GetRequestMetadataFromContext helper.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

… noop lookups

- buildAuditLog only fills ActorUsername from context when entry ActorUserID
  is empty or matches the context user, preventing misattribution when call
  sites set only ActorUserID
- Align audit_events_dropped_total naming with the rest of the metrics
  package (single Name field, no Namespace/Subsystem)
- resolveUsername short-circuits when audit logging is the noop service,
  avoiding wasted DB round-trips on hot token paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Replace the sync.Once-based counter init with a mutex-protected pattern
that allows SetAuditMetricsRegisterer to register a previously created
counter. This eliminates the silent misconfiguration when AuditService
is constructed before metrics are configured (common in tests).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Remove TokenService.resolveUsername and the explicit ActorUsername
settings from 11 audit entries across token exchange, refresh, and
management. The audit service is now the single source of truth for
ActorUsername resolution:

- buildAuditLog tries the request context (zero cost) when the entry's
  ActorUserID matches the context user, then falls back to a DB lookup
  when context did not help
- Synthetic machine identities ("client:<clientID>" from the
  client_credentials grant) are detected by prefix and skip the DB
  lookup, avoiding wasted queries on hot introspection paths
- ExchangeAuthorizationCode caches the user it fetches for OIDC
  profile/email claims into context (via SetUserContext) so the audit
  layer's context-match check picks it up without a duplicate query

Removes the leaky type assertion against *NoopAuditService and the
asymmetry where ActorUsername needed service-layer help while every
other auto-enriched field (IP, UA, path, method, ActorUserID) was
handled in buildAuditLog.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@appleboy appleboy merged commit 86b2a46 into main Apr 9, 2026
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants