feat: Observability — OutboxProcessorListener + okapi-micrometer module (KOJAK-44) #27

Open
endrju19 wants to merge 11 commits into main from observability

@endrju19
Collaborator

Summary

  • Sealed event hierarchy (OutboxProcessingEvent: Delivered, RetryScheduled, Failed) in okapi-core — enables exhaustive when in Kotlin, compiler warns on missing handlers
  • OutboxProcessorListener interface with default no-op methods — callbacks for per-entry and per-batch processing events
  • OutboxProcessor accepts optional listener + clock — notifies with try-catch isolation (listener exceptions never break processing)
  • New okapi-micrometer module: MicrometerOutboxListener (counters + timer) and MicrometerOutboxMetrics (gauges polling OutboxStore with TransactionRunner + NaN on failure)
  • OkapiMicrometerAutoConfiguration — top-level Spring Boot autoconfiguration, auto-detects MeterRegistry, wires read-only TransactionRunner for gauge queries
  • README updated with Observability section and module table
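
A minimal sketch of how the sealed events, the no-op listener interface, and the try-catch isolation could fit together. The type names OutboxProcessingEvent, OutboxProcessorListener, and onBatchProcessed come from this PR; the event fields, onEntryProcessed, notifySafely, and CountingListener are hypothetical and only illustrate the idea, not the actual okapi-core code.

```kotlin
import java.time.Duration

// Hypothetical field layout: the PR names the event types but not their properties.
sealed interface OutboxProcessingEvent {
    val entryId: String
    val duration: Duration

    data class Delivered(
        override val entryId: String,
        override val duration: Duration,
    ) : OutboxProcessingEvent

    data class RetryScheduled(
        override val entryId: String,
        override val duration: Duration,
        val attempt: Int,
    ) : OutboxProcessingEvent

    data class Failed(
        override val entryId: String,
        override val duration: Duration,
        val reason: String,
    ) : OutboxProcessingEvent
}

// Default no-op methods, so implementors override only what they need.
// onEntryProcessed is an assumed name; onBatchProcessed's parameters are assumed too.
interface OutboxProcessorListener {
    fun onEntryProcessed(event: OutboxProcessingEvent) {}
    fun onBatchProcessed(processed: Int, duration: Duration) {}
}

// Isolation: a throwing listener is logged and ignored, processing continues.
// Returns false only when the listener threw.
fun notifySafely(listener: OutboxProcessorListener?, event: OutboxProcessingEvent): Boolean {
    if (listener == null) return true
    return try {
        listener.onEntryProcessed(event)
        true
    } catch (e: Exception) {
        System.err.println("Outbox listener failed: ${e.message}")
        false
    }
}

// Example listener: `when` is used as an expression, so the compiler
// requires a branch for every subtype of the sealed interface.
class CountingListener : OutboxProcessorListener {
    val counts = mutableMapOf<String, Int>()

    override fun onEntryProcessed(event: OutboxProcessingEvent) {
        val key = when (event) {
            is OutboxProcessingEvent.Delivered -> "delivered"
            is OutboxProcessingEvent.RetryScheduled -> "retry_scheduled"
            is OutboxProcessingEvent.Failed -> "failed"
        }
        counts[key] = (counts[key] ?: 0) + 1
    }
}
```

Adding a fourth event type to the sealed interface would make the `when` in CountingListener incomplete, which is the point of preferring a sealed hierarchy over separate callbacks.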

Metrics

| Metric | Type | Source |
|---|---|---|
| okapi.entries.delivered | Counter | Listener event |
| okapi.entries.retry_scheduled | Counter | Listener event |
| okapi.entries.failed | Counter | Listener event |
| okapi.batch.duration | Timer | Listener event |
| okapi.entries.count | Gauge (tag: status) | DB poll |
| okapi.entries.lag.seconds | Gauge (tag: status) | DB poll |
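
The DB-polled gauges can be pictured with the sketch below: a supplier that optionally runs inside a transaction and reports NaN when the store query throws, plus a lag calculation against an injected clock. The TransactionRunner shown here is a hypothetical stand-in (the real interface's method names are not shown in this PR), and safeGaugeValue and lagSeconds are illustrative helpers, including the assumption that an empty status reports a lag of zero.

```kotlin
import java.time.Clock
import java.time.Duration
import java.time.Instant

// Hypothetical stand-in for the PR's TransactionRunner; the real signature may differ.
interface TransactionRunner {
    fun <T> inTransaction(block: () -> T): T
}

// Evaluates a gauge query, wrapping it in a transaction when a runner is present.
// Any exception from the store becomes NaN so a scrape never fails outright.
fun safeGaugeValue(runner: TransactionRunner?, query: () -> Number): Double =
    try {
        val value = if (runner != null) runner.inTransaction(query) else query()
        value.toDouble()
    } catch (e: Exception) {
        Double.NaN
    }

// Lag in seconds between the oldest entry and "now" according to the clock.
// Assumption: no entries for a status means zero lag.
fun lagSeconds(oldestCreatedAt: Instant?, clock: Clock): Double =
    if (oldestCreatedAt == null) 0.0
    else Duration.between(oldestCreatedAt, clock.instant()).toMillis() / 1000.0
```

Returning NaN instead of throwing keeps a temporary database outage visible as a gap in the series rather than as a broken metrics endpoint.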

Design decisions

  • Sealed events over separate callbacks — new event types cause compiler warnings, not silent misses
  • java.time.Duration over Long millis — type-safe, Micrometer Timer.record(Duration) native
  • Per-entry duration excludes DB write — measures delivery time only, not store.updateAfterProcessing()
  • Top-level OkapiMicrometerAutoConfiguration — inner @Configuration classes don't reliably see @ConditionalOnBean from other autoconfigs in Spring Boot 4
  • RetryScheduled name (not Retried) — semantically correct even on first attempt ("attempt failed, scheduled for retry")
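
To illustrate the per-entry duration decision, here is a rough sketch of timing only the delivery step with an injected Clock and stopping the measurement before the store update. timedDelivery and its parameters are hypothetical names; in the PR the corresponding calls are entryProcessor.process and store.updateAfterProcessing().

```kotlin
import java.time.Clock
import java.time.Duration

// Measures the delivery attempt only. The persist step (the DB write)
// runs after the end timestamp is taken, so it cannot inflate the result.
fun <R> timedDelivery(
    clock: Clock,
    deliver: () -> R,
    persist: (R) -> Unit,
): Pair<R, Duration> {
    val start = clock.instant()
    val result = deliver()
    val elapsed = Duration.between(start, clock.instant()) // stop before the DB write
    persist(result)
    return result to elapsed
}
```

Because the value is a java.time.Duration rather than a Long of milliseconds, it can be handed to a Micrometer Timer via record(Duration) without any unit conversion.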

Test plan

  • OutboxProcessorTest — listener events (Delivered, RetryScheduled, Failed), exception isolation, null listener, retry exhaustion → Failed
  • MicrometerOutboxListenerTest — counters per event type, batch timer
  • MicrometerOutboxMetricsTest — gauges per status, lag calculation, TransactionRunner wrapping, store exception → NaN
  • OutboxProcessorAutoConfigurationTest — listener autowired when MeterRegistry present
  • ObservabilityEndToEndTest — full pipeline on live Postgres + WireMock (Testcontainers)
  • Verified on standalone demo app with Spring Boot Actuator + Prometheus endpoint

endrju19 added 11 commits April 15, 2026 15:34
Add Micrometer to version catalog, register okapi-micrometer module in
settings and BOM. Module depends on okapi-core with micrometer-core as
compileOnly.

OutboxProcessor accepts an optional listener and clock. After each entry
is processed, it emits a sealed OutboxProcessingEvent (Delivered, Retried,
Failed) with per-entry Duration. After the batch, it calls onBatchProcessed.
Exceptions in the listener are caught and logged — they never break processing.

Implements OutboxProcessorListener with Micrometer counters for
delivered/retried/failed entries and a timer for batch duration.

Registers count-per-status and lag-per-status gauges that poll
OutboxStore on each Prometheus scrape. Gauge suppliers are wrapped
in an optional TransactionRunner (required for Exposed-backed stores)
with try-catch returning NaN on failure.

…K-44)

Add MicrometerConfiguration inner class that creates MicrometerOutboxListener
and MicrometerOutboxMetrics beans when MeterRegistry is on the classpath.
OutboxProcessor bean now accepts an optional OutboxProcessorListener.

"Retried" (past tense) implied the retry already happened, but the
event is emitted when a failed delivery attempt is rescheduled for
another try — even on the very first attempt. "RetryScheduled" is
semantically accurate regardless of the attempt number.

Renamed across: sealed event, OutboxProcessor mapping, MicrometerOutboxListener
counter (okapi.entries.retried → okapi.entries.retry_scheduled), and all tests.
…y (KOJAK-44)

outboxProcessor bean now injects ObjectProvider<Clock>, consistent with
all other beans in OutboxAutoConfiguration. Previously it silently fell
back to Clock.systemUTC() even when a custom Clock bean was present.

Per-entry duration now captures only the delivery attempt time
(entryProcessor.process), excluding store.updateAfterProcessing().
This prevents DB write latency from inflating delivery metrics.
…eMock (KOJAK-44)

Verifies the full observability pipeline against real infrastructure:
- Retry-then-succeed: RetryScheduled counter + Delivered counter + gauges
- Permanent failure: Failed counter + gauge reflects FAILED status
- Batch duration: timer records realistic HTTP delivery time (50ms stub)
- Lag gauge: reflects real time difference for pending entries in Postgres
…el (KOJAK-44)

Inner @Configuration classes inside @AutoConfiguration do not reliably
see beans from other autoconfigurations via @ConditionalOnBean. This
caused MicrometerConfiguration to never activate because MeterRegistry
was not yet available when the condition was evaluated.

Fix: extract to a separate top-level @AutoConfiguration with its own
@AutoConfigureAfter targeting the correct Spring Boot 4 package
(org.springframework.boot.micrometer.metrics.autoconfigure).
…n (KOJAK-44)

Add Observability section with metrics table and quick-start snippet.
Update module diagram and table to include okapi-micrometer.

Rename okapi.entries.retry_scheduled to okapi.entries.retry.scheduled
(dots-only follows Micrometer naming convention). Clarify README
observability section, add tag names to gauge descriptions, document
duration excludes DB write, single-listener note, autoconfig override.