fix(orchestrator): scope batch claim-error metric to process op #282

Closed
albertywu wants to merge 1 commit into main from wua/batch-claim-error-metric-scope

@albertywu albertywu commented Jun 27, 2026

Contributor

In the batch controller, every infrastructure-error counter is emitted
under the `process` operation subscope via metrics.NamedCounter (e.g.
deserialize_errors, storage_errors, counter_errors, batch_store_errors,
conflict_analyzer_errors, batch_dependent_store_errors, publish_errors),
landing at batch_controller.process.*_errors. The request-claim failure
path was the lone exception: it incremented c.metricsScope.Counter(
"request_claim_errors") directly, emitting at batch_controller.
request_claim_errors — one scope level above its siblings.
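
To make the scope mismatch concrete, here is a minimal sketch of the two emission paths. The `scope` type, `subScope`, `inc`, and `namedCounterInc` are hypothetical stand-ins written for illustration; they only assume that metrics.NamedCounter places a counter under the operation subscope, as described above, and are not the repository's actual types.

```go
package main

import "fmt"

// scope is a hypothetical in-memory stand-in for the controller's metrics
// scope. It records each increment under its fully qualified dotted name.
type scope struct {
	prefix string
	counts map[string]int64
}

func newScope(prefix string) *scope {
	return &scope{prefix: prefix, counts: map[string]int64{}}
}

// subScope returns a child scope that shares the parent's counter store.
func (s *scope) subScope(name string) *scope {
	return &scope{prefix: s.prefix + "." + name, counts: s.counts}
}

// inc bumps the counter at <prefix>.<name> by one.
func (s *scope) inc(name string) {
	s.counts[s.prefix+"."+name]++
}

// namedCounterInc is an assumed approximation of metrics.NamedCounter:
// it emits the counter under the operation subscope rather than the root.
func namedCounterInc(s *scope, opName, name string) {
	s.subScope(opName).inc(name)
}

func main() {
	root := newScope("batch_controller")

	// Old path: incremented directly on the controller scope.
	root.inc("request_claim_errors")

	// New path: routed through the operation subscope, like its siblings.
	namedCounterInc(root, "process", "request_claim_errors")
	namedCounterInc(root, "process", "storage_errors")

	for _, name := range []string{
		"batch_controller.request_claim_errors",
		"batch_controller.process.request_claim_errors",
		"batch_controller.process.storage_errors",
	} {
		fmt.Println(name, root.counts[name])
	}
}
```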

The deferred op.Complete(err) still bumps process.failed on that path,
so when an operator attributes a process.failed spike by summing
process.*_errors, the total is silently short by the request_claim_errors
count. Routing it through metrics.NamedCounter restores the per-category
breakdown so it reconciles with process.failed.
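
The reconciliation gap can be illustrated with a small sketch. The counter values below are invented for illustration, and `sumProcessErrors` is a hypothetical helper; the sketch also assumes each failed process operation increments exactly one per-category error counter, which this description implies but does not state outright.

```go
package main

import (
	"fmt"
	"strings"
)

// sumProcessErrors adds up every counter whose name matches
// batch_controller.process.*_errors.
func sumProcessErrors(counters map[string]int64) int64 {
	var total int64
	for name, v := range counters {
		if strings.HasPrefix(name, "batch_controller.process.") &&
			strings.HasSuffix(name, "_errors") {
			total += v
		}
	}
	return total
}

func main() {
	// Illustrative snapshot before the fix: claim errors sit at controller scope.
	before := map[string]int64{
		"batch_controller.process.failed":         10,
		"batch_controller.process.storage_errors": 4,
		"batch_controller.process.publish_errors": 3,
		"batch_controller.request_claim_errors":   3,
	}
	// Same failures after the fix: claim errors sit under the process subscope.
	after := map[string]int64{
		"batch_controller.process.failed":               10,
		"batch_controller.process.storage_errors":       4,
		"batch_controller.process.publish_errors":       3,
		"batch_controller.process.request_claim_errors": 3,
	}

	failed := "batch_controller.process.failed"
	// The shortfall is the part of process.failed not explained by process.*_errors.
	fmt.Println("shortfall before:", before[failed]-sumProcessErrors(before)) // 3
	fmt.Println("shortfall after:", after[failed]-sumProcessErrors(after))    // 0
}
```

With these numbers the pre-fix shortfall equals the request_claim_errors count, which is the discrepancy an operator would see when summing process.*_errors.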

The ack-path outcome counters (skipped_halted, request_claim_lost_race)
intentionally stay at controller scope, matching the skipped_* convention
in the score and speculate controllers; only error counters belong under
the operation subscope.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 27, 2026 02:19
@albertywu albertywu requested review from a team as code owners June 27, 2026 02:19

Copilot AI left a comment

Pull request overview

This PR fixes a metrics scoping inconsistency in the batch-phase orchestrator controller so that request_claim_errors is emitted under the same process operation subscope as the other per-category infrastructure error counters, keeping the error breakdown reconcilable with process.failed.

Changes:

  • Routes request_claim_errors through metrics.NamedCounter(c.metricsScope, opName, ...) instead of incrementing the controller-level counter directly.

@albertywu albertywu closed this Jun 27, 2026