fix(compensation): guard duplicate compensation scheduling with attempt, revision, and lifecycle checks #3925

Open
neuralmint wants to merge 1 commit into orchestration-agent:main from neuralmint:fix/compensation-duplicate-guard

Conversation

@neuralmint

Summary

Implements a CompensationPlanner that enforces failure fan-out invariants before committing scheduling state during compensation.

Guards Added

  1. Lifecycle state machine — valid transitions: PENDING → COMPENSATING → COMPENSATED|FAILED (with FAILED → COMPENSATING retry allowed)
  2. Attempt monotonicity — rejects stale attempt values (race detection)
  3. Revision tracking — rejects stale revision values (stale-state detection)
  4. Duplicate compensation gate — once a task is COMPENSATED, further transitions are rejected
  5. Structured logging — every transition logs task_id, plan_id, from/to states, attempt, and revision
  6. Metrics — counters for each transition type recorded via the existing MetricsCollector
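As a rough illustration of guards 1 through 4, the sketch below shows one way such a guarded transition could look. It is not the code in this PR: `TaskRecord`, `TransitionRejected`, and `guarded_transition` are hypothetical names, and the rules "attempt must not decrease" and "revision must match the stored value" are assumptions about what "stale" means here.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    COMPENSATING = "COMPENSATING"
    COMPENSATED = "COMPENSATED"
    FAILED = "FAILED"


# Lifecycle edges from guard 1. COMPENSATED has no outgoing edges,
# which is what makes it the duplicate compensation gate (guard 4).
ALLOWED = {
    State.PENDING: {State.COMPENSATING},
    State.COMPENSATING: {State.COMPENSATED, State.FAILED},
    State.FAILED: {State.COMPENSATING},
    State.COMPENSATED: set(),
}


class TransitionRejected(Exception):
    """Hypothetical error type for a rejected transition."""


@dataclass
class TaskRecord:
    """Hypothetical per-task scheduling state."""
    task_id: str
    state: State = State.PENDING
    attempt: int = 0
    revision: int = 0


def guarded_transition(rec: TaskRecord, to: State, attempt: int, revision: int) -> None:
    """Validate every guard first, then commit; a rejection leaves rec untouched."""
    if rec.state is State.COMPENSATED:
        raise TransitionRejected(f"{rec.task_id}: already compensated")
    if to not in ALLOWED[rec.state]:
        raise TransitionRejected(f"{rec.task_id}: {rec.state.value} -> {to.value} not allowed")
    # Assumption: an attempt lower than the stored one is stale.
    if attempt < rec.attempt:
        raise TransitionRejected(f"{rec.task_id}: stale attempt {attempt} < {rec.attempt}")
    # Assumption: the caller must present the revision it last read.
    if revision != rec.revision:
        raise TransitionRejected(f"{rec.task_id}: stale revision {revision} != {rec.revision}")
    rec.state = to
    rec.attempt = attempt
    rec.revision += 1
```

In this sketch, two failure handlers that both read revision 0 cannot both schedule compensation: the first moves the task to COMPENSATING and bumps the revision, and the second is rejected by the lifecycle check before any state is written.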

Integration

The CompensationPlanner is instantiated in the OrchestrationEngine and invoked in the error handler path. When a task fails, the engine attempts a guarded transition to COMPENSATING. If compensation was already scheduled (duplicate/stale), the planner raises an error and the engine logs a warning without masking the original exception.
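The error-handler behaviour described above might be sketched as follows. This is an assumption-laden illustration rather than the engine code in this PR: `StubPlanner`, `CompensationRejected`, `begin_compensation`, and `run_task` are hypothetical names, and the stub tracks scheduled task IDs in a plain set instead of applying the full guards.

```python
import logging

logger = logging.getLogger("orchestration.engine")


class CompensationRejected(Exception):
    """Hypothetical error raised for a duplicate or stale request."""


class StubPlanner:
    """Hypothetical stand-in for CompensationPlanner."""

    def __init__(self):
        self.scheduled = set()

    def begin_compensation(self, task_id):
        if task_id in self.scheduled:
            raise CompensationRejected(f"compensation already scheduled for {task_id}")
        self.scheduled.add(task_id)


def run_task(planner, task_id, fn):
    """Run fn; on failure, try to schedule compensation, then re-raise."""
    try:
        return fn()
    except Exception:
        try:
            planner.begin_compensation(task_id)
        except CompensationRejected as rejected:
            # Duplicate or stale request: warn, but do not replace the
            # task's own exception with the planner's.
            logger.warning("compensation not scheduled for %s: %s", task_id, rejected)
        # Bare raise re-raises the original task exception.
        raise
```

The design point is that the planner's rejection is handled entirely inside the inner `try`, so callers always see the task's own failure regardless of whether compensation was scheduled.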

Testing

  • 31 deterministic regression tests covering:
    • All valid lifecycle transitions
    • All invalid lifecycle transitions
    • Stale attempt detection (simulates concurrent failure fan-out)
    • Stale revision detection
    • Duplicate compensation gate
    • Plan serialization round-trip
    • Structured audit logging
    • Metrics recording
    • Engine integration test
  • All 62 existing tests continue to pass (incl. agent registry, config, deploy credentials, scheduler): ✓ 62 passed

Acceptance Criteria

  • Deterministic regression test covers the failure fan-out trigger
  • Compensation planner rejects or safely defers invalid transitions and preserves expected lifecycle state
  • Logs, metrics, and audit records explain the decision without exposing private runtime data
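One way to satisfy the last criterion is to build each audit record from an allowlist of fields, so that anything else passed in (payloads, credentials) never reaches the log. The sketch below assumes this approach; `AUDIT_FIELDS`, `audit_record`, and `log_decision` are hypothetical names, and the `decision` field is an illustrative addition to the fields listed under structured logging.

```python
import json
import logging

logger = logging.getLogger("compensation.audit")

# Fields from the structured-logging guard, plus a hypothetical
# "decision" field. Any key not listed here is dropped.
AUDIT_FIELDS = (
    "task_id",
    "plan_id",
    "from_state",
    "to_state",
    "attempt",
    "revision",
    "decision",
)


def audit_record(**fields):
    """Keep only allowlisted keys so private runtime data is not logged."""
    return {key: fields[key] for key in AUDIT_FIELDS if key in fields}


def log_decision(**fields):
    """Emit one JSON log line per transition decision and return the record."""
    record = audit_record(**fields)
    logger.info(json.dumps(record, sort_keys=True))
    return record
```

An allowlist fails closed: a new field added to the task state stays out of the logs until someone deliberately adds it to `AUDIT_FIELDS`.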

Closes #3924

…pt, revision, and lifecycle checks

Implements a CompensationPlanner that enforces failure fan-out invariants
before committing scheduling state. Guards include:

- Lifecycle state machine (PENDING → COMPENSATING → COMPENSATED|FAILED)
- Attempt monotonicity (stale attempt detection)
- Revision tracking (stale state detection)
- Duplicate compensation gate (once COMPENSATED, no further transitions)

The planner is integrated into OrchestrationEngine's error-handling path
so failure fan-out triggers go through guarded transitions.

Closes orchestration-agent#3924

Test plan:
- 31 deterministic regression tests covering all lifecycle transitions,
  stale attempt/revision detection, duplicate gate, serialization,
  metrics, and audit logging.
- All 62 existing tests pass with no regressions.

Development

Successfully merging this pull request may close these issues.

[ Bounty $7k ] [ Orchestrator ] Avoid duplicate compensation scheduling — failure fan-out
