No chaos/failure testing for circuit breaker and resilience features #56

@yairfalse

Problem

The circuit breaker, rate limiter, and retry logic are unit-tested, but they have never been validated under realistic failure scenarios:

  • What happens when a backend dies mid-request?
  • Does the circuit breaker actually trip under sustained failures?
  • Do retries work correctly with real network errors?
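As a starting point for the first and third questions, here is a minimal std-only sketch of a backend that dies mid-response and a client that retries. All names (`spawn_dying_backend`, `fetch_once`, `fetch_with_retry`) are hypothetical and only illustrate the shape of such a test; the project's real retry logic is not shown here and may behave differently.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;
use std::time::Duration;

/// Hypothetical test backend: the first `drops` connections are cut off after a
/// partial response (simulating a backend dying mid-request); later ones succeed.
pub fn spawn_dying_backend(drops: usize) -> SocketAddr {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind");
    let addr = listener.local_addr().expect("local_addr");
    thread::spawn(move || {
        for (i, stream) in listener.incoming().enumerate() {
            let Ok(mut s) = stream else { continue };
            let mut buf = [0u8; 1024];
            let _ = s.read(&mut buf); // the test request is small; one read suffices
            if i < drops {
                // Promise 5 body bytes, send only 2, then drop the socket.
                let _ = s.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhe");
            } else {
                let _ = s.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello");
            }
        }
    });
    addr
}

/// One attempt: fails if the body is shorter than the advertised Content-Length.
pub fn fetch_once(addr: SocketAddr) -> io::Result<String> {
    let mut s = TcpStream::connect(addr)?;
    s.set_read_timeout(Some(Duration::from_secs(2)))?;
    s.write_all(b"GET / HTTP/1.1\r\nHost: test\r\nConnection: close\r\n\r\n")?;
    let mut raw = String::new();
    s.read_to_string(&mut raw)?;
    let (head, body) = raw
        .split_once("\r\n\r\n")
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "no header terminator"))?;
    let expected: usize = head
        .lines()
        .find_map(|l| l.strip_prefix("Content-Length: "))
        .and_then(|v| v.trim().parse().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "no Content-Length"))?;
    if body.len() < expected {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated body"));
    }
    Ok(body.to_string())
}

/// Retries up to `max_attempts` times; returns the body and the attempts used.
pub fn fetch_with_retry(addr: SocketAddr, max_attempts: u32) -> io::Result<(String, u32)> {
    let mut last_err = io::Error::new(io::ErrorKind::Other, "max_attempts was 0");
    for attempt in 1..=max_attempts {
        match fetch_once(addr) {
            Ok(body) => return Ok((body, attempt)),
            Err(e) => last_err = e,
        }
        thread::sleep(Duration::from_millis(10 * attempt as u64)); // linear backoff
    }
    Err(last_err)
}
```

A test could call `spawn_dying_backend(2)` and check that `fetch_with_retry(addr, 5)` succeeds on the third attempt, then check that the call fails when the retry budget is smaller than the number of dropped connections. The sketch retries a GET, which is idempotent; non-idempotent requests would need a separate decision.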

Current State

  • ✅ Unit tests for circuit breaker state machine
  • ✅ Unit tests for rate limiter token bucket
  • ❌ No integration tests with actual failing backends
  • ❌ No chaos engineering (pod kills, network partitions)

Suggested Fix

Integration Tests

  1. Deploy a backend that returns 500s on demand
  2. Verify the circuit breaker opens once the failure threshold is reached
  3. Verify traffic fails fast (without reaching the backend) while the circuit is open
  4. Verify the half-open state allows probe requests through
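The four steps above could be sketched roughly as follows, using only the standard library. `Breaker` here is a deliberately simplified stand-in, not the API in `control/src/proxy/circuit_breaker.rs`; `FlakyBackend`, `spawn_backend`, `get_status`, and `Outcome` are likewise hypothetical names. A real scenario would route requests through the actual proxy instead.

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical test backend: answers 500 while `fail` is set, 200 otherwise.
pub struct FlakyBackend {
    pub addr: SocketAddr,
    pub fail: Arc<AtomicBool>,
    pub hits: Arc<AtomicUsize>, // requests that actually reached the backend
}

pub fn spawn_backend() -> FlakyBackend {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind");
    let addr = listener.local_addr().expect("local_addr");
    let fail = Arc::new(AtomicBool::new(false));
    let hits = Arc::new(AtomicUsize::new(0));
    let (f, h) = (Arc::clone(&fail), Arc::clone(&hits));
    thread::spawn(move || {
        for stream in listener.incoming() {
            let Ok(mut s) = stream else { continue };
            let mut buf = [0u8; 1024];
            let _ = s.read(&mut buf); // the test request is small; one read suffices
            h.fetch_add(1, Ordering::SeqCst);
            let status = if f.load(Ordering::SeqCst) { "500 Internal Server Error" } else { "200 OK" };
            let _ = write!(s, "HTTP/1.1 {status}\r\nContent-Length: 0\r\nConnection: close\r\n\r\n");
        }
    });
    FlakyBackend { addr, fail, hits }
}

/// Sends one GET and returns the HTTP status code.
pub fn get_status(addr: SocketAddr) -> io::Result<u16> {
    let mut s = TcpStream::connect(addr)?;
    s.write_all(b"GET / HTTP/1.1\r\nHost: test\r\nConnection: close\r\n\r\n")?;
    let mut resp = String::new();
    s.read_to_string(&mut resp)?;
    resp.split_whitespace()
        .nth(1)
        .and_then(|code| code.parse().ok())
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "bad status line"))
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome { Ok, BackendError, Rejected }

/// Simplified stand-in breaker (assumption: consecutive-failure threshold plus cooldown).
pub struct Breaker {
    threshold: u32,
    cooldown: Duration,
    failures: u32,
    opened_at: Option<Instant>,
}

impl Breaker {
    pub fn new(threshold: u32, cooldown: Duration) -> Self {
        Breaker { threshold, cooldown, failures: 0, opened_at: None }
    }

    pub fn call(&mut self, addr: SocketAddr) -> Outcome {
        if let Some(opened) = self.opened_at {
            if opened.elapsed() < self.cooldown {
                return Outcome::Rejected; // open: fail fast, backend is not contacted
            }
            // Cooldown elapsed: half-open, so this request goes through as a probe.
        }
        match get_status(addr) {
            Ok(code) if code < 500 => {
                self.failures = 0;
                self.opened_at = None; // success closes the circuit
                Outcome::Ok
            }
            _ => {
                self.failures += 1;
                if self.failures >= self.threshold {
                    self.opened_at = Some(Instant::now()); // trip, or re-open after a failed probe
                }
                Outcome::BackendError
            }
        }
    }
}
```

A scenario built on this would flip `fail` to true, send `threshold` requests, then assert that the next call returns `Outcome::Rejected` and that `hits` did not change (step 3). After clearing `fail` and waiting out the cooldown, the next call acts as the probe in step 4. Counting backend hits is what distinguishes "failed fast" from "failed after contacting the backend".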

Chaos Testing (optional)

  1. Use Chaos Mesh or LitmusChaos in CI
  2. Kill backend pods during load test
  3. Inject network latency
  4. Verify graceful degradation
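Before wiring Chaos Mesh or LitmusChaos into CI, the latency-injection step can be approximated in-process. The sketch below is only that approximation, not a replacement for pod kills or network partitions: a backend whose response delay is adjustable at runtime, and a caller that returns a degraded reply instead of hanging. `spawn_slow_backend`, `call_with_timeout`, and `Reply` are hypothetical names, and "degraded" is assumed here to mean "give up after the timeout and signal a fallback".

```rust
use std::io::{self, Read, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical backend whose response delay (in ms) can be changed at runtime.
pub fn spawn_slow_backend(delay_ms: Arc<AtomicU64>) -> SocketAddr {
    let listener = TcpListener::bind("127.0.0.1:0").expect("bind");
    let addr = listener.local_addr().expect("local_addr");
    thread::spawn(move || {
        for stream in listener.incoming() {
            let Ok(mut s) = stream else { continue };
            let delay = Duration::from_millis(delay_ms.load(Ordering::SeqCst));
            // One thread per connection so a slow response does not block others.
            thread::spawn(move || {
                let mut buf = [0u8; 1024];
                let _ = s.read(&mut buf);
                thread::sleep(delay); // injected latency
                let _ = s.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n");
            });
        }
    });
    addr
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Reply { Fresh, Degraded }

/// Calls the backend; if no response arrives within `timeout`, returns
/// `Degraded` rather than blocking. Also reports how long the call took.
pub fn call_with_timeout(addr: SocketAddr, timeout: Duration) -> io::Result<(Reply, Duration)> {
    let start = Instant::now();
    let mut s = TcpStream::connect_timeout(&addr, timeout)?;
    s.set_read_timeout(Some(timeout))?;
    s.write_all(b"GET / HTTP/1.1\r\nHost: test\r\nConnection: close\r\n\r\n")?;
    let mut buf = [0u8; 64];
    match s.read(&mut buf) {
        Ok(n) if n > 0 => Ok((Reply::Fresh, start.elapsed())),
        Ok(_) => Err(io::Error::new(io::ErrorKind::UnexpectedEof, "closed without reply")),
        // A read timeout surfaces as WouldBlock on Unix and TimedOut on Windows.
        Err(e) if matches!(e.kind(), io::ErrorKind::WouldBlock | io::ErrorKind::TimedOut) => {
            Ok((Reply::Degraded, start.elapsed()))
        }
        Err(e) => Err(e),
    }
}
```

A test can start with a delay of 0 and expect `Reply::Fresh`, then raise the delay above the timeout and assert both that the reply is `Reply::Degraded` and that the elapsed time stays close to the timeout. The second assertion is the measurable part of "graceful degradation": the caller is bounded by its own timeout, not by the backend's latency.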

Files

  • control/src/proxy/circuit_breaker.rs
  • control/src/proxy/rate_limiter.rs
  • tests/integration/scenarios/ (add new scenarios)

Labels: enhancement (New feature or request)