Circuit Breakers
adham90 edited this page Feb 14, 2026 · 3 revisions
Prevent cascading failures by temporarily blocking requests to failing models.
Without circuit breakers, a failing service causes:
Request 1 → Wait 30s → Timeout
Request 2 → Wait 30s → Timeout
Request 3 → Wait 30s → Timeout
...
Every request blocks for the full timeout, resources are wasted, and the problem compounds.
Circuit breakers detect failure patterns and "trip" to fast-fail:
Requests 1-10 → Failures → Circuit OPENS
Request 11 → Immediate failure (no wait)
Request 12 → Immediate failure (no wait)
...
[After cooldown]
Request N → Try again → Success → Circuit CLOSES
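To put rough numbers on the difference, here is a back-of-the-envelope sketch. The 100 requests and the helper names are hypothetical; the 30-second timeout and the threshold of 10 come from the flows above. It assumes every request during the outage fails and that the outage ends before the cooldown does.

```ruby
# Hypothetical figures for illustration only.
TIMEOUT_SECONDS = 30
REQUESTS_DURING_OUTAGE = 100
ERROR_THRESHOLD = 10 # the number of failures that trips the breaker

# Without a breaker, every request waits for the full timeout.
def wait_without_breaker(requests, timeout)
  requests * timeout
end

# With a breaker, only the requests needed to trip it wait;
# the remaining requests fail immediately.
def wait_with_breaker(requests, timeout, threshold)
  [requests, threshold].min * timeout
end

puts wait_without_breaker(REQUESTS_DURING_OUTAGE, TIMEOUT_SECONDS)               # 3000
puts wait_with_breaker(REQUESTS_DURING_OUTAGE, TIMEOUT_SECONDS, ERROR_THRESHOLD) # 300
```

Under these assumptions the total time spent waiting drops from 3000 seconds to 300.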
class MyAgent < ApplicationAgent
model "gpt-4o"
circuit_breaker errors: 10, within: 60, cooldown: 300
end

| Parameter | Meaning |
|---|---|
| `errors` | Number of errors to trip the breaker |
| `within` | Time window in seconds |
| `cooldown` | Seconds before retrying |
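One way to picture how `errors` and `within` interact is a sliding-window counter. The sketch below is a hypothetical stand-alone class, not the gem's implementation. It assumes a sliding window (the gem may use fixed buckets) and that the breaker trips when the count reaches the threshold.

```ruby
# Hypothetical helper, not part of the gem: counts errors in a sliding time window.
class ErrorWindow
  def initialize(errors:, within:)
    @threshold = errors # number of errors that trips the breaker
    @window = within    # window length in seconds
    @timestamps = []
  end

  # Record an error at time `now` (in seconds) and drop entries older than the window.
  def record(now)
    @timestamps << now
    @timestamps.reject! { |t| t <= now - @window }
    self
  end

  def count
    @timestamps.size
  end

  # Assumption: reaching the threshold is enough to trip.
  def tripped?
    count >= @threshold
  end
end

window = ErrorWindow.new(errors: 3, within: 60)
window.record(0).record(10)
window.record(65)    # the error at t=0 has aged out of the 60-second window
puts window.count    # 2
puts window.tripped? # false
window.record(66)
puts window.tripped? # true
```

Errors that are spread out over more than `within` seconds never accumulate, so slow background noise does not trip the breaker.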
Closed:
- Requests pass through normally
- Failures are counted
- If errors exceed threshold → Opens

Open:
- All requests fail immediately
- No API calls made
- Saves time and resources
- After cooldown → Half-Open

Half-Open:
- One request allowed through
- If successful → Closes
- If fails → Opens again
CLOSED ──(too many errors)──► OPEN
  ▲                            │
  │                        (cooldown)
  │                            │
  └───(success)─── HALF-OPEN ◄─┘
                       │
                   (failure)
                       │
                       └──► OPEN
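The transitions in the diagram can be sketched as a small in-memory state machine. `SketchBreaker` and its methods are hypothetical names used for illustration only; this is not how the gem stores or evaluates breaker state. For brevity it omits the `within` window and does not limit the half-open state to a single in-flight request.

```ruby
# Hypothetical in-memory breaker, for illustration only.
class SketchBreaker
  attr_reader :state

  def initialize(errors:, cooldown:)
    @threshold = errors
    @cooldown = cooldown
    @state = :closed
    @error_count = 0
    @opened_at = nil
  end

  # Returns true if a request may go through at time `now` (in seconds).
  def allow?(now)
    if @state == :open && now - @opened_at >= @cooldown
      @state = :half_open # cooldown elapsed: start probing the service again
    end
    @state != :open
  end

  # A success closes the breaker and clears the error count.
  def record_success
    @state = :closed
    @error_count = 0
  end

  # A failure while half-open re-opens immediately; otherwise count toward the threshold.
  def record_failure(now)
    @error_count += 1
    trip(now) if @state == :half_open || @error_count >= @threshold
  end

  private

  def trip(now)
    @state = :open
    @opened_at = now
    @error_count = 0
  end
end

breaker = SketchBreaker.new(errors: 2, cooldown: 300)
breaker.record_failure(0)
breaker.record_failure(1)
puts breaker.state       # open
puts breaker.allow?(100) # false
puts breaker.allow?(301) # true
puts breaker.state       # half_open
breaker.record_success
puts breaker.state       # closed
```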
Each model has its own circuit breaker:
class MyAgent < ApplicationAgent
model "gpt-4o"
fallback_models "claude-3-5-sonnet"
circuit_breaker errors: 5, within: 60, cooldown: 120
end
# gpt-4o breaker opens → try claude-3-5-sonnet
# claude-3-5-sonnet breaker opens → all fail fast

# Check if breaker is open for a model
status = RubyLLM::Agents::CircuitBreaker.status("gpt-4o")
# => { state: :open, errors: 10, opens_at: Time, closes_at: Time }
status[:state] # => :closed, :open, or :half_open
status[:errors] # => current error count
status[:opens_at] # => when breaker opened (if open)
status[:closes_at] # => when cooldown ends (if open)

execution = RubyLLM::Agents::Execution.last
# Check if attempt was short-circuited
execution.attempts.each do |attempt|
if attempt['short_circuited']
puts "#{attempt['model_id']} was short-circuited"
end
end

Get notified when breakers open:
# config/initializers/ruby_llm_agents.rb
RubyLLM::Agents.configure do |config|
config.on_alert = ->(event, payload) {
if event == :breaker_open
Slack::Notifier.new(ENV['SLACK_WEBHOOK']).ping(
"Circuit breaker opened for #{payload[:agent_type]} (#{payload[:model_id]})"
)
end
}
end

Alert payload:
{
event: :breaker_open,
agent_type: "MyAgent",
model_id: "gpt-4o",
errors: 10,
within: 60,
cooldown: 300
}

circuit_breaker errors: 10, within: 60, cooldown: 300
# 10 errors in 1 minute → block for 5 minutes

circuit_breaker errors: 3, within: 30, cooldown: 60
# 3 errors in 30s → block for 1 minute

circuit_breaker errors: 50, within: 300, cooldown: 600
# 50 errors in 5 minutes → block for 10 minutes

# Critical agent: Lower threshold
class CriticalAgent < ApplicationAgent
circuit_breaker errors: 5, within: 60, cooldown: 120
end
# Background agent: Higher tolerance
class BackgroundAgent < ApplicationAgent
circuit_breaker errors: 20, within: 300, cooldown: 600
end

Circuit breakers work well with fallbacks:
class ResilientAgent < ApplicationAgent
model "gpt-4o"
fallback_models "gpt-4o-mini", "claude-3-5-sonnet"
circuit_breaker errors: 5, within: 60, cooldown: 120
end

When the `gpt-4o` breaker opens:
- Skip `gpt-4o` entirely (no waiting)
- Try `gpt-4o-mini` immediately
- If that fails, try `claude-3-5-sonnet`
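The selection logic described above can be approximated as "pick the first model in the chain whose breaker is not open". The helper below is a hypothetical sketch that takes breaker states as a plain hash; it does not call the gem and may not match its internal ordering rules.

```ruby
# Hypothetical helper: returns the first model whose breaker is not open,
# or nil when every breaker in the chain is open.
# `breaker_states` maps model IDs to :closed, :open, or :half_open;
# models missing from the hash are treated as :closed.
def first_available_model(chain, breaker_states)
  chain.find { |model| breaker_states.fetch(model, :closed) != :open }
end

chain = ["gpt-4o", "gpt-4o-mini", "claude-3-5-sonnet"]

puts first_available_model(chain, { "gpt-4o" => :open }) # gpt-4o-mini

all_open = chain.to_h { |model| [model, :open] }
p first_available_model(chain, all_open) # nil
```

A `nil` result corresponds to the case where every breaker is open and the request fails fast.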
In emergencies:
# Force open a breaker
RubyLLM::Agents::CircuitBreaker.open!("gpt-4o")
# Force close a breaker
RubyLLM::Agents::CircuitBreaker.close!("gpt-4o")
# Reset all breakers
RubyLLM::Agents::CircuitBreaker.reset_all!

The dashboard shows:
- Current breaker states
- Error counts per model
- Time until cooldown ends
- Historical breaker events
# Low traffic: Lower threshold
# 100 requests/day = 5 errors is significant
circuit_breaker errors: 5, within: 60, cooldown: 300
# High traffic: Higher threshold
# 10,000 requests/day = 50 errors might be noise
circuit_breaker errors: 50, within: 60, cooldown: 300

Not all errors should trip breakers:
# Only rate limits and server errors should trip
# Authentication errors won't be helped by waiting

# Too short: Breaker oscillates
circuit_breaker cooldown: 10 # Bad: 10 seconds
# Too long: Service recovers but blocked
circuit_breaker cooldown: 3600 # Bad: 1 hour
# Just right: Time for service to recover
circuit_breaker cooldown: 300 # Good: 5 minutes

# Always have a fallback when using circuit breakers
model "gpt-4o"
fallback_models "gpt-4o-mini"
circuit_breaker errors: 5, within: 60, cooldown: 120

# Track breaker open events
RubyLLM::Agents::Execution
.where("attempts @> ?", '[{"short_circuited": true}]')
.count
# Alert on frequent breaker openings
# Indicates persistent service issues

If the breaker opens too often:
- Increase `errors` threshold
- Increase `within` window
- Check for actual service issues

If the breaker never opens:
- Decrease `errors` threshold
- Ensure errors are being counted correctly
- Check error types are retryable

If the breaker is stuck open:
- Wait for cooldown
- Manually close breaker if urgent:
RubyLLM::Agents::CircuitBreaker.close!("gpt-4o")
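When checking whether error types are retryable, it can help to write the rule down explicitly. The sketch below encodes the guidance given earlier on this page (rate limits and server errors should trip the breaker, authentication errors should not) as a hypothetical predicate over HTTP status codes. The method name and the status-code mapping are assumptions; the gem's actual classification may differ.

```ruby
# Hypothetical classification by HTTP status code, for illustration only.
# 429 (rate limited) and 5xx (server errors) may clear up after a cooldown;
# 401/403 (authentication or permission errors) will not be fixed by waiting.
def trips_breaker?(status)
  status == 429 || (500..599).cover?(status)
end

puts trips_breaker?(429) # true
puts trips_breaker?(503) # true
puts trips_breaker?(401) # false
```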
- Reliability - Overview of reliability features
- Automatic Retries - Retry configuration
- Model Fallbacks - Fallback model chains
- Alerts - Notification setup