Circuit Breaker
A CLOSED/HALF_OPEN/OPEN state machine wrapping every external adapter an agent touches. It fails fast when an upstream degrades, stops cascading failures, and adds negligible overhead in the happy path.
What it does
Every external call (an LLM API, a database, a webhook, a vector store) is wrapped in a breaker. The breaker counts failures over a sliding window. Once the failure count reaches failure_threshold, the breaker opens, and further calls fail instantly without reaching the upstream service.
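Here is a minimal sketch of a sliding-window failure counter. It keeps only the failures from the last window_seconds and reports when the breaker should trip. This is a hypothetical helper, not hummbl_governance code. The documented contract exposes only failure_threshold, so window_seconds and the injectable clock are assumptions made for illustration.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowFailureCounter:
    """Counts failures that happened within the last `window_seconds`.

    Hypothetical helper: `window_seconds` and `clock` are illustrative
    parameters, not part of the documented CircuitBreaker contract.
    """

    def __init__(
        self,
        failure_threshold: int = 5,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.window_seconds = window_seconds
        self._clock = clock
        self._failures: Deque[float] = deque()  # failure timestamps, oldest first

    def _evict_expired(self) -> None:
        # Drop failures that have slid out of the window.
        cutoff = self._clock() - self.window_seconds
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()

    def record_failure(self) -> None:
        self._failures.append(self._clock())

    def count(self) -> int:
        self._evict_expired()
        return len(self._failures)

    def should_open(self) -> bool:
        # Trip once the in-window failure count reaches the threshold.
        return self.count() >= self.failure_threshold
```

In tests you can pass a fake clock (for example clock=lambda: now[0]) so that time moves forward deterministically and old failures can be watched expiring.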
After a cooldown (recovery_timeout seconds), the breaker moves to HALF_OPEN and lets a single probe call through. If the probe succeeds, the breaker closes. If it fails, the breaker reopens with a longer cooldown (exponential backoff).
This prevents the three things that kill agent systems: retry storms that DDoS your own upstream, runaway cost burns from retrying failing paid APIs, and cascading timeouts where one slow dependency drags everything down.
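The transitions above can be sketched as a small, single-threaded state machine. This is an illustrative stand-in, not the hummbl_governance implementation. It differs from the description above in one respect: it counts failures since the last success instead of using a sliding window. The names MiniBreaker, BreakerOpenError, max_timeout (a cap on the backoff) and the injectable clock are hypothetical.

```python
import time
from enum import Enum
from typing import Any, Callable


class State(Enum):
    CLOSED = "closed"
    HALF_OPEN = "half_open"
    OPEN = "open"


class BreakerOpenError(RuntimeError):
    """Raised when a call is rejected without reaching the upstream."""


class MiniBreaker:
    """Single-threaded sketch: counts failures since the last success and
    doubles the cooldown each time a HALF_OPEN probe fails."""

    def __init__(
        self,
        failure_threshold: int = 5,
        recovery_timeout: float = 30.0,
        max_timeout: float = 300.0,  # hypothetical cap on the backoff
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.failure_threshold = failure_threshold
        self.base_timeout = recovery_timeout
        self.max_timeout = max_timeout
        self._clock = clock
        self.state = State.CLOSED
        self.failure_count = 0
        self._cooldown = recovery_timeout
        self._opened_at = 0.0

    def _trip(self, cooldown: float) -> None:
        self.state = State.OPEN
        self._cooldown = cooldown
        self._opened_at = self._clock()

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self._cooldown:
                raise BreakerOpenError("breaker open: failing fast")  # upstream untouched
            self.state = State.HALF_OPEN  # cooldown elapsed: let one probe through
        try:
            result = fn(*args, **kwargs)
        except Exception:
            if self.state is State.HALF_OPEN:
                # Probe failed: reopen with a doubled cooldown, capped at max_timeout.
                self._trip(min(self._cooldown * 2, self.max_timeout))
            else:
                self.failure_count += 1
                if self.failure_count >= self.failure_threshold:
                    self._trip(self.base_timeout)
            raise
        # Any success closes the breaker and resets the count and the backoff.
        self.state = State.CLOSED
        self.failure_count = 0
        self._cooldown = self.base_timeout
        return result


# Demo with a fake clock: two failures trip the breaker, and the third call fails fast.
now = [0.0]
cb = MiniBreaker(failure_threshold=2, recovery_timeout=10.0, clock=lambda: now[0])
upstream_calls = []


def flaky() -> str:
    upstream_calls.append(now[0])
    raise ConnectionError("upstream down")


for _ in range(3):
    try:
        cb.call(flaky)
    except (ConnectionError, BreakerOpenError):
        pass
# Now cb.state is State.OPEN, and flaky ran only twice.
```

A production breaker also needs a lock (or an atomic HALF_OPEN flag) so that concurrent callers cannot all slip through as probes. The sketch leaves this out to keep the transitions readable.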
Use it
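```python
import logging

import anthropic
from hummbl_governance import CircuitBreaker, CircuitBreakerState

logger = logging.getLogger(__name__)
anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

cb = CircuitBreaker(
    failure_threshold=5,    # open after 5 failures
    recovery_timeout=30.0,  # seconds before the HALF_OPEN probe
)

def create_message(params: dict, fallback_handler):
    # params: your messages.create kwargs; fallback_handler: your degraded-mode path
    try:
        result = cb.call(anthropic_client.messages.create, **params)
    except Exception:
        # cb.call has recorded the failure, or failed fast because the breaker is OPEN
        result = fallback_handler()
    if cb.state == CircuitBreakerState.OPEN:
        logger.warning("anthropic-api breaker is open")
    return result
```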
When to reach for it
- You call any external API from an agent — LLM providers, vector stores, webhooks
- Cost-burn scenarios are a concern (retrying failing calls to paid APIs)
- You have cascading dependency risk (one slow service degrades everything)
- You need adapter-level observability for governance audits
The contract
```python
class CircuitBreakerState(Enum):
    CLOSED     # normal operation
    HALF_OPEN  # testing recovery
    OPEN       # failing fast

CircuitBreaker(
    failure_threshold: int = 5,
    recovery_timeout: float = 30.0,
    on_state_change: Callable | None = None,
)

CircuitBreaker.call(fn, *args, **kwargs) -> Any
CircuitBreaker.state -> CircuitBreakerState  # property
CircuitBreaker.failure_count -> int
CircuitBreaker.success_count -> int
CircuitBreaker.reset() -> None
```
All state transitions are written to the Governance Bus for post-incident analysis.
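If you also want transitions in your own audit store, the contract's on_state_change hook is a natural place to attach one. The sketch below is hypothetical. It assumes the callback receives (old_state, new_state), which the contract does not specify. It also mirrors CircuitBreakerState locally with made-up values so that it runs on its own. InMemoryAuditLog and hook_for are illustrative names.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Callable, List


class CircuitBreakerState(Enum):  # local mirror; real values are an assumption
    CLOSED = "closed"
    HALF_OPEN = "half_open"
    OPEN = "open"


@dataclass
class TransitionEvent:
    adapter: str
    old: str
    new: str
    at: float


@dataclass
class InMemoryAuditLog:
    """Hypothetical audit sink that records breaker transitions per adapter."""

    events: List[TransitionEvent] = field(default_factory=list)

    def hook_for(
        self, adapter: str, clock: Callable[[], float] = time.time
    ) -> Callable[[CircuitBreakerState, CircuitBreakerState], None]:
        # Returns a callback with the assumed (old_state, new_state) signature.
        def on_state_change(old: CircuitBreakerState, new: CircuitBreakerState) -> None:
            self.events.append(TransitionEvent(adapter, old.value, new.value, clock()))

        return on_state_change

    def to_jsonl(self) -> str:
        # One JSON object per line, convenient for shipping to a log pipeline.
        return "\n".join(json.dumps(asdict(e)) for e in self.events)
```

With the real library you would wire this up as CircuitBreaker(on_state_change=audit.hook_for("anthropic-api")), but only once you have confirmed the callback signature matches.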
See also
- Kill Switch — fleet-level halt, complementary to per-adapter breaker
- Governance Bus — logs trips, transitions, recoveries