All Systems Operational

Last checked: just now · Updated every 30 seconds

Services

API Gateway
Primary routing endpoint
Operational 18ms 99.98%
Model Routing
Intelligent provider selection
Operational 5ms 99.99%
Authentication
API key validation & auth
Operational 12ms 100%
Dashboard
Web UI & analytics
Operational 45ms 99.95%
Webhooks
Event delivery
Operational 99.97%

90-Day Uptime

99.97% average
Operational Degraded Incident

Incident History

Elevated latency on API Gateway
Resolved Mar 22, 2026

A misconfigured load balancer caused P99 latency to spike to ~800ms for approximately 23 minutes. No requests were dropped. Root cause identified and corrected. Post-mortem published internally.

Partial Anthropic model unavailability
Resolved Mar 8, 2026

Anthropic experienced an upstream outage affecting Claude models for ~45 minutes. Our fallback routing automatically redirected affected requests to OpenAI models. Users with strict model pinning experienced errors; automatic retry logic resolved 94% without user intervention.

Authentication service degradation
Resolved Feb 17, 2026

A database connection pool exhaustion caused token validation failures for approximately 8 minutes during a traffic spike. All failed requests returned 401 errors. Connection pool limits have been increased and auto-scaling thresholds adjusted.