Operations and reliability
AI API Provider Failover Runbook for SaaS Teams
A practical provider failover runbook for SaaS teams running OpenAI-compatible AI gateways: decide when to switch providers, protect budgets, preserve billing, and keep customer-visible behavior predictable.
Why failover needs a runbook
Provider failover is not just a reliability switch. In an AI API gateway, changing providers can change model behavior, token pricing, rate-limit shape, latency, safety behavior, streaming quality, and the invoice a customer receives later. A failover that keeps the request alive but breaks attribution or budget controls still creates operational debt.
SaaS teams should define the failover path before an outage. The gateway is the right place to enforce it because it can see the customer API key, tenant quota, model route, prepaid balance, retry budget, and usage ledger in one decision point.
Failover decision matrix
| Signal | Recommended action | Reason |
|---|---|---|
| Provider 5xx or regional outage | Fail over to the approved equivalent route after one bounded retry. | Protects availability without creating uncontrolled duplicate calls. |
| 429 or quota exhausted | Use a secondary provider only if tenant policy allows burst or premium routing. | Avoids turning customer overuse into hidden platform cost. |
| Auth, billing, or invalid request error | Do not fail over automatically; surface a clear configuration error. | Another provider will not fix a bad key, invalid payload, or exhausted payment source. |
| High latency but no failure | Fail over only for latency-sensitive tiers with a documented timeout threshold. | Prevents routine slow requests from drifting to more expensive providers. |
| Model behavior mismatch | Prefer degraded-mode messaging or user-visible model change over silent substitution. | Maintains trust when answer quality or compliance behavior could change. |
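The matrix can be encoded as a small policy function so the gateway applies it the same way on every request. The following is a minimal sketch, not a reference implementation: the `Signal` and `Action` names, the `allows_burst` and `latency_sensitive` flags, and the choice to surface the error when a tenant has no burst policy are all assumptions made for illustration.

```python
from enum import Enum


class Signal(Enum):
    PROVIDER_5XX = "provider_5xx"            # provider 5xx or regional outage
    RATE_LIMITED = "rate_limited"            # 429 or quota exhausted
    CLIENT_ERROR = "client_error"            # auth, billing, or invalid request
    HIGH_LATENCY = "high_latency"            # slow but not failing
    BEHAVIOR_MISMATCH = "behavior_mismatch"  # fallback model behaves differently


class Action(Enum):
    RETRY_THEN_FAILOVER = "retry_then_failover"
    FAILOVER = "failover"
    SURFACE_ERROR = "surface_error"
    STAY_ON_PRIMARY = "stay_on_primary"
    DEGRADED_MODE = "degraded_mode"


def decide(signal: Signal, *, allows_burst: bool = False,
           latency_sensitive: bool = False) -> Action:
    """Map a classified failure signal to the action from the matrix."""
    if signal is Signal.PROVIDER_5XX:
        # One bounded retry first, then the approved equivalent route.
        return Action.RETRY_THEN_FAILOVER
    if signal is Signal.RATE_LIMITED:
        # Assumption: without a burst/premium policy the 429 goes back to the caller.
        return Action.FAILOVER if allows_burst else Action.SURFACE_ERROR
    if signal is Signal.CLIENT_ERROR:
        # Another provider cannot fix a bad key or an invalid payload.
        return Action.SURFACE_ERROR
    if signal is Signal.HIGH_LATENCY:
        return Action.FAILOVER if latency_sensitive else Action.STAY_ON_PRIMARY
    # Behavior mismatch: prefer degraded-mode messaging over silent substitution.
    return Action.DEGRADED_MODE
```

Keeping the decision in one pure function makes it easy to unit test and to review alongside the matrix when policy changes.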
Controls to configure before an incident
- Approved route pairs: map each public model alias to allowed primary and secondary provider routes.
- Tier-specific policy: free, trial, paid, and enterprise tenants can have different failover depth and latency targets.
- Cost ceiling: estimate secondary-route cost before switching, especially when the fallback model is more expensive.
- Prepaid reservation: reserve enough balance before the failover attempt so provider spend and customer balance stay aligned.
- Idempotency rule: require idempotency keys for jobs, agents, and workflows where duplicate output has business impact.
- Audit fields: record primary_provider, failover_provider, failover_reason, attempt_index, estimated_cost, and final_billable_cost.
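Several of these controls can live in plain configuration plus one pre-switch check. The sketch below is illustrative only: the alias `chat-standard`, the provider route names, the tier values, and the `cost_ceiling_ratio` setting are hypothetical. Only the audit field names come from the list above.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

# Hypothetical route table: public model alias -> (primary route, secondary route).
ROUTE_PAIRS = {
    "chat-standard": ("provider-a/model-x", "provider-b/model-y"),
}

# Hypothetical tier policy: how deep failover may go and how much more the
# secondary route may cost relative to the primary estimate.
TIER_POLICY = {
    "free": {"max_failover_depth": 0, "cost_ceiling_ratio": Decimal("1.0")},
    "paid": {"max_failover_depth": 1, "cost_ceiling_ratio": Decimal("1.5")},
}


@dataclass
class AuditRecord:
    """One row per attempt, using the audit fields listed above."""
    primary_provider: str
    failover_provider: Optional[str]
    failover_reason: Optional[str]
    attempt_index: int
    estimated_cost: Decimal
    final_billable_cost: Decimal


def failover_allowed(tier: str, primary_cost: Decimal, secondary_cost: Decimal) -> bool:
    """Check tier depth and the cost ceiling before switching providers."""
    policy = TIER_POLICY[tier]
    if policy["max_failover_depth"] < 1:
        return False
    return secondary_cost <= primary_cost * policy["cost_ceiling_ratio"]
```

Costs use `Decimal` rather than floats so that ceiling comparisons and ledger totals do not pick up rounding noise.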
Incident workflow
- Detect route health. Combine provider error rates, latency, streaming disconnects, and invoice-side anomalies.
- Classify the failure. Separate transient provider problems from customer configuration, quota, auth, and invalid payload errors.
- Check retry budget. Apply the bounded retry limits defined in the OpenAI-compatible gateway retry budget policy.
- Confirm budget and balance. Use tenant quota and prepaid-balance checks before the secondary provider call.
- Route with visible metadata. Preserve customer-facing model alias while storing the resolved provider route for support and billing.
- Reconcile after recovery. Compare gateway ledger records with provider invoices using a multi-provider invoice reconciliation process.
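The middle of this workflow (retry budget, balance check, and routing with stored metadata) can be sketched as a single function. This is a simplified illustration under stated assumptions: `TransientProviderError`, `Outcome`, and the injected callables `call_primary`, `call_secondary`, and `reserve_balance` are hypothetical names, and a real gateway would also handle secondary-route failures and release unused reservations.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional, Tuple


class TransientProviderError(Exception):
    """Hypothetical marker for provider 5xx errors or regional outages."""


@dataclass
class Outcome:
    response: Optional[Any]
    resolved_route: Optional[str]  # stored for support and billing
    attempts: List[Tuple[str, str]] = field(default_factory=list)


def handle_request(call_primary: Callable[[], Any],
                   call_secondary: Callable[[], Any],
                   reserve_balance: Callable[[], bool],
                   max_primary_attempts: int = 2) -> Outcome:
    """Initial call plus one bounded retry, then a balance-checked failover."""
    attempts: List[Tuple[str, str]] = []

    # Retry budget: only transient errors are retried. Auth, billing, and
    # invalid-request errors are not caught here, so they propagate unchanged.
    for _ in range(max_primary_attempts):
        try:
            response = call_primary()
        except TransientProviderError:
            attempts.append(("primary", "transient_error"))
            continue
        attempts.append(("primary", "ok"))
        return Outcome(response, "primary", attempts)

    # Budget and balance: reserve before spending on the secondary provider.
    if not reserve_balance():
        attempts.append(("secondary", "skipped_insufficient_balance"))
        return Outcome(None, None, attempts)

    # Route with metadata: the caller keeps the public alias; we store the route.
    response = call_secondary()
    attempts.append(("secondary", "ok"))
    return Outcome(response, "secondary", attempts)
```

Returning the attempt list with the response keeps the audit trail attached to the request instead of scattered across logs.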
Billing and support notes
The hard part of failover is often not the API call; it is explaining the bill later. Store one canonical request ID across the primary attempt, retry, failover attempt, final response, and any refund. This lets support answer customer questions without scraping provider dashboards.
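One way to picture the canonical request ID is a ledger where every stage of a request writes an entry under the same ID. The in-memory `UsageLedger` class, the stage names, and the amounts below are hypothetical and only illustrate the idea; a production ledger would be a durable store.

```python
import uuid
from decimal import Decimal
from typing import Dict, List


class UsageLedger:
    """Minimal in-memory ledger keyed by one canonical request ID."""

    def __init__(self) -> None:
        self.entries: List[Dict] = []

    def record(self, request_id: str, stage: str, provider: str, amount: Decimal) -> None:
        self.entries.append({
            "request_id": request_id,
            "stage": stage,        # e.g. primary_attempt, retry, failover_attempt, refund
            "provider": provider,
            "amount": amount,      # refunds are stored as negative amounts
        })

    def history(self, request_id: str) -> List[Dict]:
        """Everything support needs for one request, in insertion order."""
        return [e for e in self.entries if e["request_id"] == request_id]

    def net_billable(self, request_id: str) -> Decimal:
        return sum((e["amount"] for e in self.history(request_id)), Decimal("0"))


ledger = UsageLedger()
request_id = str(uuid.uuid4())  # generated once, reused for every stage
ledger.record(request_id, "primary_attempt", "provider-a", Decimal("0"))
ledger.record(request_id, "retry", "provider-a", Decimal("0"))
ledger.record(request_id, "failover_attempt", "provider-b", Decimal("0.030"))
ledger.record(request_id, "refund", "provider-b", Decimal("-0.010"))
```

With this shape, a support query becomes a single lookup by request ID rather than a search across provider dashboards.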
When failover changes price materially, consider labeling the request as platform_absorbed, customer_billable, or partially_refunded. That makes downstream usage exports and invoices easier to reason about.
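The labeling rule can be made explicit in code. The three label strings come from the paragraph above, but their exact semantics here are an assumption for illustration: `customer_billable` means the customer pays the failover price, `platform_absorbed` means the customer pays the primary price and the platform covers the difference, and `partially_refunded` means the customer was already charged the failover price and gets the difference back. The function name and both flags are hypothetical.

```python
from decimal import Decimal
from typing import Tuple


def label_failover_charge(primary_cost: Decimal,
                          failover_cost: Decimal,
                          *,
                          tenant_accepts_premium: bool,
                          already_charged_failover_price: bool) -> Tuple[str, Decimal]:
    """Return (label, amount the customer finally pays) for a failed-over request."""
    # No premium to argue about, or the tenant opted into premium routing.
    if failover_cost <= primary_cost or tenant_accepts_premium:
        return "customer_billable", failover_cost
    # The customer was charged the higher price up front: refund the difference.
    if already_charged_failover_price:
        return "partially_refunded", primary_cost
    # Otherwise the platform covers the gap and bills the primary price.
    return "platform_absorbed", primary_cost
```

Whatever semantics a team settles on, writing them down once as a function keeps usage exports and invoices consistent.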
Provider failover should also respect a streaming timeout policy, because partial streamed output can make silent retries or model switches unsafe for customer workflows.
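A simple guard for streaming is to allow a provider switch only while nothing has been delivered to the client. The sketch below assumes each stream is exposed as a callable returning an iterable of text chunks and uses `ConnectionError` as a stand-in for a timeout or disconnect; `stream_with_guard` and `StreamInterrupted` are hypothetical names.

```python
from typing import Callable, Iterable, Iterator


class StreamInterrupted(Exception):
    """Raised when the primary stream fails after output was already delivered."""


def stream_with_guard(primary: Callable[[], Iterable[str]],
                      secondary: Callable[[], Iterable[str]]) -> Iterator[str]:
    """Yield chunks from primary; switch to secondary only if nothing was sent yet."""
    sent = 0
    try:
        for chunk in primary():
            sent += 1
            yield chunk
        return  # primary completed normally
    except ConnectionError as exc:
        if sent > 0:
            # Partial output is already with the client, so a silent switch
            # would splice two different models into one answer.
            raise StreamInterrupted(f"primary failed after {sent} chunk(s)") from exc
    # Nothing was delivered, so failing over is invisible to the client.
    yield from secondary()
```

When `StreamInterrupted` is raised, the caller can decide whether to surface a truncated response or restart the request explicitly, rather than having the gateway guess.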
How FerryAPI helps
FerryAPI is an OpenAI-compatible AI API gateway for teams that need practical model routing, customer API keys, quota policy, prepaid balance controls, and usage records across provider routes. It helps SaaS teams make failover a transparent operations policy instead of an invisible SDK behavior.