AI API Provider Failover Runbook for SaaS Teams

A practical provider failover runbook for SaaS teams running OpenAI-compatible AI gateways: decide when to switch providers, protect budgets, preserve billing, and keep customer-visible behavior predictable.

Why failover needs a runbook

Provider failover is not just a reliability switch. In an AI API gateway, changing providers can change model behavior, token pricing, rate-limit shape, latency, safety behavior, streaming quality, and the invoice a customer receives later. A failover that keeps the request alive but breaks attribution or budget controls still creates operational debt.

SaaS teams should define the failover path before an outage. The gateway is the right place to enforce it because it can see the customer API key, tenant quota, model route, prepaid balance, retry budget, and usage ledger in one decision point.

Failover decision matrix

Signal	Recommended action	Reason
Provider 5xx or regional outage	Fail over to the approved equivalent route after one bounded retry.	Protects availability without creating uncontrolled duplicate calls.
429 or quota exhaustion	Use a secondary provider only if tenant policy allows burst or premium routing; otherwise return the rate-limit error to the caller.	Avoids turning customer overuse into hidden platform cost.
Auth, billing, or invalid request error	Do not fail over automatically; surface a clear configuration error.	Another provider will not fix a bad key, invalid payload, or exhausted payment source.
High latency but no failure	Fail over only for latency-sensitive tiers with a documented timeout threshold.	Prevents routine slow requests from drifting to more expensive providers.
Model behavior mismatch	Prefer degraded-mode messaging or a user-visible model change over silent substitution.	Maintains trust when answer quality or compliance behavior could change.
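
As an illustration, the status-driven rows of the matrix can be sketched as a small decision function. Everything named here (Action, TenantPolicy, decide, the status-code set, and the 10-second default) is a hypothetical choice for the example, not part of any particular gateway. The model-behavior row is left out because it cannot be detected from a status code alone.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Action(Enum):
    RETRY_PRIMARY = "retry_primary"
    FAIL_OVER = "fail_over"
    SURFACE_ERROR = "surface_error"
    STAY_ON_PRIMARY = "stay_on_primary"


@dataclass(frozen=True)
class TenantPolicy:
    # Hypothetical policy knobs; a real gateway would load these per tenant tier.
    allow_burst_routing: bool = False
    latency_sensitive: bool = False
    latency_threshold_s: float = 10.0


# Errors another provider cannot fix: bad key, billing problem, invalid payload.
# The exact set is an assumption for this sketch.
NON_RETRYABLE = {400, 401, 402, 403, 404, 422}


def decide(status: Optional[int], elapsed_s: float,
           retries_used: int, policy: TenantPolicy) -> Action:
    """status is the primary's HTTP status, or None if no response has arrived yet."""
    if status in NON_RETRYABLE:
        return Action.SURFACE_ERROR
    if status == 429:
        # Spill to a secondary only when the tenant's policy allows it.
        return Action.FAIL_OVER if policy.allow_burst_routing else Action.SURFACE_ERROR
    if status is not None and status >= 500:
        # One bounded retry on the primary, then the approved secondary route.
        return Action.RETRY_PRIMARY if retries_used < 1 else Action.FAIL_OVER
    # No failure: only latency-sensitive tiers leave a slow primary.
    if policy.latency_sensitive and elapsed_s > policy.latency_threshold_s:
        return Action.FAIL_OVER
    return Action.STAY_ON_PRIMARY
```

Keeping the decision in one pure function makes the policy easy to unit test and to review before an incident.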

Controls to configure before an incident

Approved route pairs: map each public model alias to allowed primary and secondary provider routes.
Tier-specific policy: free, trial, paid, and enterprise tenants can have different failover depth and latency targets.
Cost ceiling: estimate secondary-route cost before switching, especially when the fallback model is more expensive.
Prepaid reservation: reserve enough balance before the failover attempt so provider spend and customer balance stay aligned.
Idempotency rule: require idempotency keys for jobs, agents, and workflows where duplicate output has business impact.
Audit fields: record primary_provider, failover_provider, failover_reason, attempt_index, estimated_cost, and final_billable_cost.
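
A minimal sketch of how some of these controls might be represented follows. The audit field names come from the list above; RoutePair, max_cost_multiplier, the route strings, and the 1.5x ceiling are assumptions made up for illustration.

```python
from dataclasses import dataclass, asdict
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class RoutePair:
    alias: str                    # public model alias the customer sees
    primary: str                  # resolved primary provider route
    secondary: str                # approved fallback route
    max_cost_multiplier: Decimal  # hypothetical ceiling relative to the primary estimate


# Hypothetical route table; provider and model names are placeholders.
ROUTES = {
    "chat-default": RoutePair("chat-default", "provider-a/model-x",
                              "provider-b/model-y", Decimal("1.5")),
}


def within_cost_ceiling(pair: RoutePair, primary_estimate: Decimal,
                        secondary_estimate: Decimal) -> bool:
    """Check the fallback estimate against the ceiling before switching."""
    return secondary_estimate <= primary_estimate * pair.max_cost_multiplier


@dataclass
class FailoverAudit:
    primary_provider: str
    failover_provider: Optional[str]
    failover_reason: Optional[str]
    attempt_index: int
    estimated_cost: Decimal
    final_billable_cost: Optional[Decimal] = None  # filled in after the call settles


audit = FailoverAudit("provider-a", "provider-b", "http_503", 2, Decimal("0.012"))
audit_row = asdict(audit)  # plain dict, ready to write to a usage ledger
```

Decimal is used instead of float so that cost comparisons and ledger totals do not pick up binary rounding error.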

Incident workflow

Detect route health. Combine provider error rates, latency, streaming disconnects, and invoice-side anomalies.
Classify the failure. Separate transient provider problems from customer configuration, quota, auth, and invalid payload errors.
Check retry budget. Apply the bounded retry limits defined in the OpenAI-compatible gateway retry budget policy.
Confirm budget and balance. Use tenant quota and prepaid-balance checks before the secondary provider call.
Route with visible metadata. Preserve the customer-facing model alias while storing the resolved provider route for support and billing.
Reconcile after recovery. Compare gateway ledger records with provider invoices using a multi-provider invoice reconciliation process.
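
The middle of this workflow (classify, check retry budget, confirm balance, route with metadata) can be sketched roughly as below. This is a simplified illustration under assumptions: ProviderError, PrepaidBalance, run_with_failover, and the ledger row shape are all hypothetical, the provider calls are passed in as plain callables, and health detection and reconciliation are out of scope.

```python
import uuid
from decimal import Decimal
from typing import Callable, Dict, List


class ProviderError(Exception):
    def __init__(self, status: int):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


class PrepaidBalance:
    def __init__(self, available: Decimal):
        self.available = available
        self.reserved = Decimal("0")

    def reserve(self, amount: Decimal) -> bool:
        if amount > self.available:
            return False
        self.available -= amount
        self.reserved += amount
        return True

    def release(self, amount: Decimal) -> None:
        self.reserved -= amount
        self.available += amount


def run_with_failover(call_primary: Callable[[], str],
                      call_secondary: Callable[[], str],
                      balance: PrepaidBalance,
                      secondary_estimate: Decimal,
                      ledger: List[Dict],
                      primary_attempts: int = 2) -> str:
    request_id = str(uuid.uuid4())  # one ID shared by every attempt

    def log(index: int, route: str, outcome: str) -> None:
        ledger.append({"request_id": request_id, "attempt_index": index,
                       "route": route, "outcome": outcome})

    # Initial call plus one bounded retry on the primary (primary_attempts=2).
    for index in range(primary_attempts):
        try:
            result = call_primary()
        except ProviderError as err:
            log(index, "primary", f"http_{err.status}")
            if err.status < 500:
                raise  # auth, quota, or payload error: do not fail over
            continue
        log(index, "primary", "ok")
        return result

    # Retry budget spent: confirm balance before calling the secondary.
    if not balance.reserve(secondary_estimate):
        log(primary_attempts, "secondary", "blocked_insufficient_balance")
        raise RuntimeError("prepaid balance too low for failover")
    try:
        result = call_secondary()
    except ProviderError as err:
        balance.release(secondary_estimate)  # nothing delivered, free the hold
        log(primary_attempts, "secondary", f"http_{err.status}")
        raise
    # The reservation stays held here; settling it against actual cost is omitted.
    log(primary_attempts, "secondary", "ok")
    return result
```

In this sketch the reservation happens only on the failover path to keep the example short; a production gateway would likely reserve before the primary call as well.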

Billing and support notes

The hard part of failover is often not the API call; it is explaining the bill later. Store one canonical request ID across the primary attempt, retry, failover attempt, final response, and any refund. This lets support answer customer questions without scraping provider dashboards.

When failover changes price materially, consider labeling the request as platform_absorbed, customer_billable, or partially_refunded. That makes downstream usage exports and invoices easier to reason about.
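
One way to picture this is a ledger keyed by the canonical request ID, with every row carrying one of the three labels. The sketch below is illustrative only: UsageLedger, its method names, the event strings, and the convention that a refund is a negative cost are assumptions, not an existing schema.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, List

BILLING_LABELS = {"platform_absorbed", "customer_billable", "partially_refunded"}


class UsageLedger:
    def __init__(self) -> None:
        self._rows: Dict[str, List[dict]] = defaultdict(list)

    def record(self, request_id: str, event: str, provider: str,
               cost: Decimal, billing_label: str) -> None:
        if billing_label not in BILLING_LABELS:
            raise ValueError(f"unknown billing label: {billing_label}")
        self._rows[request_id].append({
            "event": event, "provider": provider,
            "cost": cost, "billing_label": billing_label,
        })

    def explain(self, request_id: str) -> dict:
        """Summarize one request for support: event timeline plus cost per label."""
        rows = self._rows.get(request_id, [])
        totals: Dict[str, Decimal] = {}
        for row in rows:
            label = row["billing_label"]
            totals[label] = totals.get(label, Decimal("0")) + row["cost"]
        return {"request_id": request_id,
                "events": [r["event"] for r in rows],
                "totals": totals}


ledger = UsageLedger()
ledger.record("req-1", "primary_attempt", "provider-a", Decimal("0.002"), "platform_absorbed")
ledger.record("req-1", "failover_attempt", "provider-b", Decimal("0.015"), "customer_billable")
ledger.record("req-1", "refund", "provider-b", Decimal("-0.005"), "partially_refunded")
summary = ledger.explain("req-1")
```

With rows shaped like this, a support agent can reconstruct the full attempt chain from a single ID without opening any provider dashboard.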

Provider failover should also respect a streaming timeout policy, because partial streamed output can make silent retries or model switches unsafe for customer workflows.
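
A hedged sketch of such a rule: treat a stalled stream differently depending on whether any output has already reached the client. StreamAction, on_stream_stall, and both timeout defaults are hypothetical values chosen for the example.

```python
from enum import Enum


class StreamAction(Enum):
    WAIT = "wait"
    FAIL_OVER = "fail_over"
    ABORT_VISIBLY = "abort_visibly"


def on_stream_stall(chunks_forwarded: int, stalled_for_s: float,
                    first_chunk_timeout_s: float = 5.0,
                    idle_timeout_s: float = 15.0) -> StreamAction:
    """Decide what to do when a streaming response stops producing chunks."""
    if chunks_forwarded == 0:
        # Nothing has reached the client, so switching routes is invisible to it.
        if stalled_for_s >= first_chunk_timeout_s:
            return StreamAction.FAIL_OVER
        return StreamAction.WAIT
    # Partial output is already with the client: a silent retry or model switch
    # could duplicate or contradict it, so end the stream with a visible error.
    if stalled_for_s >= idle_timeout_s:
        return StreamAction.ABORT_VISIBLY
    return StreamAction.WAIT
```

The key design choice is that failover is only ever silent before the first chunk is forwarded; after that, the client is told what happened.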

How FerryAPI helps

FerryAPI is an OpenAI-compatible AI API gateway for teams that need practical model routing, customer API keys, quota policy, prepaid balance controls, and usage records across provider routes. It helps SaaS teams make failover a transparent operations policy instead of an invisible SDK behavior.
