FerryAPI


AI API Idempotency Keys for OpenAI-Compatible Gateways

A practical idempotency key design for SaaS teams running OpenAI-compatible AI gateways: stop retries from turning into duplicate provider calls, double billing, replayed tool calls, and ledger mismatches.

Why idempotency matters for AI APIs

An AI API gateway sits between application code and model providers, and it also owns customer API keys, provider routing, retries, fallback, streaming, and billing ledgers. Without a shared idempotency key, a single network timeout can turn into two provider calls, two tool executions, two ledger rows, and a support ticket about duplicate charges.

For SaaS teams, idempotency is not just a backend nicety. It is the contract that keeps customer-visible behavior, quota policy, prepaid balance reservation, and provider invoice reconciliation aligned.

What the idempotency key should cover

Scope: Same customer request
Recommended behavior: Return the original result or the in-progress status instead of starting a second provider call.
What to store: Tenant ID, customer API key ID, request hash, canonical request ID, and current state.

Scope: Retry after timeout
Recommended behavior: Reuse the prior reservation and retry budget; never create an unrelated usage row.
What to store: Attempt count, upstream request IDs, provider route, timeout reason, and retry decision.

Scope: Fallback route
Recommended behavior: Allow fallback only when the request is safe to repeat and the failure of the original attempt has been classified.
What to store: Primary provider status, fallback provider, model mapping, and customer-visible response status.

Scope: Tool or agent action
Recommended behavior: Do not replay side-effecting tool calls unless the application explicitly approves continuation.
What to store: Tool call IDs, action state, partial output marker, and replay policy.

Scope: Billing settlement
Recommended behavior: Settle all attempts under one canonical request rather than charging per transport retry.
What to store: Reserved amount, final billable tokens, refunds, ledger references, and reconciliation status.
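
The rows above can be collected into one record per canonical request, with provider attempts and tool calls stored as child entries. This is a minimal sketch using Python dataclasses. Every class, field, and state name is hypothetical and only mirrors the table; it is not a FerryAPI schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RequestState(Enum):
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class ProviderAttempt:
    # One transport-level attempt; each retry or fallback appends another.
    provider_route: str
    upstream_request_id: Optional[str] = None
    outcome: Optional[str] = None          # e.g. "ok", "timeout"
    retry_decision: Optional[str] = None   # why the gateway did or did not retry


@dataclass
class ToolCallRecord:
    tool_call_id: str
    action_state: str = "pending"    # hypothetical values: "pending" | "executed"
    partial_output: bool = False
    replay_approved: bool = False    # set only when the application approves continuation


@dataclass
class IdempotencyRecord:
    # Identity: the client key is always stored with its tenant and customer key.
    tenant_id: str
    customer_key_id: str
    idempotency_key: str
    request_hash: str
    canonical_request_id: str
    state: RequestState = RequestState.IN_PROGRESS
    attempts: List[ProviderAttempt] = field(default_factory=list)
    tool_calls: List[ToolCallRecord] = field(default_factory=list)
    # Billing: one reservation and one settlement per canonical request.
    reserved_amount_cents: int = 0
    final_billable_tokens: Optional[int] = None
    ledger_ref: Optional[str] = None

    @property
    def attempt_count(self) -> int:
        return len(self.attempts)

    def may_replay_tool(self, tool_call_id: str) -> bool:
        # A call that already executed is replayed only with explicit approval.
        for call in self.tool_calls:
            if call.tool_call_id == tool_call_id:
                return call.action_state != "executed" or call.replay_approved
        # A tool call never seen before is not a replay.
        return True
```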

A practical key format

A good idempotency key is generated by the calling application, passed through the OpenAI-compatible gateway, and bound to the tenant and customer API key. Do not trust a global key by itself; two tenants can accidentally generate the same string.

Idempotency-Key: tenant_123:chat:2026-06-04:req_8f3c...
X-FerryAPI-Customer-Key: cak_...
X-FerryAPI-Request-Purpose: support_ticket_summary
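
You can bind the client key to the tenant and the customer key by using a composite lookup key. In this sketch the gateway resolves the tenant from the authenticated customer API key and does not parse it from the Idempotency-Key string, because the client controls that string. `CUSTOMER_KEYS` and `scope_key` are hypothetical names.

```python
from typing import Dict, Tuple

# Hypothetical registry: authenticated customer API key ID -> tenant ID.
CUSTOMER_KEYS: Dict[str, str] = {"cak_a": "tenant_123", "cak_b": "tenant_456"}


def scope_key(customer_key_id: str, idempotency_key: str) -> Tuple[str, str, str]:
    """Return the composite key used to look up idempotency records."""
    tenant_id = CUSTOMER_KEYS.get(customer_key_id)
    if tenant_id is None:
        raise PermissionError("unknown customer API key")
    if not idempotency_key.strip():
        raise ValueError("Idempotency-Key header is missing or empty")
    # The raw client string is only one component; it never identifies a record alone.
    return (tenant_id, customer_key_id, idempotency_key)


store = {scope_key("cak_a", "req_8f3c"): "creq_1"}
print(scope_key("cak_b", "req_8f3c") in store)  # False: other tenant, separate record
```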

The gateway should also compute a request fingerprint from normalized fields such as model, messages, tools, temperature, route policy, and customer key. If the same idempotency key arrives with a materially different request body, return a conflict instead of reusing the old result.
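
Here is a minimal sketch of the fingerprint and the conflict check. It assumes the gateway has already merged the authenticated customer key ID into the request dict. The field list and the `IdempotencyConflict` name are illustrative assumptions. SHA-256 over canonical JSON is one common choice, not a prescribed one.

```python
import hashlib
import json
from typing import Any, Dict

# Assumed set of fields that define "the same request"; trace IDs, timestamps,
# and other transport metadata are left out on purpose.
FINGERPRINT_FIELDS = ("model", "messages", "tools", "temperature", "route_policy", "customer_key_id")


class IdempotencyConflict(Exception):
    """Same Idempotency-Key, materially different request body."""


def request_fingerprint(body: Dict[str, Any]) -> str:
    normalized = {name: body.get(name) for name in FINGERPRINT_FIELDS}
    # Treat 1 and 1.0 as the same temperature.
    if normalized["temperature"] is not None:
        normalized["temperature"] = float(normalized["temperature"])
    # sort_keys (applied to nested objects too) plus fixed separators make the
    # serialization stable, so client-side key order does not change the hash.
    # List order is kept: reordered messages are a different request.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def check_reuse(stored_fingerprint: str, body: Dict[str, Any]) -> None:
    """Raise instead of replaying an old result for a different request."""
    if request_fingerprint(body) != stored_fingerprint:
        raise IdempotencyConflict("Idempotency-Key reused with a different request body")
```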

State machine for duplicate requests
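
This is one possible state machine, sketched under stated assumptions. It has four hypothetical states, a decision function for a scoped key that arrives again, and a guard on legal transitions. The state and action names are illustrative and are not a FerryAPI specification.

```python
from enum import Enum
from typing import Dict, FrozenSet, Optional


class State(Enum):
    IN_PROGRESS = "in_progress"
    SUCCEEDED = "succeeded"
    FAILED_RETRYABLE = "failed_retryable"  # e.g. upstream timeout before any side effect
    FAILED_FINAL = "failed_final"


class Action(Enum):
    START = "start"                   # first sighting: reserve balance, call the provider
    REPORT_IN_PROGRESS = "report_in_progress"
    RESUME = "resume"                 # new attempt under the same canonical request and reservation
    REPLAY_RESULT = "replay_result"   # return the stored outcome, no provider call
    CONFLICT = "conflict"             # same key, different request fingerprint


# Allowed transitions; anything else indicates a bug in the caller.
TRANSITIONS: Dict[State, FrozenSet[State]] = {
    State.IN_PROGRESS: frozenset({State.SUCCEEDED, State.FAILED_RETRYABLE, State.FAILED_FINAL}),
    State.FAILED_RETRYABLE: frozenset({State.IN_PROGRESS, State.FAILED_FINAL}),
    State.SUCCEEDED: frozenset(),
    State.FAILED_FINAL: frozenset(),
}


def decide(existing: Optional[State], fingerprint_matches: bool) -> Action:
    """Choose what to do when a request with a known or new scoped key arrives."""
    if existing is None:
        return Action.START
    if not fingerprint_matches:
        return Action.CONFLICT
    if existing is State.IN_PROGRESS:
        return Action.REPORT_IN_PROGRESS
    if existing is State.FAILED_RETRYABLE:
        return Action.RESUME
    # SUCCEEDED and FAILED_FINAL are terminal: replay what the client already got.
    return Action.REPLAY_RESULT


def transition(current: State, new: State) -> State:
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

The function checks the fingerprint before it looks at the state, so a mismatched body gets a conflict even while the original request is still running.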

How idempotency connects to retries, streams, and ledgers

Idempotency keys should be enforced before the gateway spends retry budget. That makes them a natural companion to a gateway retry budget policy and a streaming timeout policy. If the customer reconnects after an interrupted stream, the gateway can return the same canonical request ID and billing status instead of starting a second generation.
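
The ordering matters: the idempotency lookup runs before any retry budget is spent. This single-threaded sketch shows that ordering. `Gateway`, `Entry`, the status strings, and the budget of two retries are hypothetical. A real gateway would also need an atomic insert-if-absent in a shared store so that two concurrent duplicates cannot both miss the lookup.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

ScopedKey = Tuple[str, str, str]  # (tenant_id, customer_key_id, idempotency_key)


@dataclass
class Entry:
    canonical_request_id: str
    status: str = "in_progress"       # "in_progress" | "completed" | "failed"
    billing_status: str = "reserved"  # hypothetical: "reserved" | "settled" | "released"
    result: Optional[str] = None


@dataclass
class Gateway:
    retry_budget: int = 2  # hypothetical: extra attempts allowed per canonical request
    store: Dict[ScopedKey, Entry] = field(default_factory=dict)
    provider_calls: int = 0

    def handle(self, key: ScopedKey, call_provider: Callable[[], str]) -> Entry:
        # 1. Idempotency lookup comes first. A duplicate, or a reconnect after an
        #    interrupted stream, gets the existing entry back, so it spends no
        #    retry budget and starts no second generation.
        existing = self.store.get(key)
        if existing is not None:
            return existing
        entry = Entry(canonical_request_id=f"creq_{uuid.uuid4().hex[:12]}")
        self.store[key] = entry
        # 2. Only the first sighting of a key spends the retry budget.
        for _ in range(1 + self.retry_budget):
            self.provider_calls += 1
            try:
                entry.result = call_provider()
            except TimeoutError:
                continue
            entry.status, entry.billing_status = "completed", "settled"
            return entry
        entry.status, entry.billing_status = "failed", "released"
        return entry
```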

On the finance side, each idempotency key should point to one canonical row in the AI API usage ledger. Provider attempts, fallback attempts, refunds, and settlement events can be child records, but customer billing should be explainable from the canonical request.
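
Here is a minimal sketch of one canonical ledger row with provider attempts as child records. It assumes a policy in which the customer pays only for the attempt whose output was delivered, priced per 1,000 tokens in integer cents. The class names, the pricing unit, and that policy are all assumptions, not FerryAPI behavior.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attempt:
    upstream_request_id: str
    provider_tokens: int      # tokens the provider may invoice for this attempt
    delivered: bool = False   # True only for the attempt whose output reached the customer


@dataclass
class CanonicalUsageRow:
    canonical_request_id: str
    reserved_cents: int
    attempts: List[Attempt] = field(default_factory=list)  # child records
    charged_cents: Optional[int] = None
    released_cents: Optional[int] = None

    def settle(self, cents_per_1k_tokens: int) -> None:
        """Charge the customer once, from the delivered attempt only."""
        delivered = [a for a in self.attempts if a.delivered]
        if len(delivered) > 1:
            raise ValueError("one canonical request cannot deliver twice")
        tokens = delivered[0].provider_tokens if delivered else 0
        # Integer ceiling so a partial 1k-token block is still priced.
        self.charged_cents = (tokens * cents_per_1k_tokens + 999) // 1000
        # Return the unused part of the prepaid reservation; overage is out of scope here.
        self.released_cents = max(self.reserved_cents - self.charged_cents, 0)

    def provider_tokens_total(self) -> int:
        """Reconciliation view: every attempt, including timed-out ones."""
        return sum(a.provider_tokens for a in self.attempts)
```

The row keeps the customer charge and the provider token total as separate views of the same record. Support can then explain a bill from the canonical request, while finance can still match the provider invoice against every attempt.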

Operational safeguards

How FerryAPI helps

FerryAPI is an OpenAI-compatible AI API gateway for teams that need model routing, customer API keys, quotas, prepaid balances, and billing-ready usage records. A centralized gateway is the right place to make idempotency, retries, streaming timeouts, and ledger settlement consistent across every application surface.


When duplicate requests are throttled rather than replayed, expose the decision through AI API rate limit headers so clients know whether to wait, resume, or ask for more quota.
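
Here is a minimal sketch of how a gateway might map duplicate-handling decisions to responses. `Retry-After` is the standard HTTP header defined in RFC 9110. The 409 for an in-flight duplicate follows the IETF Idempotency-Key header draft. The `X-FerryAPI-*` header names and the decision strings are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical header names in the document's X-FerryAPI-* style.
STATUS_HEADER = "X-FerryAPI-Idempotency-Status"
REQUEST_ID_HEADER = "X-FerryAPI-Canonical-Request-Id"


def duplicate_response(decision: str, canonical_request_id: str,
                       retry_after_s: Optional[int] = None) -> Tuple[int, Dict[str, str]]:
    """Map a duplicate-handling decision to an HTTP status code and headers."""
    headers = {STATUS_HEADER: decision, REQUEST_ID_HEADER: canonical_request_id}
    wait = str(1 if retry_after_s is None else retry_after_s)
    if decision == "replayed":
        return 200, headers              # stored result, nothing to wait for
    if decision == "in_progress":
        headers["Retry-After"] = wait    # wait: the original is still running
        return 409, headers
    if decision == "throttled":
        headers["Retry-After"] = wait    # wait, then resume with the same key
        return 429, headers
    if decision == "quota_exhausted":
        # No Retry-After: waiting will not help; the client needs more quota.
        return 429, headers
    raise ValueError(f"unknown decision: {decision}")
```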