Reliability and billing
AI API Idempotency Keys for OpenAI-Compatible Gateways
A practical idempotency key design for SaaS teams running OpenAI-compatible AI gateways: prevent duplicate retries, double billing, tool replays, and ledger mismatches.
Why idempotency matters for AI APIs
AI API gateways sit between application code and model providers, and they own customer API keys, provider routing, retries, fallback, streaming, and billing ledgers. Without a shared idempotency key, a single network timeout can turn into two provider calls, two tool executions, two ledger rows, and a support ticket about duplicate charges.
For SaaS teams, idempotency is not just a backend nicety. It is the contract that keeps customer-visible behavior, quota policy, prepaid balance reservation, and provider invoice reconciliation aligned.
What the idempotency key should cover
| Scope | Recommended behavior | What to store |
|---|---|---|
| Same customer request | Return the original result or in-progress status instead of starting a second provider call. | Tenant ID, customer API key ID, request hash, canonical request ID, and current state. |
| Retry after timeout | Reuse the prior reservation and retry budget; never create an unrelated usage row. | Attempt count, upstream request IDs, provider route, timeout reason, and retry decision. |
| Fallback route | Allow fallback only when the request is safe to repeat and the outcome of the original attempt is known. | Primary provider status, fallback provider, model mapping, and customer-visible response status. |
| Tool or agent action | Prevent replaying side-effecting tool calls unless the application explicitly approves continuation. | Tool call IDs, action state, partial output marker, and replay policy. |
| Billing settlement | Settle all attempts under one canonical request rather than charging per transport retry. | Reserved amount, final billable tokens, refunds, ledger references, and reconciliation status. |
A practical key format
A good idempotency key is generated by the calling application, sent to the OpenAI-compatible gateway, and bound there to the tenant and customer API key. Do not treat the key string as globally unique: two tenants can accidentally generate the same string, so look records up by tenant, customer API key, and idempotency key together.
```
Idempotency-Key: tenant_123:chat:2026-06-04:req_8f3c...
X-FerryAPI-Customer-Key: cak_...
X-FerryAPI-Request-Purpose: support_ticket_summary
```
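On the client side, the important property is that the key is created once per logical request and then reused on every transport retry. The sketch below is a minimal illustration under stated assumptions: `new_idempotency_key` and `call_with_retries` are hypothetical helpers, the key layout only loosely mirrors the header example above, and `send` stands in for whatever HTTP client the application already uses.

```python
import uuid
from datetime import date
from typing import Callable, Dict


def new_idempotency_key(tenant_id: str, operation: str) -> str:
    """Build one key per logical request (hypothetical layout)."""
    return f"{tenant_id}:{operation}:{date.today().isoformat()}:req_{uuid.uuid4().hex}"


def call_with_retries(
    send: Callable[[Dict[str, str], dict], dict],
    key: str,
    body: dict,
    max_attempts: int = 3,
) -> dict:
    """Retry on timeout, sending the same Idempotency-Key every time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    headers = {"Idempotency-Key": key}
    last_error: Exception | None = None
    for _ in range(max_attempts):
        try:
            return send(headers, body)
        except TimeoutError as exc:
            # Do NOT mint a new key here; the gateway needs the old one
            # to recognize this call as a retry of the same request.
            last_error = exc
    assert last_error is not None
    raise last_error
```

Generating the key outside the retry loop is the whole design choice: a key minted inside the loop would make every retry look like a new request.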
The gateway should also compute a request fingerprint from normalized fields such as model, messages, tools, temperature, route policy, and customer key. If the same idempotency key arrives with a materially different request body, return a conflict instead of reusing the old result.
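The fingerprint check can be sketched as follows. This is an illustration rather than FerryAPI's implementation: the field list, the in-memory `_store`, and the `IdempotencyConflict` exception are assumptions, and a real gateway would likely map the conflict to an HTTP 409 response and keep records in a durable store.

```python
import hashlib
import json
from typing import Dict, Tuple

# Assumed set of fields that define "the same request".
FINGERPRINT_FIELDS = ("model", "messages", "tools", "temperature", "route_policy")


class IdempotencyConflict(Exception):
    """Same key, materially different request body."""


def request_fingerprint(body: dict, customer_key_id: str) -> str:
    """Hash a normalized view of the request, not the raw bytes."""
    normalized = {name: body.get(name) for name in FINGERPRINT_FIELDS}
    normalized["customer_key_id"] = customer_key_id
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Records are scoped by (tenant, customer key, idempotency key),
# never by the idempotency key alone.
_store: Dict[Tuple[str, str, str], str] = {}


def register(tenant_id: str, customer_key_id: str, key: str, body: dict) -> bool:
    """Return True for a new request, False for a true duplicate."""
    scope = (tenant_id, customer_key_id, key)
    fingerprint = request_fingerprint(body, customer_key_id)
    existing = _store.get(scope)
    if existing is None:
        _store[scope] = fingerprint
        return True
    if existing != fingerprint:
        raise IdempotencyConflict(f"key {key!r} reused with a different body")
    return False
```

Storing only the hash also means the gateway does not need to retain the prompt body to detect a mismatch.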
State machine for duplicate requests
- New: reserve balance, create the canonical request ID, and start the first provider attempt.
- In progress: return a resumable or polling response; do not start another provider call.
- Completed: return the cached final response metadata and ledger reference.
- Partial: expose whether continuation is safe; do not silently replay side effects.
- Failed before provider acceptance: allow a bounded retry under the same key.
- Failed after provider acceptance: keep settlement pending until provider usage is reconciled.
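The states above map naturally onto a small decision function. The following is a minimal sketch, assuming a retry bound of three attempts and hypothetical `State` and `Action` names; it only decides what the gateway should do when a request arrives under an existing key and does not perform the work itself.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    PARTIAL = "partial"
    FAILED_BEFORE_ACCEPTANCE = "failed_before_acceptance"
    FAILED_AFTER_ACCEPTANCE = "failed_after_acceptance"


class Action(Enum):
    RESERVE_AND_START = "reserve_and_start"
    RETURN_STATUS = "return_status"
    RETURN_CACHED = "return_cached"
    REPORT_PARTIAL = "report_partial"
    RETRY_SAME_KEY = "retry_same_key"
    RETRY_BUDGET_EXHAUSTED = "retry_budget_exhausted"
    HOLD_SETTLEMENT = "hold_settlement"


# Only these actions are allowed to reach the provider.
STARTS_PROVIDER_CALL = {Action.RESERVE_AND_START, Action.RETRY_SAME_KEY}


def decide(state: State, attempts: int = 0, max_attempts: int = 3) -> Action:
    """Pick the gateway's response for a request under a known key."""
    if state is State.NEW:
        return Action.RESERVE_AND_START
    if state is State.IN_PROGRESS:
        return Action.RETURN_STATUS
    if state is State.COMPLETED:
        return Action.RETURN_CACHED
    if state is State.PARTIAL:
        # The application must approve continuation explicitly.
        return Action.REPORT_PARTIAL
    if state is State.FAILED_BEFORE_ACCEPTANCE:
        if attempts < max_attempts:
            return Action.RETRY_SAME_KEY
        return Action.RETRY_BUDGET_EXHAUSTED
    if state is State.FAILED_AFTER_ACCEPTANCE:
        return Action.HOLD_SETTLEMENT
    raise ValueError(f"unhandled state: {state}")
```

Keeping the set of provider-calling actions explicit makes it easy to audit that a duplicate in any other state can never trigger a second generation.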
How idempotency connects to retries, streams, and ledgers
Idempotency keys should be enforced before the gateway spends retry budget. That makes them a natural companion to a gateway retry budget policy and a streaming timeout policy. If the customer reconnects after an interrupted stream, the gateway can return the same canonical request ID and billing status instead of starting a second generation.
On the finance side, each idempotency key should point to one canonical row in the AI API usage ledger. Provider attempts, fallback attempts, refunds, and settlement events can be child records, but customer billing should be explainable from the canonical request.
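A rough model of that parent and child relationship is shown below. The class names, the `delivered` flag, and the settlement rule (bill only the tokens of the one attempt whose response reached the customer, then refund the unused reservation) are assumptions for illustration; the actual billing policy is a product decision.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass
class Attempt:
    """Child record: one provider or fallback attempt."""
    provider: str
    billable_tokens: int
    delivered: bool  # True if this attempt produced the customer's response


@dataclass
class CanonicalRequest:
    """Parent record: the single row customer billing is explained from."""
    canonical_request_id: str
    idempotency_key: str
    reserved: Decimal
    attempts: List[Attempt] = field(default_factory=list)

    def settle(self, price_per_token: Decimal) -> dict:
        delivered = [a for a in self.attempts if a.delivered]
        if len(delivered) > 1:
            raise ValueError("more than one delivered attempt under one key")
        tokens = delivered[0].billable_tokens if delivered else 0
        charge = price_per_token * tokens
        refund = max(self.reserved - charge, Decimal("0"))
        return {
            "canonical_request_id": self.canonical_request_id,
            "attempt_count": len(self.attempts),
            "charge": charge,
            "refund": refund,
        }
```

For example, a request that timed out on the primary route and succeeded on a fallback still produces one charge and one refund, with both attempts visible as children of the same canonical request.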
Operational safeguards
- Expire idempotency records only after the longest provider reconciliation window has passed.
- Check for safe duplicates before applying tenant-level rate limits, so replaying a stored result does not consume quota; apply the limits before any expensive provider call.
- Log conflicts separately; repeated conflicts may indicate SDK bugs or abuse.
- Do not cache sensitive prompt bodies unnecessarily; store hashes and redacted metadata where possible.
- Expose canonical request IDs in support exports so finance, engineering, and customers can discuss the same event.
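Two of these safeguards are easy to get wrong in code: the expiry time and the ordering of the duplicate check relative to rate limiting. The sketch below assumes a hypothetical per-provider table of reconciliation windows and a deliberately simple counter-based limiter; neither reflects a specific provider or product.

```python
from datetime import datetime, timedelta
from typing import Dict, Set, Tuple

Scope = Tuple[str, str, str]  # (tenant ID, customer key ID, idempotency key)


def earliest_expiry(completed_at: datetime, windows: Dict[str, timedelta]) -> datetime:
    """A record may expire only after the longest reconciliation window."""
    return completed_at + max(windows.values())


class SimpleLimiter:
    """Toy limiter: a fixed number of provider calls remain."""

    def __init__(self, remaining: int) -> None:
        self.remaining = remaining

    def try_acquire(self) -> bool:
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def admit(scope: Scope, known: Set[Scope], limiter: SimpleLimiter) -> str:
    """Recognize duplicates first, then rate limit, then forward."""
    if scope in known:
        return "replay"  # served from the stored record, no quota spent
    if not limiter.try_acquire():
        return "throttled"
    known.add(scope)
    return "forward"
```

With this ordering, a client that retries a completed request while the tenant is at its limit still gets the stored result instead of a throttling error.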
How FerryAPI helps
FerryAPI is an OpenAI-compatible AI API gateway for teams that need model routing, customer API keys, quotas, prepaid balances, and billing-ready usage records. A centralized gateway is the right place to make idempotency, retries, streaming timeouts, and ledger settlement consistent across every application surface.
Use FerryAPI to centralize OpenAI-compatible request identity, customer quotas, retry policy, prepaid reservations, and usage billing.
When duplicate requests are throttled rather than replayed, expose the decision through AI API rate limit headers so clients know whether to wait, resume, or ask for more quota.