FerryAPI


AI API Usage Ledger Design for SaaS Billing Teams

A practical design guide for building an immutable AI API usage ledger across customer API keys, model routing, retries, refunds, prepaid balances, invoice reconciliation, and customer usage exports.

Why an AI usage ledger is different from request logs

Request logs are helpful for debugging, but they are not enough for billing. SaaS teams need an append-only usage ledger that can explain how provider calls became customer charges, prepaid balance movements, quota decisions, refunds, and invoices. The ledger should survive retries, fallback routes, streaming responses, partial failures, and provider invoice corrections.

The practical rule: logs can be noisy and temporary; the usage ledger should be durable, idempotent, auditable, and tied to the customer contract.

Core ledger events

| Event | When it is written | Fields that matter |
|---|---|---|
| request_authorized | After customer API key and quota checks pass | tenant_id, key_id, feature, model_alias, policy_version, request_id |
| cost_reserved | Before the provider call when prepaid balance is used | reservation_id, estimated_cost, currency, balance_before, expires_at |
| provider_call_started | When the gateway selects a provider route | provider, provider_account_pool, model_id, route_id, fallback_group |
| usage_measured | After the provider response or stream closes | input_tokens, output_tokens, cached_tokens, tool_calls, latency_ms, status |
| cost_settled | After actual usage is priced | provider_cost, customer_charge, margin, price_version, reservation_delta |
| refund_or_adjustment | For failed calls, credits, invoice corrections, or manual support actions | original_event_id, adjustment_reason, approved_by, amount |
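To make the table enforceable, the ledger writer can reject any event that lacks its fields before it is stored. The sketch below is a minimal Python illustration, assuming events arrive as plain dictionaries; `REQUIRED_FIELDS` and `validate_event` are hypothetical names, not part of any FerryAPI API.

```python
from typing import Any, Mapping

# Hypothetical registry: required fields per event type, copied from the
# table above. REQUIRED_FIELDS and validate_event are illustrative names only.
REQUIRED_FIELDS: dict[str, frozenset[str]] = {
    "request_authorized": frozenset(
        {"tenant_id", "key_id", "feature", "model_alias", "policy_version", "request_id"}),
    "cost_reserved": frozenset(
        {"reservation_id", "estimated_cost", "currency", "balance_before", "expires_at"}),
    "provider_call_started": frozenset(
        {"provider", "provider_account_pool", "model_id", "route_id", "fallback_group"}),
    "usage_measured": frozenset(
        {"input_tokens", "output_tokens", "cached_tokens", "tool_calls", "latency_ms", "status"}),
    "cost_settled": frozenset(
        {"provider_cost", "customer_charge", "margin", "price_version", "reservation_delta"}),
    "refund_or_adjustment": frozenset(
        {"original_event_id", "adjustment_reason", "approved_by", "amount"}),
}


def validate_event(event_type: str, fields: Mapping[str, Any]) -> None:
    """Raise ValueError for an unknown event type or a missing required field."""
    required = REQUIRED_FIELDS.get(event_type)
    if required is None:
        raise ValueError(f"unknown event type: {event_type}")
    missing = required - set(fields)
    if missing:
        raise ValueError(f"{event_type} is missing {sorted(missing)}")


# A complete adjustment event passes silently.
validate_event("refund_or_adjustment", {
    "original_event_id": "evt_123", "adjustment_reason": "provider_outage",
    "approved_by": "support-lead", "amount": "-0.42",
})
```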

Idempotency is the safety rail

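Every billable request needs a stable idempotency key. Without one, network retries can double-charge customers or double-count provider cost. A good attempt-level key combines the customer request id, gateway request id, attempt number, route id, and event type, so a replayed write for the same attempt collides with the original row while a new attempt gets its own. The customer-visible charge needs a key that leaves out the attempt number and route id; that way retried provider calls create distinct attempt records but settle into one customer-visible charge, unless your product explicitly bills each attempt.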
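One way to get distinct attempt records that still settle into a single customer-visible charge is to derive two keys: an attempt key from the fields listed above, and a charge key that omits the attempt number and route id. This is a minimal Python sketch, assuming a SHA-256 hex digest of the joined ids is an acceptable key format. `attempt_key`, `charge_key`, and `InMemoryLedger` are hypothetical names, and putting `tenant_id` in the charge key is an added assumption that customer request ids are only unique within a tenant.

```python
import hashlib


# Hypothetical helper: "\x1f" (ASCII unit separator) is assumed never to appear
# inside an id, so two different id tuples cannot join to the same string.
def _key(*parts: str) -> str:
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


def attempt_key(customer_request_id: str, gateway_request_id: str,
                attempt: int, route_id: str, event_type: str) -> str:
    """Per-attempt row key: a replayed write collides, a new attempt does not."""
    return _key(customer_request_id, gateway_request_id, str(attempt), route_id, event_type)


def charge_key(tenant_id: str, customer_request_id: str, event_type: str) -> str:
    """Customer-visible charge key: no attempt or route, so all attempts settle once."""
    return _key(tenant_id, customer_request_id, event_type)


class InMemoryLedger:
    """Hypothetical stand-in for an append-only table with a unique key column."""

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def append(self, key: str, row: dict) -> bool:
        # First write wins; a duplicate key is a no-op rather than a second charge.
        if key in self.rows:
            return False
        self.rows[key] = row
        return True
```

Two fallback attempts for the same request produce two different attempt keys, but both resolve to the same charge key, so a second `cost_settled` write for that request is ignored.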

If a call times out before usage is known, or the provider reports usage only after the timeout, do not guess. Store the uncertain state, reconcile it later, and keep the customer-facing balance conservative, with the reservation still held, until the actual cost is known.
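The conservative-balance rule can be sketched as a small state machine: a timeout moves the hold to a pending state instead of releasing it, and only the provider's actual cost settles it. This is a minimal Python sketch; `Hold`, `HoldState`, `mark_timed_out`, `settle`, and `spendable` are hypothetical names, and the convention that `posted_balance` already has settled charges deducted is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Optional


class HoldState(Enum):
    RESERVED = "reserved"
    PENDING_RECONCILIATION = "pending_reconciliation"
    SETTLED = "settled"


@dataclass
class Hold:
    reservation_id: str
    amount: Decimal                        # estimated cost held before the call
    state: HoldState = HoldState.RESERVED
    actual_cost: Optional[Decimal] = None


def mark_timed_out(hold: Hold) -> None:
    # Usage is unknown: record the uncertainty and keep the whole hold.
    if hold.state is HoldState.RESERVED:
        hold.state = HoldState.PENDING_RECONCILIATION


def settle(hold: Hold, actual_cost: Decimal) -> Decimal:
    """Settle with the provider's actual cost and return hold minus cost
    (positive is released to the balance, negative needs an extra debit)."""
    if hold.state is HoldState.SETTLED:
        raise ValueError(f"{hold.reservation_id} already settled")
    hold.actual_cost = actual_cost
    hold.state = HoldState.SETTLED
    return hold.amount - actual_cost


def spendable(posted_balance: Decimal, holds: list[Hold]) -> Decimal:
    # posted_balance already reflects settled charges (assumed convention);
    # every unsettled hold, including pending ones, still reduces spendable funds.
    held = sum((h.amount for h in holds if h.state is not HoldState.SETTLED), Decimal("0"))
    return posted_balance - held
```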

Recommended schema shape
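One possible shape, derived only from the event table above, is a single append-only events table with a unique idempotency key, a check on the event type, and the event-specific fields kept in a JSON payload. The sketch uses Python's built-in `sqlite3` module so it runs without a server; the table, column, and trigger names, and storing money as integer micro-units, are assumptions rather than a production schema.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: names, types, and the micro-unit money convention
# are illustrative assumptions based on the event table above.
SCHEMA = """
CREATE TABLE ledger_events (
    event_id          INTEGER PRIMARY KEY,
    idempotency_key   TEXT NOT NULL UNIQUE,
    tenant_id         TEXT NOT NULL,
    request_id        TEXT NOT NULL,
    event_type        TEXT NOT NULL CHECK (event_type IN (
        'request_authorized', 'cost_reserved', 'provider_call_started',
        'usage_measured', 'cost_settled', 'refund_or_adjustment')),
    original_event_id INTEGER REFERENCES ledger_events (event_id),
    amount_micros     INTEGER,        -- money as integer millionths of a unit
    currency          TEXT,
    payload           TEXT NOT NULL,  -- JSON with the event-specific fields
    created_at        TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%fZ', 'now'))
);
CREATE INDEX ledger_by_request ON ledger_events (tenant_id, request_id);

-- Append-only: corrections are new refund_or_adjustment rows, never edits.
CREATE TRIGGER ledger_no_update BEFORE UPDATE ON ledger_events
BEGIN SELECT RAISE(ABORT, 'ledger_events is append-only'); END;
CREATE TRIGGER ledger_no_delete BEFORE DELETE ON ledger_events
BEGIN SELECT RAISE(ABORT, 'ledger_events is append-only'); END;
"""


def open_ledger(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def append_event(conn: sqlite3.Connection, idempotency_key: str, tenant_id: str,
                 request_id: str, event_type: str, payload_json: str,
                 amount_micros: Optional[int] = None, currency: Optional[str] = None,
                 original_event_id: Optional[int] = None) -> bool:
    """Insert one event; return False if the idempotency key was already used.
    ON CONFLICT targets only the key, so a bad event_type still raises."""
    cur = conn.execute(
        """INSERT INTO ledger_events (idempotency_key, tenant_id, request_id,
               event_type, payload, amount_micros, currency, original_event_id)
           VALUES (?, ?, ?, ?, ?, ?, ?, ?)
           ON CONFLICT (idempotency_key) DO NOTHING""",
        (idempotency_key, tenant_id, request_id, event_type, payload_json,
         amount_micros, currency, original_event_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

Keeping corrections as new rows linked through `original_event_id`, rather than editing settled rows, is what lets the ledger still explain an old charge after a refund or invoice correction.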

How to handle fallback and retries

Fallback can make billing confusing because one user request may touch multiple providers. The ledger should separate provider-attempt cost from customer-facing charge. For example, if Provider A times out after partial work and Provider B succeeds, the ledger can record both provider attempts while charging the customer according to a single product policy. This is also where you decide whether failed attempts are absorbed as infrastructure cost or exposed as paid usage.
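A minimal Python sketch of keeping provider-attempt cost and customer charge apart, assuming each attempt carries both its provider cost and the customer list price for its usage. `ProviderAttempt`, `FailedAttemptPolicy`, `RequestSettlement`, and `settle_request` are hypothetical names, and the two policies shown are simply the absorb-or-bill choice described above.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class FailedAttemptPolicy(Enum):
    ABSORB = "absorb"        # failed attempts are infrastructure cost
    BILL_EACH = "bill_each"  # every attempt is billed to the customer


@dataclass(frozen=True)
class ProviderAttempt:
    provider: str
    route_id: str
    succeeded: bool
    provider_cost: Decimal   # what the provider will invoice for this attempt
    list_price: Decimal      # what this attempt's usage costs the customer


@dataclass(frozen=True)
class RequestSettlement:
    provider_cost: Decimal    # all attempts, for provider invoice reconciliation
    customer_charge: Decimal  # the single customer-visible charge
    absorbed_cost: Decimal    # provider cost not passed on to the customer


ZERO = Decimal("0")


def settle_request(attempts: list[ProviderAttempt],
                   policy: FailedAttemptPolicy) -> RequestSettlement:
    # Hypothetical sketch: provider cost always counts every attempt.
    provider_cost = sum((a.provider_cost for a in attempts), ZERO)
    if policy is FailedAttemptPolicy.BILL_EACH:
        customer_charge = sum((a.list_price for a in attempts), ZERO)
        absorbed = ZERO
    else:
        customer_charge = sum((a.list_price for a in attempts if a.succeeded), ZERO)
        absorbed = sum((a.provider_cost for a in attempts if not a.succeeded), ZERO)
    return RequestSettlement(provider_cost, customer_charge, absorbed)
```

With Provider A failing after partial work and Provider B succeeding, `ABSORB` charges only B's price and reports A's cost as absorbed, while `provider_cost` still includes both attempts so the provider invoice can be matched later.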

Weekly reconciliation checks

  1. Compare provider invoice totals against settled provider_cost by provider, model, account pool, and day.
  2. Find unsettled reservations older than the normal streaming or timeout window.
  3. List customer charges that have no matching provider attempt, excluding cached or internal-test traffic.
  4. List provider attempts that have no customer-visible policy decision.
  5. Review adjustment events by reason and approver to catch product or support process gaps.
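The first four checks reduce to grouping and set differences. The sketch below runs them over in-memory rows in Python; the row shape, the `GroupKey` tuple, the 0.01 tolerance, and the function names are all assumptions, and a real deployment would more likely express the same logic as scheduled queries against the ledger store.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta
from decimal import Decimal

GroupKey = tuple[str, str, str, str]  # (provider, model_id, account_pool, day)


@dataclass(frozen=True)
class SettledAttempt:
    provider: str
    model_id: str
    account_pool: str
    day: str              # UTC calendar day, "YYYY-MM-DD"
    provider_cost: Decimal


def invoice_mismatches(invoice: dict[GroupKey, Decimal], settled: list[SettledAttempt],
                       tolerance: Decimal = Decimal("0.01")) -> dict[GroupKey, Decimal]:
    """Check 1: invoice total minus settled provider_cost per group, beyond tolerance."""
    ledger: dict[GroupKey, Decimal] = defaultdict(Decimal)
    for a in settled:
        ledger[(a.provider, a.model_id, a.account_pool, a.day)] += a.provider_cost
    gaps = {}
    for key in invoice.keys() | ledger.keys():
        gap = invoice.get(key, Decimal("0")) - ledger.get(key, Decimal("0"))
        if abs(gap) > tolerance:
            gaps[key] = gap
    return gaps


def stale_reservations(open_holds: dict[str, datetime], now: datetime,
                       window: timedelta) -> list[str]:
    """Check 2: reservations still unsettled after the normal streaming/timeout window."""
    return sorted(rid for rid, created_at in open_holds.items() if now - created_at > window)


def orphan_ids(charged: set[str], decided: set[str], attempted: set[str],
               exempt: set[str]) -> tuple[set[str], set[str]]:
    """Checks 3 and 4, by request id: charges with no provider attempt (minus
    cached or internal-test traffic) and attempts with no policy decision."""
    return charged - attempted - exempt, attempted - decided
```

Check 5 is a human review of adjustment reasons and approvers, so it is left out of the sketch.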

Streaming interruptions need ledger-level state too; a streaming timeout policy helps separate observed tokens, provider-final tokens, reservations, settlement, and partial refunds.
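A sketch of that separation for an interrupted stream, assuming for brevity that only output tokens are priced, at a single flat rate: the gateway's observed count is recorded for audit, but only the provider-final count settles the reservation. `settle_stream` and `StreamSettlement` are hypothetical names.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class StreamSettlement:
    state: str                  # "settled" or "pending_reconciliation"
    charge: Decimal             # amount debited now
    released: Decimal           # hold minus charge; negative means an extra debit
    observed_estimate: Decimal  # gateway-side estimate, kept for audit only


def settle_stream(reserved: Decimal, observed_output_tokens: int,
                  provider_final_output_tokens: Optional[int],
                  price_per_output_token: Decimal) -> StreamSettlement:
    # Hypothetical sketch: flat per-token output price, input tokens ignored.
    observed_estimate = observed_output_tokens * price_per_output_token
    if provider_final_output_tokens is None:
        # No provider-final count yet: do not bill from the partial stream,
        # keep the whole reservation until reconciliation.
        return StreamSettlement("pending_reconciliation", Decimal("0"), Decimal("0"),
                                observed_estimate)
    charge = provider_final_output_tokens * price_per_output_token
    return StreamSettlement("settled", charge, reserved - charge, observed_estimate)
```

Returning the unused part of the hold as `released` is one way to model a partial refund of the reservation; a product that also credits customers for interrupted streams would record that separately as a `refund_or_adjustment` event.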

How FerryAPI fits

FerryAPI is built for the controls around this ledger: customer API keys, model routing, quota enforcement, prepaid balances, and usage records that SaaS teams can reconcile. The goal is not just to call cheaper models; it is to make every AI API request explainable from gateway decision to customer invoice.

Related FerryAPI guides: AI API usage attribution schema, multi-provider invoice reconciliation, LLM prepaid balance implementation, and AI API quota policy examples.

Related: AI API cost anomaly detection runbook turns ledger signals into alerts, containment steps, and reconciliation follow-up.

An OpenAI-compatible API key rotation policy should also cover key lifecycle events, so rotated or revoked credentials do not break tenant-level ledger continuity.

For teams designing observability alongside billing, pair the ledger with an AI API request logging redaction checklist so debugging data does not expose customer prompts or credentials.

A reliable ledger gives AI API spend alerts the same numbers that billing and support will use later. It also gives teams the evidence needed for a consistent AI API refund policy for failed requests.

Need a cleaner AI usage ledger?
FerryAPI helps SaaS teams connect customer API keys, quota policies, model routes, prepaid balances, and invoice-ready usage records through an OpenAI-compatible gateway.

Use idempotency keys for OpenAI-compatible gateways to bind retries, fallback attempts, reservations, and settlement rows to one customer-visible request.