
AI API Usage Ledger Design for SaaS Billing Teams

A practical design guide for building an immutable AI API usage ledger across customer API keys, model routing, retries, refunds, prepaid balances, invoice reconciliation, and customer usage exports.

Why an AI usage ledger is different from request logs

Request logs are helpful for debugging, but they are not enough for billing. SaaS teams need an append-only usage ledger that can explain how provider calls became customer charges, prepaid balance movements, quota decisions, refunds, and invoices. The ledger should survive retries, fallback routes, streaming responses, partial failures, and provider invoice corrections.

The practical rule: logs can be noisy and temporary; the usage ledger should be durable, idempotent, auditable, and tied to the customer contract.

Core ledger events

Event	When it is written	Fields that matter
`request_authorized`	After customer API key and quota checks pass	tenant_id, key_id, feature, model_alias, policy_version, request_id
`cost_reserved`	Before the provider call when prepaid balance is used	reservation_id, estimated_cost, currency, balance_before, expires_at
`provider_call_started`	When the gateway selects a provider route	provider, provider_account_pool, model_id, route_id, fallback_group
`usage_measured`	After the provider response or stream closes	input_tokens, output_tokens, cached_tokens, tool_calls, latency_ms, status
`cost_settled`	After actual usage is priced	provider_cost, customer_charge, margin, price_version, reservation_delta
`refund_or_adjustment`	For failed calls, credits, invoice corrections, or manual support actions	original_event_id, adjustment_reason, approved_by, amount

Idempotency is the safety rail

Every billable request needs a stable idempotency key. Without one, network retries can double-charge customers or double-count provider cost. For attempt-level events, a good key combines the customer request id, gateway request id, attempt number, route id, and event type. Retried provider calls should create distinct attempt records but settle into one customer-visible charge unless your product explicitly bills each attempt, so the key for the settlement event should leave out the attempt number and route id.
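As a rough illustration of the keying idea, the sketch below hashes the key components and uses an in-memory dictionary as a stand-in for a table with a unique constraint on the key. The names `idempotency_key` and `Ledger` are hypothetical, and the choice of SHA-256 with a separator character is an assumption, not a requirement of any gateway.

```python
import hashlib
from typing import Optional


def idempotency_key(customer_request_id: str, gateway_request_id: str,
                    event_type: str, attempt: Optional[int] = None,
                    route_id: Optional[str] = None) -> str:
    """Derive a stable key. Leave attempt and route_id unset for request-scoped events."""
    parts = [customer_request_id, gateway_request_id, event_type,
             "" if attempt is None else str(attempt), route_id or ""]
    # A separator keeps ("ab", "c") and ("a", "bc") from hashing to the same key.
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


class Ledger:
    """In-memory stand-in for an append-only table with a unique key constraint."""

    def __init__(self) -> None:
        self._events: dict[str, dict] = {}

    def append(self, key: str, event: dict) -> bool:
        """Return True if the event was written, False if the key already exists."""
        if key in self._events:
            return False
        self._events[key] = event
        return True

    def __len__(self) -> int:
        return len(self._events)


ledger = Ledger()

# Two provider attempts for the same request produce two distinct attempt records.
for attempt, route in ((1, "route-a"), (2, "route-b")):
    key = idempotency_key("cust-req-1", "gw-req-9", "usage_measured", attempt, route)
    ledger.append(key, {"attempt": attempt, "route_id": route})

# The settlement key is request-scoped, so a replayed settlement is rejected.
settle_key = idempotency_key("cust-req-1", "gw-req-9", "cost_settled")
first_write = ledger.append(settle_key, {"customer_charge": "0.42"})
replayed_write = ledger.append(settle_key, {"customer_charge": "0.42"})
```

In a real store the duplicate check would normally be a unique index on the key column, so that concurrent writers cannot both succeed.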

If a provider call times out before usage is reported, or usage arrives only after the timeout, do not guess. Record the uncertain state in the ledger, reconcile it later, and keep the customer-facing balance conservative (leave the reservation held) until the actual cost is known.
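One way to keep the balance conservative is to derive it from the events themselves, so an unresolved reservation keeps counting in full until a settlement replaces it. The sketch below assumes a hypothetical `usage_uncertain` event type (not one of the core events listed earlier) and a hypothetical `available_balance` helper.

```python
from decimal import Decimal


def available_balance(opening_balance: Decimal, events: list[dict]) -> Decimal:
    """Customer-facing balance: settled charges plus every hold not yet settled."""
    held: dict[str, Decimal] = {}
    settled = Decimal("0")
    for event in events:
        if event["type"] == "cost_reserved":
            held[event["reservation_id"]] = event["estimated_cost"]
        elif event["type"] == "usage_uncertain":
            # Hypothetical marker event: usage is unknown, so the hold stays in place.
            continue
        elif event["type"] == "cost_settled":
            held.pop(event["reservation_id"], None)
            settled += event["customer_charge"]
    return opening_balance - settled - sum(held.values(), Decimal("0"))


events = [
    {"type": "cost_reserved", "reservation_id": "res-1", "estimated_cost": Decimal("0.50")},
    {"type": "usage_uncertain", "reservation_id": "res-1"},
]
balance_while_uncertain = available_balance(Decimal("10.00"), events)

# Later reconciliation learns the real cost and appends a settlement event.
events.append({"type": "cost_settled", "reservation_id": "res-1",
               "customer_charge": Decimal("0.32")})
balance_after_reconcile = available_balance(Decimal("10.00"), events)
```

While the outcome is unknown, the customer sees the full estimated hold deducted. Once the settlement is appended, the difference between the estimate and the actual charge is released.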

Recommended schema shape

Ledger event id: globally unique and immutable.
Correlation ids: tenant, workspace, customer API key, feature, request, provider attempt, invoice period.
Policy snapshot: quota policy, pricing version, model routing rule, fallback rule, and prepaid balance rule used at decision time.
Usage facts: token counts, request count, tool calls, cache hit status, streaming completion state, and provider status.
Money fields: provider cost, customer charge, currency, exchange rate, reservation amount, refund amount, and margin.
Audit fields: source system, actor, created_at, supersedes_event_id, and adjustment reason.
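The field groups above could be sketched as a frozen record type. This is a trimmed illustration rather than a full schema: it keeps a few fields from each group, the class name `LedgerEvent` and the sample values are hypothetical, and margin is computed from the two money fields instead of stored, which is a design choice some teams may not want.

```python
import uuid
from dataclasses import dataclass, field, FrozenInstanceError
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class LedgerEvent:
    event_type: str
    tenant_id: str                     # correlation ids
    request_id: str
    price_version: str                 # policy snapshot
    provider_cost: Decimal             # money fields
    customer_charge: Decimal
    currency: str
    source_system: str                 # audit fields
    actor: str
    supersedes_event_id: Optional[str] = None
    adjustment_reason: Optional[str] = None
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    @property
    def margin(self) -> Decimal:
        return self.customer_charge - self.provider_cost


charge = LedgerEvent(
    event_type="cost_settled", tenant_id="tenant-1", request_id="req-1",
    price_version="v7", provider_cost=Decimal("0.30"),
    customer_charge=Decimal("0.42"), currency="USD",
    source_system="gateway", actor="system",
)

# Corrections are new events that point at the original; nothing is edited in place.
refund = LedgerEvent(
    event_type="refund_or_adjustment", tenant_id="tenant-1", request_id="req-1",
    price_version="v7", provider_cost=Decimal("0"),
    customer_charge=Decimal("-0.42"), currency="USD",
    source_system="support-console", actor="support:alice",
    supersedes_event_id=charge.event_id, adjustment_reason="provider_error",
)

try:
    charge.customer_charge = Decimal("0")   # frozen dataclasses reject assignment
    was_mutated = True
except FrozenInstanceError:
    was_mutated = False
```

Freezing the in-process object only guards against accidental edits in application code. The storage layer still needs its own append-only guarantee.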

How to handle fallback and retries

Fallback can make billing confusing because one user request may touch multiple providers. The ledger should separate provider-attempt cost from customer-facing charge. For example, if Provider A times out after partial work and Provider B succeeds, the ledger can record both provider attempts while charging the customer according to a single product policy. This is also where you decide whether failed attempts are absorbed as infrastructure cost or exposed as paid usage.
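The Provider A and Provider B case can be worked through with a small settlement function. The flat `markup` multiplier, the `bill_failed_attempts` flag, and the cost figures are all assumptions made up for this sketch. A real pricing policy is likely to be more involved.

```python
from decimal import Decimal


def settle(attempts: list[dict], markup: Decimal,
           bill_failed_attempts: bool = False) -> dict:
    """Collapse provider attempts into one customer-visible charge."""
    # Provider cost always includes every attempt, whether or not it succeeded.
    provider_cost = sum((a["provider_cost"] for a in attempts), Decimal("0"))
    billable = [a for a in attempts
                if bill_failed_attempts or a["status"] == "ok"]
    customer_charge = sum((a["provider_cost"] for a in billable), Decimal("0")) * markup
    return {
        "provider_cost": provider_cost,
        "customer_charge": customer_charge,
        "margin": customer_charge - provider_cost,
        "attempt_count": len(attempts),
    }


attempts = [
    {"provider": "provider-a", "status": "timeout", "provider_cost": Decimal("0.004")},
    {"provider": "provider-b", "status": "ok", "provider_cost": Decimal("0.010")},
]

absorbed = settle(attempts, markup=Decimal("1.5"))
exposed = settle(attempts, markup=Decimal("1.5"), bill_failed_attempts=True)
```

With the failed attempt absorbed, the customer is charged 0.015 against a provider cost of 0.014, leaving a margin of 0.001. With failed attempts exposed, the charge is 0.021 and the margin is 0.007. Both attempts remain in the ledger either way.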

Weekly reconciliation checks

Compare provider invoice totals against settled provider_cost by provider, model, account pool, and day.
Find unsettled reservations older than the normal streaming or timeout window.
List customer charges that have no matching provider attempt, excluding cached or internal-test traffic.
List provider attempts that have no customer-visible policy decision.
Review adjustment events by reason and approver to catch product or support process gaps.
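The first three checks above can be sketched as plain functions over rows that have already been loaded from the ledger and the provider invoice. The function names, the `day`, `amount`, `cached`, and `internal_test` keys, and the row shapes are assumptions for illustration. In practice these checks would more likely be SQL queries against the ledger store.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from decimal import Decimal


def _bucket(row: dict) -> tuple:
    return (row["provider"], row["model_id"], row["provider_account_pool"], row["day"])


def cost_variance(invoice_lines: list[dict], settled_events: list[dict]) -> dict:
    """Invoice amount minus settled provider_cost per bucket; zero differences dropped."""
    totals = defaultdict(lambda: Decimal("0"))
    for line in invoice_lines:
        totals[_bucket(line)] += line["amount"]
    for event in settled_events:
        totals[_bucket(event)] -= event["provider_cost"]
    return {bucket: diff for bucket, diff in totals.items() if diff != 0}


def stale_reservations(reservations: list[dict], settled_ids: set,
                       now: datetime, max_age: timedelta) -> list[str]:
    """Reservations with no settlement that are older than the allowed window."""
    return [r["reservation_id"] for r in reservations
            if r["reservation_id"] not in settled_ids
            and now - r["created_at"] > max_age]


def charges_without_attempt(charges: list[dict], attempt_request_ids: set) -> list[str]:
    """Customer charges with no provider attempt, ignoring cached and internal-test traffic."""
    return [c["request_id"] for c in charges
            if c["request_id"] not in attempt_request_ids
            and not c.get("cached")
            and not c.get("internal_test")]
```

A non-empty result from any of these is a prompt for investigation rather than proof of a billing error. Provider invoices can lag or be corrected after the fact.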

Streaming interruptions need ledger-level state too; a streaming timeout policy helps separate observed tokens, provider-final tokens, reservations, settlement, and partial refunds.

How FerryAPI fits

FerryAPI is built for the controls around this ledger: customer API keys, model routing, quota enforcement, prepaid balances, and usage records that SaaS teams can reconcile. The goal is not just to call cheaper models; it is to make every AI API request explainable from gateway decision to customer invoice.

Related: AI API cost anomaly detection runbook turns ledger signals into alerts, containment steps, and reconciliation follow-up.

Key lifecycle events should also be captured by an OpenAI-compatible API key rotation policy, so rotated or revoked credentials do not break tenant-level ledger continuity.

For teams designing observability alongside billing, pair the ledger with an AI API request logging redaction checklist so debugging data does not expose customer prompts or credentials.

A reliable ledger gives AI API spend alerts the same numbers that billing and support will use later. It also gives teams the evidence needed for a consistent AI API refund policy for failed requests.

Need a cleaner AI usage ledger?
FerryAPI helps SaaS teams connect customer API keys, quota policies, model routes, prepaid balances, and invoice-ready usage records through an OpenAI-compatible gateway.

Use idempotency keys for OpenAI-compatible gateways to bind retries, fallback attempts, reservations, and settlement rows to one customer-visible request.