LLM Prepaid Balance Implementation for SaaS AI Products

A practical implementation guide for SaaS teams adding LLM prepaid balances, reservations, quota checks, settlement, refunds, and invoice-ready usage records.

Why prepaid balance logic belongs near the gateway

AI SaaS products often start with one provider account and one monthly bill. That works until customers need usage-based pricing, agents can spend money in the background, and finance needs to know whether a request should be accepted before it reaches a model provider. A prepaid balance system should sit close to the OpenAI-compatible gateway so every request can be checked, reserved, routed, and settled with the same policy.

The implementation goal is simple: never let an unowned LLM request (one that cannot be tied to a tenant and API key) create surprise provider spend, and never make the billing ledger depend on provider invoices alone.

Core data model

Object	Required fields	Implementation note
Tenant balance	tenant_id, currency, available_balance, reserved_balance, credit_limit, status	Keep available and reserved amounts separate so that concurrent or long-running requests cannot spend the same balance twice.
Price table	provider, model, input_token_price, output_token_price, cached_token_price, effective_at	Version prices so old usage can be reconciled even after provider pricing changes.
Reservation	reservation_id, request_id, tenant_id, estimated_cost, expires_at, status	Expire abandoned reservations and release balance automatically.
Usage event	request_id, api_key_id, feature, model, tokens, provider_cost, customer_cost, price_version, policy_version	Make the usage event invoice-ready, not just observability metadata.
Ledger entry	entry_id, tenant_id, amount, type, source_id, created_at	Use append-only entries for top-ups, reservations, settlements, refunds, adjustments, and credits.
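
As a rough illustration of the table above, the objects could be sketched as Python dataclasses. The field names come from the table; everything else is an assumption made for the example: prices are stored per single token, ledger amounts are signed (positive for credits, negative for debits), and the helper names `price_at` and `ledger_balance` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional


@dataclass
class TenantBalance:
    tenant_id: str
    currency: str
    available_balance: Decimal
    reserved_balance: Decimal = Decimal("0")
    credit_limit: Decimal = Decimal("0")
    status: str = "active"


@dataclass(frozen=True)
class PriceRow:
    provider: str
    model: str
    input_token_price: Decimal   # assumed unit: price per single token
    output_token_price: Decimal
    cached_token_price: Decimal
    effective_at: datetime


@dataclass(frozen=True)
class LedgerEntry:
    entry_id: str
    tenant_id: str
    amount: Decimal              # assumed convention: credits > 0, debits < 0
    type: str                    # e.g. "top_up", "settlement", "refund"
    source_id: str
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def price_at(rows: list[PriceRow], provider: str, model: str,
             when: datetime) -> Optional[PriceRow]:
    """Return the newest price row already in effect at `when`, or None."""
    candidates = [
        r for r in rows
        if r.provider == provider and r.model == model and r.effective_at <= when
    ]
    return max(candidates, key=lambda r: r.effective_at, default=None)


def ledger_balance(entries: list[LedgerEntry], tenant_id: str) -> Decimal:
    """Derive a tenant's balance by summing its append-only ledger entries."""
    return sum((e.amount for e in entries if e.tenant_id == tenant_id),
               Decimal("0"))
```

Looking prices up by `effective_at` is one way to reconcile old usage after a price change; a real system would likely also persist the chosen row's identifier on the usage event.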

Request lifecycle

Identify the tenant: resolve the customer API key before provider routing. Reject unowned or suspended keys early.
Estimate spend: use model, prompt tokens, max output tokens, cached-token policy, and route-specific markup to estimate a worst-case cost.
Reserve balance: atomically move the estimate from available balance to reserved balance. If the available balance is too low, apply the tenant's over-limit policy instead.
Route the model call: send the request to the approved provider/model only after balance and quota checks pass.
Settle actual usage: calculate the actual input, output, and cached token cost, release the unused part of the reservation, and append a usage ledger entry.
Export billing data: group usage by tenant, feature, model, and invoice period for dashboards and finance reconciliation.
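
The estimate, reserve, and settle steps above can be sketched with an in-memory store. This is a minimal illustration, not a production design: `BalanceStore`, `estimate_cost`, and `InsufficientBalance` are hypothetical names, a thread lock stands in for a database transaction, per-token prices are assumed, and cached-token pricing is left out of the estimate for brevity.

```python
import threading
from decimal import Decimal


class InsufficientBalance(Exception):
    """Raised when the estimate does not fit in the available balance."""


def estimate_cost(prompt_tokens, max_output_tokens, input_price, output_price,
                  markup=Decimal("1")):
    """Worst case: every prompt token is billed and max output is reached."""
    raw = prompt_tokens * input_price + max_output_tokens * output_price
    return raw * markup


class BalanceStore:
    """In-memory stand-in for a transactional tenant balance row."""

    def __init__(self, available):
        self.available = available
        self.reserved = Decimal("0")
        self.ledger = []            # append-only (type, amount, request_id)
        self._reservations = {}     # request_id -> reserved estimate
        self._lock = threading.Lock()

    def reserve(self, request_id, estimate):
        # Check and move funds under one lock so concurrent requests
        # cannot both reserve the same balance.
        with self._lock:
            if estimate > self.available:
                raise InsufficientBalance(request_id)
            self.available -= estimate
            self.reserved += estimate
            self._reservations[request_id] = estimate
            self.ledger.append(("reservation", estimate, request_id))

    def settle(self, request_id, actual_cost):
        # Release the whole reservation, then charge only the actual cost.
        with self._lock:
            estimate = self._reservations.pop(request_id)
            self.reserved -= estimate
            self.available += estimate - actual_cost
            self.ledger.append(("settlement", actual_cost, request_id))


store = BalanceStore(Decimal("10"))
estimate = estimate_cost(1000, 500, Decimal("0.001"), Decimal("0.002"))
store.reserve("req_1", estimate)          # holds the worst-case amount
store.settle("req_1", Decimal("1.25"))    # charges provider-reported usage
```

In this sketch, an actual cost above the estimate simply reduces the available balance further; whether that is allowed is a policy decision the gateway would need to make explicitly.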

Over-limit policies to define up front

Hard stop: reject with a clear billing error before provider spend occurs.
Soft grace: allow a small negative balance for trusted paid plans, then notify admins.
Model downgrade: route to a cheaper model only when the feature can tolerate quality differences.
Queue for top-up: hold background jobs until a balance webhook or manual top-up arrives.
Admin override: require an auditable policy version and expiry time, not an informal database edit.
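
One way to make these policies explicit is a single decision function that the gateway calls when a reservation does not fit. The sketch below is an assumption-laden example: the `Policy` enum, `Decision` type, and `decide` function are hypothetical, and admin overrides are omitted because they need an audit trail and expiry handling beyond a short example.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class Policy(Enum):
    HARD_STOP = "hard_stop"
    SOFT_GRACE = "soft_grace"
    MODEL_DOWNGRADE = "model_downgrade"
    QUEUE_FOR_TOPUP = "queue_for_topup"


@dataclass(frozen=True)
class Decision:
    action: str   # "allow", "reject", "downgrade", or "queue"
    reason: str


def decide(policy, available, estimate, grace_limit=Decimal("0"),
           downgrade_estimate=None):
    """Pick an action for a request whose estimate may exceed the balance."""
    if estimate <= available:
        return Decision("allow", "within_balance")
    if policy is Policy.SOFT_GRACE and estimate <= available + grace_limit:
        # Balance goes negative by at most grace_limit; admins get notified.
        return Decision("allow", "within_grace")
    if (policy is Policy.MODEL_DOWNGRADE and downgrade_estimate is not None
            and downgrade_estimate <= available):
        return Decision("downgrade", "cheaper_model_fits")
    if policy is Policy.QUEUE_FOR_TOPUP:
        return Decision("queue", "awaiting_top_up")
    # Hard stop, and the fallback for every policy that could not be applied.
    return Decision("reject", "insufficient_balance")
```

Falling back to a rejection when a softer policy cannot be satisfied keeps the default behavior aligned with the hard-stop rule: no provider spend without coverage.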

Failure modes that cause billing drift

Failure mode	Prevention
Provider timeout after tokens were generated	Record provider request ids and reconcile against provider usage exports.
Retry counted as two customer requests	Attach idempotency keys and settlement state to the gateway request id so that a retried request settles at most once.
Price table changes mid-period	Store price_version on every usage event and ledger entry.
Fallback uses a more expensive model	Require policy approval before fallback can exceed the reserved cost envelope.
Streaming response disconnects early	Settle from provider-reported actual usage, not client-visible completion length alone.
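
The retry and price-version rows can be illustrated with an idempotent settlement log keyed by the gateway request id. This is a sketch under assumptions: `SettlementLog` and `settle_once` are hypothetical names, and a dictionary stands in for what would normally be a unique constraint in the database.

```python
from decimal import Decimal


class SettlementLog:
    """Records at most one settlement per gateway request id."""

    def __init__(self):
        self._settled = {}   # request_id -> settled customer cost
        self.entries = []    # append-only settlement records

    def settle_once(self, request_id, cost, price_version):
        """Return True if this call settled the request, False on a repeat."""
        if request_id in self._settled:
            return False     # a retry or duplicate callback: charge nothing
        self._settled[request_id] = cost
        self.entries.append({
            "request_id": request_id,
            "cost": cost,
            "price_version": price_version,  # stored for later reconciliation
        })
        return True

    def total(self):
        return sum(self._settled.values(), Decimal("0"))


log = SettlementLog()
first = log.settle_once("req_1", Decimal("0.40"), "2024-06-01")
retry = log.settle_once("req_1", Decimal("0.40"), "2024-06-01")
```

Here the second call returns `False` and the total stays at the cost of a single request, which is the behavior the idempotency key is meant to guarantee.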

Operational metrics

Reservation denial rate by tenant, feature, and model.
Estimated cost vs actual cost variance.
Expired reservation amount and count.
Provider invoice total vs gateway ledger total by billing period.
Top customers approaching balance, quota, or credit-limit thresholds.
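
Several of these metrics reduce to simple arithmetic over ledger data. The following sketch shows three of them; the function names and input shapes are assumptions chosen for the example, not a prescribed reporting schema.

```python
from decimal import Decimal


def denial_rate(denied, total):
    """Share of reservation attempts that were denied, or None if no attempts."""
    if total == 0:
        return None
    return Decimal(denied) / Decimal(total)


def estimate_variance(pairs):
    """Relative over-estimate across (estimated_cost, actual_cost) pairs.

    A value of 0.5 means estimates ran 50% above actual cost overall.
    Returns None when there is no actual cost to compare against.
    """
    total_estimated = sum((e for e, _ in pairs), Decimal("0"))
    total_actual = sum((a for _, a in pairs), Decimal("0"))
    if total_actual == 0:
        return None
    return (total_estimated - total_actual) / total_actual


def reconciliation_delta(provider_invoice_total, ledger_provider_cost_total):
    """Positive result: the provider billed more than the gateway recorded."""
    return provider_invoice_total - ledger_provider_cost_total
```

A persistently high estimate variance suggests reservations are holding back more balance than needed, while a non-zero reconciliation delta points to unrecorded or double-counted usage.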

Where FerryAPI fits

FerryAPI is an OpenAI-compatible API gateway for teams that need model routing, customer API keys, quota controls, and usage billing across providers. Related implementation guides: AI API usage attribution schema, tenant-level budget guardrails, AI API refund policy for failed requests, and OpenRouter alternative migration plan.

For concrete policy templates, see AI API quota policy examples.