LLM Prepaid Balance Implementation for SaaS AI Products

A practical implementation guide for SaaS teams adding LLM prepaid balances, reservations, quota checks, settlement, refunds, and invoice-ready usage records.

Why prepaid balance logic belongs near the gateway

AI SaaS products often start with one provider account and one monthly bill. That works until customers need usage-based pricing, agents can spend money in the background, and finance needs to know whether a request should be accepted before it reaches a model provider. A prepaid balance system should sit close to the OpenAI-compatible gateway so every request can be checked, reserved, routed, and settled with the same policy.

The implementation goal is simple: never let an LLM request that cannot be attributed to a tenant create surprise provider spend, and never make the billing ledger depend on provider invoices alone.

Core data model

Object | Required fields | Implementation note
Tenant balance | tenant_id, currency, available_balance, reserved_balance, credit_limit, status | Keep available and reserved amounts separate so long-running requests do not double spend.
Price table | provider, model, input_token_price, output_token_price, cached_token_price, effective_at | Version prices so old usage can be reconciled even after provider pricing changes.
Reservation | reservation_id, request_id, tenant_id, estimated_cost, expires_at, status | Expire abandoned reservations and release balance automatically.
Usage event | request_id, api_key_id, feature, model, tokens, provider_cost, customer_cost, policy_version | Make the usage event invoice-ready, not just observability metadata.
Ledger entry | entry_id, tenant_id, amount, type, source_id, created_at | Use append-only entries for top-ups, reservations, settlements, refunds, adjustments, and credits.
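
To make the price-versioning and append-only ideas concrete, here is a minimal Python sketch of two of the objects above. The field names follow the table; the helper functions price_at and ledger_balance, the sign convention on amount, and the use of Decimal are assumptions for illustration, not part of any FerryAPI schema.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class PriceRow:
    provider: str
    model: str
    input_token_price: Decimal   # price per token (unit is an assumption)
    output_token_price: Decimal
    cached_token_price: Decimal
    effective_at: datetime


@dataclass(frozen=True)
class LedgerEntry:
    entry_id: str
    tenant_id: str
    amount: Decimal   # assumed convention: positive = credit, negative = debit
    type: str         # e.g. "top_up", "settlement", "refund"
    source_id: str
    created_at: datetime


def price_at(rows, provider, model, when):
    """Return the price row that was in effect at `when` (hypothetical helper)."""
    matches = [
        r for r in rows
        if r.provider == provider and r.model == model and r.effective_at <= when
    ]
    if not matches:
        raise LookupError(f"no price for {provider}/{model} at {when.isoformat()}")
    # The latest row that was already effective wins.
    return max(matches, key=lambda r: r.effective_at)


def ledger_balance(entries, tenant_id):
    """Derive a tenant's balance by summing its append-only entries."""
    return sum(
        (e.amount for e in entries if e.tenant_id == tenant_id),
        Decimal("0"),
    )
```

Because price rows are never overwritten, a usage event from an earlier period can be repriced with the row that was effective at the time, and the balance can always be rebuilt from the ledger rather than trusted as a mutable counter.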

Request lifecycle

  1. Identify the tenant: resolve the customer API key before provider routing. Reject unowned or suspended keys early.
  2. Estimate spend: use model, prompt tokens, max output tokens, cached-token policy, and route-specific markup to estimate a worst-case cost.
  3. Reserve balance: atomically move the estimate from available balance to reserved balance, or apply the tenant over-limit policy.
  4. Route the model call: send the request to the approved provider/model only after balance and quota checks pass.
  5. Settle actual usage: calculate real input/output/cached token cost, release unused reservation, and append a usage ledger entry.
  6. Export billing data: group usage by tenant, feature, model, and invoice period for dashboards and finance reconciliation.
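
Steps 2, 3, and 5 above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions: amounts are integers in micro-units of the tenant currency, markup is expressed in basis points, and a threading.Lock stands in for whatever atomic update the real datastore provides. The names estimate_cost, reserve, settle, and InsufficientBalance are hypothetical.

```python
import threading
from dataclasses import dataclass, field


class InsufficientBalance(Exception):
    """Raised when the estimate does not fit in the available balance."""


@dataclass
class TenantBalance:
    tenant_id: str
    available: int      # micro-units of currency (assumed unit)
    reserved: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock, repr=False)


def estimate_cost(prompt_tokens, max_output_tokens,
                  input_price, output_price, markup_bps=0):
    """Worst case: assume every allowed output token is generated."""
    base = prompt_tokens * input_price + max_output_tokens * output_price
    # Ceiling division so the estimate never rounds below the worst case.
    return -(-base * (10_000 + markup_bps) // 10_000)


def reserve(balance, estimate):
    """Move the estimate from available to reserved in one locked step."""
    with balance.lock:
        if balance.available < estimate:
            raise InsufficientBalance(balance.tenant_id)
        balance.available -= estimate
        balance.reserved += estimate


def settle(balance, estimate, actual, ledger):
    """Release the reservation, charge actual cost, append a ledger entry."""
    with balance.lock:
        balance.reserved -= estimate
        # Unused reservation returns to available; an overage reduces it.
        balance.available += estimate - actual
        ledger.append({
            "tenant_id": balance.tenant_id,
            "type": "settlement",
            "amount": -actual,
        })
```

A request flow would call estimate_cost, then reserve before routing, then settle once the provider reports usage. The sketch raises on insufficient balance; a real gateway would apply the tenant's over-limit policy at that point instead.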

Over-limit policies to define up front

Failure modes that cause billing drift

Failure mode | Prevention
Provider timeout after tokens were generated | Record provider request ids and reconcile against provider usage exports.
Retry counted as two customer requests | Attach idempotency keys and settlement state to the gateway request id.
Price table changes mid-period | Store price_version on every usage event and ledger entry.
Fallback uses a more expensive model | Require policy approval before fallback can exceed the reserved cost envelope.
Streaming response disconnects early | Settle from provider-reported actual usage, not client-visible completion length alone.
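
The retry case can be illustrated with a small idempotent settlement sketch in Python. It assumes the gateway request id doubles as the idempotency key and that settlement state lives in a single store; the SettlementLog class and its settle_once method are hypothetical names, and a production version would rely on a unique constraint in the database rather than an in-memory dict.

```python
from dataclasses import dataclass, field


@dataclass
class SettlementLog:
    # request_id -> the entry that was written for it
    _by_request: dict = field(default_factory=dict)
    # append-only list of settlement entries
    entries: list = field(default_factory=list)

    def settle_once(self, request_id, provider_request_id,
                    actual_cost, price_version):
        """Write one settlement per gateway request id, however often called."""
        existing = self._by_request.get(request_id)
        if existing is not None:
            # A retry or duplicate callback: return the original entry
            # instead of charging the customer a second time.
            return existing
        entry = {
            "request_id": request_id,
            # Kept so the charge can be reconciled with provider exports.
            "provider_request_id": provider_request_id,
            "amount": actual_cost,
            # Kept so the charge can be repriced or audited later.
            "price_version": price_version,
        }
        self._by_request[request_id] = entry
        self.entries.append(entry)
        return entry
```

Calling settle_once twice with the same request id returns the same entry and leaves one row in the log, so a client retry or a duplicated provider callback does not become a second charge.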

Operational metrics

Where FerryAPI fits

FerryAPI is an OpenAI-compatible API gateway for teams that need model routing, customer API keys, quota controls, and usage billing across providers. Related implementation guides: AI API usage attribution schema, tenant-level budget guardrails, AI API refund policy for failed requests, and OpenRouter alternative migration plan.

For concrete policy templates, see AI API quota policy examples.