FerryAPI

If AI usage is billed to customers, workspaces, or products, every request needs a clean attribution record before it reaches a model provider.

AI API Usage Attribution Schema for SaaS Billing

Most SaaS teams start AI billing with a provider invoice and a rough token total. That works during an experiment, but it breaks as soon as multiple customers, API keys, features, models, or providers share the same bill.

A production AI API gateway should record who caused the request, which policy allowed it, which model served it, and how the cost should be charged back. This article outlines a practical attribution schema for OpenAI-compatible gateways, customer API keys, quotas, prepaid balances, and usage-based SaaS billing.

The attribution problem

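Provider usage exports usually answer provider-facing questions: model, token count, timestamp, and account. SaaS billing needs product-facing answers: which customer, workspace, or product caused the request, which policy allowed it, which model served it, and how the cost should be charged back.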

If these fields are not captured at request time, the billing team has to reconstruct intent from logs after the fact. That is fragile and difficult to audit.

Minimum useful schema

The exact database shape can vary, but the usage event should preserve enough context to explain cost and customer experience later.

| Field group | Example fields | Purpose |
| --- | --- | --- |
| Request identity | request_id, trace_id, timestamp | Deduplicate events and connect provider calls to application logs. |
| Commercial owner | customer_id, workspace_id, project_id | Attribute usage to the entity that pays or owns the budget. |
| Caller | api_key_id, end_user_id, agent_id | Support per-key quotas, abuse analysis, and customer support questions. |
| Product context | feature, route, environment | Separate customer support drafts, document extraction, coding agents, and batch jobs. |
| Model decision | requested_model, primary_model, served_model, provider | Explain routing, fallback, model substitution, and provider invoices. |
| Usage totals | input_tokens, output_tokens, cached_tokens, request_count | Compute cost and show transparent customer usage. |
| Billing policy | billing_mode, unit_price, cost_usd, charge_usd | Keep provider cost separate from customer charge and margin. |
| Control outcomes | quota_bucket, balance_before, balance_after, fallback_reason | Prove why a request was allowed, downgraded, retried, or rejected. |
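As one possible reading of the field groups above, the usage event could be modeled as an immutable record. This is a minimal sketch, not a prescribed shape: the class name `UsageEvent`, the split between required and optional fields, the `used_fallback` helper, and the use of `Decimal` for money are all assumptions made for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class UsageEvent:
    """Hypothetical usage event; which fields are required is an assumption."""
    # Request identity
    request_id: str
    timestamp: str  # ISO 8601, UTC
    # Commercial owner
    customer_id: str
    workspace_id: str
    # Caller
    api_key_id: str
    # Product context
    feature: str
    route: str
    environment: str
    # Model decision
    requested_model: str
    primary_model: str
    served_model: str
    # Usage totals
    input_tokens: int
    output_tokens: int
    # Billing policy (Decimal avoids float rounding on small amounts)
    billing_mode: str
    cost_usd: Decimal
    charge_usd: Decimal
    # Control outcomes
    quota_bucket: str
    balance_before: Decimal
    balance_after: Decimal
    fallback_reason: Optional[str] = None
    # Fields treated as optional in this sketch
    trace_id: Optional[str] = None
    project_id: Optional[str] = None
    end_user_id: Optional[str] = None
    agent_id: Optional[str] = None
    provider: Optional[str] = None
    unit_price: Optional[Decimal] = None
    cached_tokens: int = 0
    request_count: int = 1

    @property
    def used_fallback(self) -> bool:
        # True when routing served a different model than the primary choice.
        return self.served_model != self.primary_model


event = UsageEvent(
    request_id="req_01jz_usage_7kc",
    timestamp="2026-06-04T12:10:00Z",
    customer_id="cus_acme",
    workspace_id="ws_support",
    api_key_id="key_live_42",
    feature="support_reply_draft",
    route="support_low_latency",
    environment="production",
    requested_model="gpt-4o-mini",
    primary_model="provider_a/gpt-4o-mini",
    served_model="provider_b/compatible-fast-chat",
    input_tokens=820,
    output_tokens=210,
    billing_mode="prepaid_balance",
    cost_usd=Decimal("0.00062"),
    charge_usd=Decimal("0.00120"),
    quota_bucket="monthly_support_ai",
    balance_before=Decimal("18.40"),
    balance_after=Decimal("18.3988"),
    fallback_reason="provider_a_rate_limit",
)
```

Freezing the record is a deliberate choice in this sketch: corrections would be written as new events rather than edits to usage history.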

Example usage event

{
  "request_id": "req_01jz_usage_7kc",
  "timestamp": "2026-06-04T12:10:00Z",
  "customer_id": "cus_acme",
  "workspace_id": "ws_support",
  "api_key_id": "key_live_42",
  "feature": "support_reply_draft",
  "route": "support_low_latency",
  "environment": "production",
  "requested_model": "gpt-4o-mini",
  "primary_model": "provider_a/gpt-4o-mini",
  "served_model": "provider_b/compatible-fast-chat",
  "fallback_reason": "provider_a_rate_limit",
  "input_tokens": 820,
  "output_tokens": 210,
  "cost_usd": 0.00062,
  "charge_usd": 0.00120,
  "billing_mode": "prepaid_balance",
  "quota_bucket": "monthly_support_ai",
  "balance_before": 18.40,
  "balance_after": 18.3988
}

Separate cost, charge, and allowance

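A common mistake is treating provider cost as the same thing as customer charge. They are related, but they answer different questions: cost (cost_usd) is what the provider bills you, charge (charge_usd) is what you bill the customer, and allowance is the quota bucket or prepaid balance the request draws on.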

Keeping these concepts separate makes it easier to support free tiers, enterprise discounts, internal testing, promotional credits, and margin reviews without rewriting usage history.
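The separation can be sketched as a small settlement step that keeps the three numbers apart. The `Settlement` type, the `settle` function, and the `waive_charge` flag are hypothetical names introduced for illustration; the amounts reuse the example usage event.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Settlement:
    cost_usd: Decimal       # what the provider bills you
    charge_usd: Decimal     # what the customer is billed
    margin_usd: Decimal     # charge minus cost; may be negative
    balance_after: Decimal  # remaining prepaid allowance


def settle(cost_usd: Decimal, charge_usd: Decimal,
           balance_before: Decimal, waive_charge: bool = False) -> Settlement:
    """Record cost, charge, and allowance as separate values."""
    # A waived charge (for example a promotional credit) zeroes the customer
    # charge but leaves the provider cost untouched.
    charge = Decimal("0") if waive_charge else charge_usd
    return Settlement(
        cost_usd=cost_usd,
        charge_usd=charge,
        margin_usd=charge - cost_usd,
        balance_after=balance_before - charge,
    )


paid = settle(Decimal("0.00062"), Decimal("0.00120"), Decimal("18.40"))
promo = settle(Decimal("0.00062"), Decimal("0.00120"), Decimal("18.40"),
               waive_charge=True)
```

In the waived case the margin goes negative while the balance is unchanged, which is the situation a single merged "cost" column cannot represent.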

Where to attach attribution metadata

The safest pattern is to attach metadata before the OpenAI-compatible request leaves your product boundary. In practice, that means the gateway should receive structured context from the application or derive it from a customer API key.

POST /v1/chat/completions
Authorization: Bearer sk_customer_or_workspace_key
X-Customer-Id: cus_acme
X-Workspace-Id: ws_support
X-Feature: support_reply_draft
X-Billing-Mode: prepaid_balance

When a customer API key already maps to customer, workspace, plan, and quota settings, the application can send less metadata. The gateway can still stamp each usage event with the resolved commercial owner.
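A gateway might resolve the commercial owner along these lines. This is a sketch under stated assumptions: `KEY_REGISTRY` is a hypothetical in-memory stand-in for a key store, `resolve_attribution` is an illustrative name, and letting the key mapping override header values is one possible policy, not a requirement from the schema.

```python
from typing import Dict, Mapping

# Hypothetical key store: maps a bearer token to its resolved owner.
KEY_REGISTRY: Dict[str, Dict[str, str]] = {
    "sk_customer_or_workspace_key": {
        "api_key_id": "key_live_42",
        "customer_id": "cus_acme",
        "workspace_id": "ws_support",
        "billing_mode": "prepaid_balance",
    }
}

# Request headers the application may send, and the event field each fills.
HEADER_FIELDS = {
    "x-customer-id": "customer_id",
    "x-workspace-id": "workspace_id",
    "x-feature": "feature",
    "x-billing-mode": "billing_mode",
}


def resolve_attribution(headers: Mapping[str, str]) -> Dict[str, str]:
    """Merge header context with the owner resolved from the API key."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or token not in KEY_REGISTRY:
        raise PermissionError("unknown or missing API key")
    resolved = {
        field: normalized[header]
        for header, field in HEADER_FIELDS.items()
        if header in normalized
    }
    # Assumption: the key mapping wins, so a caller cannot claim another
    # customer's identity by sending a different X-Customer-Id.
    resolved.update(KEY_REGISTRY[token])
    return resolved


minimal = resolve_attribution({
    "Authorization": "Bearer sk_customer_or_workspace_key",
    "X-Feature": "support_reply_draft",
})
```

Here the application sends only the feature name, and the resolved record still carries customer, workspace, key, and billing mode.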

Quota and prepaid balance fields

Quota enforcement should produce a usage record even when no provider call is made. A rejected request still matters: its record explains what the customer experienced and keeps support teams from confusing budget enforcement with provider failure.

| Outcome | Recommended fields | Billing behavior |
| --- | --- | --- |
| Allowed | quota_decision=allow, balance_after | Deduct or count usage normally. |
| Rejected by quota | quota_decision=reject, rejection_reason=monthly_limit | No provider cost; show customer-facing limit reason. |
| Downgraded by budget | quota_decision=downgrade, served_model | Charge according to policy; record quality-impacting change. |
| Promotional credit | credit_source=promo, charge_usd=0 | Track cost internally while showing free customer usage. |
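The outcomes above could be produced by a decision step that always returns a record, including when the request is rejected. This is a rough sketch: the `decide_quota` function, its parameters, the order of the checks, and the `insufficient_balance` reason are assumptions for illustration, and the promotional credit case is left out for brevity.

```python
from decimal import Decimal
from typing import Any, Dict, Optional


def decide_quota(requests_used: int, monthly_limit: int,
                 balance_before: Decimal, charge_usd: Decimal,
                 requested_model: str,
                 budget_model: Optional[str] = None,
                 budget_charge_usd: Optional[Decimal] = None) -> Dict[str, Any]:
    """Return a quota record for every request, whether or not it is served."""
    record: Dict[str, Any] = {
        "requested_model": requested_model,
        "balance_before": balance_before,
    }

    def reject(reason: str) -> Dict[str, Any]:
        # No provider call is made, so cost and charge are zero and the
        # balance is unchanged; the record still explains the outcome.
        record.update(quota_decision="reject", rejection_reason=reason,
                      served_model=None, cost_usd=Decimal("0"),
                      charge_usd=Decimal("0"), balance_after=balance_before)
        return record

    if requests_used >= monthly_limit:
        return reject("monthly_limit")
    if charge_usd <= balance_before:
        record.update(quota_decision="allow", served_model=requested_model,
                      charge_usd=charge_usd,
                      balance_after=balance_before - charge_usd)
        return record
    # The requested model is unaffordable: fall back to a cheaper model
    # if one is configured and the balance covers it.
    if (budget_model is not None and budget_charge_usd is not None
            and budget_charge_usd <= balance_before):
        record.update(quota_decision="downgrade", served_model=budget_model,
                      charge_usd=budget_charge_usd,
                      balance_after=balance_before - budget_charge_usd)
        return record
    return reject("insufficient_balance")  # hypothetical reason code


over_limit = decide_quota(1000, 1000, Decimal("18.40"), Decimal("0.0012"),
                          "gpt-4o-mini")
allowed = decide_quota(10, 1000, Decimal("18.40"), Decimal("0.0012"),
                       "gpt-4o-mini")
downgraded = decide_quota(10, 1000, Decimal("0.0010"), Decimal("0.0012"),
                          "gpt-4o-mini", "compatible-fast-chat",
                          Decimal("0.0004"))
```

For allowed and downgraded requests the provider cost is not known until the response arrives, so this sketch leaves cost_usd to be filled in afterwards.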

Operational checks

When the answer is yes, AI API usage becomes a manageable product metric instead of a surprise line item.

FerryAPI helps teams centralize AI API keys, routing, quotas, and usage records behind an OpenAI-compatible gateway. If your SaaS product needs customer-level attribution and billing controls, start with a gateway layer that records ownership before requests reach model providers.

Related: AI API usage ledger design explains how to make gateway usage events durable enough for billing, refunds, and provider invoice reconciliation.