AI API Usage Attribution Schema for SaaS Billing
Most SaaS teams start AI billing with a provider invoice and a rough token total. That works during an experiment, but it breaks as soon as multiple customers, API keys, features, models, or providers share the same bill.
A production AI API gateway should record who caused the request, which policy allowed it, which model served it, and how the cost should be charged back. This article outlines a practical attribution schema for OpenAI-compatible gateways, customer API keys, quotas, prepaid balances, and usage-based SaaS billing.
The attribution problem
Provider usage exports usually answer provider-facing questions: model, token count, timestamp, and account. SaaS billing needs product-facing answers:
- Which customer, workspace, project, or tenant generated the request?
- Which end-user, automation, or API key initiated it?
- Which feature or route caused the spend?
- Was the request billable, internal, trial, test, or promotional?
- Did fallback, retry, or quota policy change the final model and cost?
- Should the event reduce prepaid balance, count toward a monthly quota, or only appear in analytics?
If these fields are not captured at request time, the billing team has to reconstruct intent from logs after the fact. That is fragile and difficult to audit.
Minimum useful schema
The exact database shape can vary, but the usage event should preserve enough context to explain cost and customer experience later.
| Field group | Example fields | Purpose |
|---|---|---|
| Request identity | request_id, trace_id, timestamp | Deduplicate events and connect provider calls to application logs. |
| Commercial owner | customer_id, workspace_id, project_id | Attribute usage to the entity that pays or owns the budget. |
| Caller | api_key_id, end_user_id, agent_id | Support per-key quotas, abuse analysis, and customer support questions. |
| Product context | feature, route, environment | Separate spend by use case, such as customer support drafts, document extraction, coding agents, and batch jobs. |
| Model decision | requested_model, primary_model, served_model, provider | Explain routing, fallback, model substitution, and provider invoices. |
| Usage totals | input_tokens, output_tokens, cached_tokens, request_count | Compute cost and show transparent customer usage. |
| Billing policy | billing_mode, unit_price, cost_usd, charge_usd | Keep provider cost separate from customer charge and margin. |
| Control outcomes | quota_decision, quota_bucket, balance_before, balance_after, fallback_reason | Prove why a request was allowed, downgraded, retried, or rejected. |
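
As a rough illustration, the field groups above could map onto a typed record like the one below. This is a minimal sketch, not a prescribed schema: the class name `UsageEvent`, the `used_fallback` helper, and the choice of `Decimal` for money fields are assumptions, and only a subset of the example fields is included.

```python
from dataclasses import dataclass, asdict
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class UsageEvent:
    # Request identity
    request_id: str
    timestamp: str  # ISO 8601, UTC
    # Commercial owner
    customer_id: str
    workspace_id: str
    # Caller
    api_key_id: str
    # Product context
    feature: str
    route: str
    environment: str
    # Model decision
    requested_model: str
    primary_model: str
    served_model: str
    # Usage totals
    input_tokens: int
    output_tokens: int
    # Billing policy: provider cost and customer charge stay separate
    billing_mode: str
    cost_usd: Decimal
    charge_usd: Decimal
    # Control outcomes (optional because not every request touches them)
    quota_bucket: Optional[str] = None
    balance_before: Optional[Decimal] = None
    balance_after: Optional[Decimal] = None
    fallback_reason: Optional[str] = None

    @property
    def used_fallback(self) -> bool:
        """True when the model that served the request differs from the primary."""
        return self.served_model != self.primary_model


event = UsageEvent(
    request_id="req_01jz_usage_7kc",
    timestamp="2026-06-04T12:10:00Z",
    customer_id="cus_acme",
    workspace_id="ws_support",
    api_key_id="key_live_42",
    feature="support_reply_draft",
    route="support_low_latency",
    environment="production",
    requested_model="gpt-4o-mini",
    primary_model="provider_a/gpt-4o-mini",
    served_model="provider_b/compatible-fast-chat",
    input_tokens=820,
    output_tokens=210,
    billing_mode="prepaid_balance",
    cost_usd=Decimal("0.00062"),
    charge_usd=Decimal("0.00120"),
    fallback_reason="provider_a_rate_limit",
)
row = asdict(event)  # plain dict, ready to serialize or insert
```

Making the record immutable is one way to keep usage history append-only; corrections would then be written as new events rather than edits.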
Example usage event
{
  "request_id": "req_01jz_usage_7kc",
  "timestamp": "2026-06-04T12:10:00Z",
  "customer_id": "cus_acme",
  "workspace_id": "ws_support",
  "api_key_id": "key_live_42",
  "feature": "support_reply_draft",
  "route": "support_low_latency",
  "environment": "production",
  "requested_model": "gpt-4o-mini",
  "primary_model": "provider_a/gpt-4o-mini",
  "served_model": "provider_b/compatible-fast-chat",
  "fallback_reason": "provider_a_rate_limit",
  "input_tokens": 820,
  "output_tokens": 210,
  "cost_usd": 0.00062,
  "charge_usd": 0.00120,
  "billing_mode": "prepaid_balance",
  "quota_bucket": "monthly_support_ai",
  "balance_before": 18.40,
  "balance_after": 18.3988
}
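The money fields in this event can be checked against each other. The sketch below assumes the amounts are parsed as decimal strings rather than binary floats, since floats can pick up rounding error at this scale; the variable names are illustrative only.

```python
from decimal import Decimal

# Amounts copied from the example event, kept as strings to avoid float rounding.
event = {
    "cost_usd": "0.00062",
    "charge_usd": "0.00120",
    "balance_before": "18.40",
    "balance_after": "18.3988",
}

cost = Decimal(event["cost_usd"])
charge = Decimal(event["charge_usd"])

# Margin is the customer charge minus the provider cost.
margin = charge - cost

# A prepaid balance is reduced by the charge, never by the provider cost.
expected_balance = Decimal(event["balance_before"]) - charge
balance_is_consistent = expected_balance == Decimal(event["balance_after"])

print(margin)                 # 0.00058
print(balance_is_consistent)  # True
```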
Separate cost, charge, and allowance
A common mistake is treating provider cost as the same thing as customer charge. They are related, but they answer different questions.
- Cost is what the model provider or account pool charges the business.
- Charge is what the SaaS product bills or deducts from the customer.
- Allowance is what a plan, quota, coupon, trial, or prepaid balance permits.
Keeping these concepts separate makes it easier to support free tiers, enterprise discounts, internal testing, promotional credits, and margin reviews without rewriting usage history.
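One way to keep the three concepts apart in code is to compute them as separate values from separate inputs. The following is a sketch under stated assumptions: the function name, the per-1,000-token rate structure, and all rates are hypothetical and do not reflect any real provider's pricing.

```python
from decimal import Decimal


def price_request(input_tokens, output_tokens, provider_rates,
                  customer_rates, allowance_usd):
    """Return provider cost, list charge, and the charge left after allowance."""

    def total(rates):
        # Rates are expressed per 1,000 tokens (hypothetical structure).
        return (Decimal(input_tokens) * rates["input_per_1k"]
                + Decimal(output_tokens) * rates["output_per_1k"]) / 1000

    cost = total(provider_rates)         # what the provider charges the business
    list_charge = total(customer_rates)  # what the plan's price list says
    covered = min(list_charge, allowance_usd)  # portion absorbed by the allowance
    return {
        "cost_usd": cost,
        "list_charge_usd": list_charge,
        "allowance_applied_usd": covered,
        "charge_usd": list_charge - covered,
    }


priced = price_request(
    input_tokens=1000,
    output_tokens=500,
    provider_rates={"input_per_1k": Decimal("0.0005"), "output_per_1k": Decimal("0.0010")},
    customer_rates={"input_per_1k": Decimal("0.0010"), "output_per_1k": Decimal("0.0020")},
    allowance_usd=Decimal("0.0015"),
)
```

Because `cost_usd` never depends on the allowance, a free-tier or promotional request still carries its real provider cost for margin reviews.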
Where to attach attribution metadata
The safest pattern is to attach metadata before the OpenAI-compatible request leaves your product boundary. In practice, that means the gateway should receive structured context from the application or derive it from a customer API key.
POST /v1/chat/completions
Authorization: Bearer sk_customer_or_workspace_key
X-Customer-Id: cus_acme
X-Workspace-Id: ws_support
X-Feature: support_reply_draft
X-Billing-Mode: prepaid_balance
When a customer API key already maps to customer, workspace, plan, and quota settings, the application can send less metadata. The gateway can still stamp each usage event with the resolved commercial owner.
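A gateway-side resolver for this might look like the sketch below. It assumes an in-memory `KEY_DIRECTORY` lookup (hypothetical; a real gateway would query its key store) and makes one design assumption worth stating: ownership derived from the API key wins, and a conflicting `X-Customer-Id` header is rejected rather than trusted.

```python
# Hypothetical key store: maps a bearer token to its resolved commercial owner.
KEY_DIRECTORY = {
    "sk_customer_or_workspace_key": {
        "api_key_id": "key_live_42",
        "customer_id": "cus_acme",
        "workspace_id": "ws_support",
        "billing_mode": "prepaid_balance",
    },
}


def resolve_attribution(headers):
    """Derive attribution from the API key, using headers only for product context."""
    normalized = {name.lower(): value for name, value in headers.items()}

    scheme, _, token = normalized.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or token not in KEY_DIRECTORY:
        raise PermissionError("unknown or missing API key")
    owner = KEY_DIRECTORY[token]

    # Headers may repeat the owner, but they must not contradict the key.
    claimed = normalized.get("x-customer-id")
    if claimed is not None and claimed != owner["customer_id"]:
        raise PermissionError("X-Customer-Id does not match the API key owner")

    # Feature is caller-supplied context; default to a visible placeholder.
    return {**owner, "feature": normalized.get("x-feature", "unknown")}


attribution = resolve_attribution({
    "Authorization": "Bearer sk_customer_or_workspace_key",
    "X-Feature": "support_reply_draft",
})
```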
Quota and prepaid balance fields
Quota enforcement should produce records even when no provider call is made. A rejected request is important because it explains customer experience and prevents support teams from confusing budget enforcement with provider failure.
| Outcome | Recommended fields | Billing behavior |
|---|---|---|
| Allowed | quota_decision=allow, balance_after | Deduct or count usage normally. |
| Rejected by quota | quota_decision=reject, rejection_reason=monthly_limit | No provider cost; show customer-facing limit reason. |
| Downgraded by budget | quota_decision=downgrade, served_model | Charge according to policy; record the quality-impacting change. |
| Promotional credit | credit_source=promo, charge_usd=0 | Track cost internally while showing free customer usage. |
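
The allow and reject rows above can be sketched as a single function that always returns a record, whether or not a provider call follows. The function name, its parameters, and the idea of checking an estimated charge against a monthly USD limit are assumptions made for illustration; the downgrade and promotional-credit rows are left out for brevity, as is the prepaid balance sufficiency check.

```python
from decimal import Decimal

ZERO = Decimal("0")


def record_quota_outcome(request_id, estimated_charge_usd, month_spent_usd,
                         month_limit_usd, balance_usd):
    """Return a usage record for every request, including rejected ones."""
    record = {"request_id": request_id, "balance_before": balance_usd}
    if month_spent_usd + estimated_charge_usd > month_limit_usd:
        # No provider call is made, but the decision is still recorded.
        record.update(
            quota_decision="reject",
            rejection_reason="monthly_limit",
            cost_usd=ZERO,
            charge_usd=ZERO,
            balance_after=balance_usd,
        )
    else:
        record.update(
            quota_decision="allow",
            charge_usd=estimated_charge_usd,
            balance_after=balance_usd - estimated_charge_usd,
        )
    return record


allowed = record_quota_outcome(
    "req_1", Decimal("0.0012"), Decimal("9.00"), Decimal("10.00"), Decimal("18.40"))
rejected = record_quota_outcome(
    "req_2", Decimal("0.0012"), Decimal("9.9995"), Decimal("10.00"), Decimal("18.3988"))
```

With the rejected record stored next to the allowed one, support can see that the second request failed on the monthly limit rather than on a provider error.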
Operational checks
- Can support answer “why was this customer charged?” from one usage event?
- Can finance separate provider cost from customer charge and plan allowance?
- Can product compare AI spend by feature, route, and workspace?
- Can engineering identify retry loops and fallback-driven cost spikes?
- Can the customer dashboard show usage without exposing provider account details?
- Can finance close the month by matching gateway usage to multi-provider AI invoices?
When the answer is yes, AI API usage becomes a manageable product metric instead of a surprise line item.
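Several of these checks reduce to grouping usage events by one dimension. The sketch below assumes events are dictionaries with decimal-string amounts and that `served_model` carries a `provider/model` prefix as in the example event; `summarize` and `provider_of` are hypothetical helper names.

```python
from collections import defaultdict
from decimal import Decimal


def summarize(events, group_by):
    """Total request count, provider cost, and customer charge per group."""
    totals = defaultdict(
        lambda: {"requests": 0, "cost_usd": Decimal("0"), "charge_usd": Decimal("0")})
    for event in events:
        bucket = totals[group_by(event)]
        bucket["requests"] += 1
        bucket["cost_usd"] += Decimal(event["cost_usd"])
        bucket["charge_usd"] += Decimal(event["charge_usd"])
    return dict(totals)


def provider_of(event):
    # Assumes served_model looks like "provider_a/gpt-4o-mini".
    return event["served_model"].split("/", 1)[0]


events = [
    {"feature": "support_reply_draft", "served_model": "provider_b/compatible-fast-chat",
     "cost_usd": "0.00062", "charge_usd": "0.00120"},
    {"feature": "support_reply_draft", "served_model": "provider_a/gpt-4o-mini",
     "cost_usd": "0.00050", "charge_usd": "0.00120"},
    {"feature": "document_extraction", "served_model": "provider_a/gpt-4o-mini",
     "cost_usd": "0.00200", "charge_usd": "0.00300"},
]

by_feature = summarize(events, lambda e: e["feature"])  # product view of spend
by_provider = summarize(events, provider_of)            # compare with provider invoices
```

The per-provider totals are what finance would compare against each provider invoice, while the per-feature totals answer the product question without exposing provider details.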
Related: AI API usage ledger design explains how to make gateway usage events durable enough for billing, refunds, and provider invoice reconciliation.