# LLM Integration
This is the default adapter guide for AI products.
Use it when the billable work happens at an LLM request boundary rather than at a generic HTTP handler or worker entrypoint.
## Why this is different
LLM integration is mostly about choosing the right boundary:

- `authorize` must happen before expensive model work starts
- `commit` must use actual post-call usage
- failed or interrupted work must trigger best-effort `cancel`
- retries must not create duplicate accounting

This is why the best integration point is usually the model boundary itself, not the outer request handler.
## Anchor order
Prefer these anchors in order:
- direct model SDK request boundary
- framework callback or span hook around model execution
- runner or plugin boundary that owns model and tool lifecycle
- API handler, worker, or queue consumer only as fallback
Examples:

- direct SDK wrapper: wrap the provider SDK methods directly
- framework injection: inject a gated client or model provider into the framework
- callback or plugin: use `before_model`/`after_model` style hooks
- shared helper: route all model calls through one helper if the framework offers no central callback
## Core lifecycle
For one LLM operation:
- build request context
- derive `feature_code` and optional `feature_family_code`
- `authorize`
- execute the model call
- extract actual usage
- `commit`
- if execution fails before commit, best-effort `cancel`
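The lifecycle above can be sketched as one wrapper around a model call. The `Gate` class and its `authorize`/`commit`/`cancel` signatures are assumptions standing in for your actual billing gate client, and `chat.completion` is an illustrative feature code:

```python
import uuid


class Gate:
    """Stub gate client; replace with your real billing gate SDK."""

    def __init__(self):
        self.calls = []

    def authorize(self, *, feature_code, request_id, quantity_minor):
        self.calls.append(("authorize", feature_code, request_id))
        return {"authorized": True, "request_id": request_id}

    def commit(self, *, request_id, quantity_minor, meters=None):
        self.calls.append(("commit", request_id, quantity_minor))

    def cancel(self, *, request_id):
        self.calls.append(("cancel", request_id))


def gated_model_call(gate, model_fn, *, feature_code, estimated_tokens):
    """One authorize -> execute -> commit/cancel cycle for a single model call."""
    request_id = str(uuid.uuid4())
    gate.authorize(
        feature_code=feature_code,
        request_id=request_id,
        quantity_minor=estimated_tokens,  # pre-call estimate, not final usage
    )
    try:
        response = model_fn()
    except Exception:
        # Execution failed before commit: best-effort cancel, then re-raise.
        gate.cancel(request_id=request_id)
        raise
    # Commit with actual post-call usage, never the estimate.
    gate.commit(
        request_id=request_id,
        quantity_minor=response["usage"]["total_tokens"],
    )
    return response
```

Note that `cancel` runs inside the `except` block and the exception is re-raised, so billing cleanup never swallows the original failure.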
## Request context
Each LLM call should carry:
- `request_id`
- `principal_id` or `billing_account_id`
- enough metadata to derive `feature_code`
- optional `feature_family_code`
- optional `budget_id`
Do not rely on broad globals if the framework already gives you per-turn or per-run state.
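One way to keep this per-call state explicit is a small context object built per turn or per run. The class and its validation rule are a sketch, not a required shape; field names mirror the list above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LLMRequestContext:
    """Per-call billing context; every field mirrors the list above."""

    request_id: str
    principal_id: Optional[str] = None
    billing_account_id: Optional[str] = None
    feature_code: str = ""                     # derived from call metadata
    feature_family_code: Optional[str] = None  # optional
    budget_id: Optional[str] = None            # optional

    def __post_init__(self):
        # At least one billing identity must be present.
        if not (self.principal_id or self.billing_account_id):
            raise ValueError("principal_id or billing_account_id required")
```

Constructing the context per operation (rather than reading globals) makes multi-tenant and concurrent runs safe by default.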
## `quantity_minor` and `meters[]`
For LLM workloads:
- top-level `quantity_minor` is the feature-level usage quantity and should still be sent even when you also send `meters[]`
- if `meters[]` is omitted, top-level `quantity_minor` also applies to the same-named primary meter
- if `meters[]` is present, meter-level pricing and settlement follow `meters[]`
Practical default:
- use total tokens as top-level `quantity_minor`
- send `meters[]` for input, output, and cached tokens when the model pricing differentiates them

In modern LLM pricing, separate `meters[]` is usually the right choice.
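The practical default above can be sketched as a small payload builder. The meter codes (`input_tokens`, `output_tokens`, `cached_input_tokens`) and the usage dict shape are assumptions; match them to your provider's usage object and your pricing configuration:

```python
def build_commit_usage(usage: dict) -> dict:
    """Turn provider usage into a commit payload with quantity_minor and meters[]."""
    input_t = usage.get("input_tokens", 0)
    output_t = usage.get("output_tokens", 0)
    cached_t = usage.get("cached_input_tokens", 0)
    return {
        # Top-level quantity: total tokens, sent even alongside meters[].
        "quantity_minor": input_t + output_t + cached_t,
        "meters": [
            {"meter_code": "input_tokens", "quantity_minor": input_t},
            {"meter_code": "output_tokens", "quantity_minor": output_t},
            {"meter_code": "cached_input_tokens", "quantity_minor": cached_t},
        ],
    }
```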
## Streaming
Streaming changes the commit point:
- authorize before opening the stream
- do not commit on partial chunks
- commit only once a final usage-bearing response or terminal event is available
- if the stream ends before final usage is available, cancel best-effort instead of guessing
## Tool calls
Model calls and tool calls are often different billable resources.
When tools are billable:
- gate tool execution separately from model execution
- use separate `feature_code` values
- use separate `feature_family_code` values only if that distinction matters for entitlement, packaging, reporting, or limits
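A minimal way to keep model and tool billing distinct is an explicit mapping from operation kind to `feature_code`. The specific codes below are illustrative assumptions, not a required naming scheme:

```python
# Illustrative feature codes; model calls and billable tool calls
# each get their own code so they can be gated and priced separately.
FEATURE_CODES = {
    "model": "llm.chat_completion",
    "tool:web_search": "tool.web_search",
    "tool:code_exec": "tool.code_exec",
}


def feature_code_for(op_kind: str) -> str:
    """Resolve an operation kind to its feature_code, failing loudly on gaps."""
    try:
        return FEATURE_CODES[op_kind]
    except KeyError:
        raise ValueError(f"no feature_code mapped for operation {op_kind!r}")
```

Failing loudly on unmapped operations is deliberate: a silently shared default code would collapse the model/tool distinction this section asks you to preserve.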
## Multi-turn sessions
Sessions are not the billing unit. Individual model and tool operations are.
Good default:
- one `authorize -> commit/cancel` cycle per model call
- one `authorize -> commit/cancel` cycle per billable tool call
- fresh gate context per turn or per operation
## Retries
LLM frameworks often retry under the hood.
Rules:
- each logical operation needs one stable idempotency base
- derive `authorize`, `commit`, and `cancel` keys from that base
- reuse the same keys on retry of the same logical operation
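One way to satisfy these rules is to derive all three keys deterministically from a single logical operation id, so any retry of the same operation reproduces the same keys. The hashing scheme below is a sketch, assuming your gate accepts opaque idempotency key strings:

```python
import hashlib


def idempotency_keys(logical_operation_id: str) -> dict:
    """Derive stable authorize/commit/cancel keys from one operation id."""

    def derive(phase: str) -> str:
        raw = f"{logical_operation_id}:{phase}".encode()
        return hashlib.sha256(raw).hexdigest()[:32]

    return {phase: derive(phase) for phase in ("authorize", "commit", "cancel")}
```

Because the derivation is pure, a framework-level retry that re-enters the wrapper with the same logical operation id cannot create duplicate accounting: the gate sees the same keys and deduplicates.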
## Next
Choose the framework page that matches your runtime: