LLM Integration

This is the default adapter guide for AI products.

Use it when the billable work happens at an LLM request boundary rather than at a generic HTTP handler or worker entrypoint.

Why this is different

LLM integration is mostly about choosing the right boundary:

  • authorize must happen before expensive model work starts
  • commit must use actual post-call usage
  • failed or interrupted work must trigger best-effort cancel
  • retries must not create duplicate accounting

This is why the best integration point is usually the model boundary itself, not the outer request handler.

Anchor order

Prefer these anchors in order:

  1. direct model SDK request boundary
  2. framework callback or span hook around model execution
  3. runner or plugin boundary that owns model and tool lifecycle
  4. API handler, worker, or queue consumer only as fallback

Examples:

  • direct SDK wrapper: wrap the provider SDK methods directly
  • framework injection: inject a gated client or model provider into the framework
  • callback or plugin: use before_model / after_model style hooks
  • shared helper: route all model calls through one helper if the framework offers no central callback

Core lifecycle

For one LLM operation:

  1. build request context
  2. derive feature_code and optional feature_family_code
  3. authorize
  4. execute the model call
  5. extract actual usage
  6. commit
  7. if execution fails before commit, best-effort cancel
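The steps above can be sketched as one function. This is a minimal illustration, not a real SDK: the `Gate` recorder and the `model_call` callable are hypothetical stand-ins for your entitlement client and LLM provider.

```python
# Hedged sketch of steps 1-7 for a single LLM operation.
# `Gate` here just records calls so the flow is visible; a real
# gate client would talk to your entitlement/billing service.

class Gate:
    def __init__(self):
        self.calls = []

    def authorize(self, **ctx):
        # Step 3: authorize BEFORE any expensive model work starts.
        self.calls.append(("authorize", ctx))
        return {"grant_id": "g-1", **ctx}

    def commit(self, grant, quantity_minor):
        # Step 6: commit actual post-call usage, never an estimate.
        self.calls.append(("commit", quantity_minor))

    def cancel(self, grant):
        # Step 7: best-effort cancel for interrupted work.
        self.calls.append(("cancel", grant["grant_id"]))


def run_llm_operation(gate, model_call, ctx, prompt):
    # Steps 1-2 happened upstream: ctx carries request_id and feature_code.
    grant = gate.authorize(
        request_id=ctx["request_id"],
        feature_code=ctx["feature_code"],
    )
    try:
        response = model_call(prompt)           # step 4: execute
        usage = response["usage"]               # step 5: actual usage
        gate.commit(grant, quantity_minor=usage["total_tokens"])
        return response
    except Exception:
        try:
            gate.cancel(grant)                  # step 7: best-effort
        except Exception:
            pass                                # cancel must not mask the error
        raise
```

Note that cancel failures are swallowed: cancel is best-effort by design, and the original model error is the one worth surfacing.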

Request context

Each LLM call should carry:

  • request_id
  • principal_id or billing_account_id
  • enough metadata to derive feature_code
  • optional feature_family_code
  • optional budget_id

Do not rely on broad globals if the framework already gives you per-turn or per-run state.
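One way to make that explicit is a small per-call context object whose fields mirror the list above. The class and the feature_code derivation are illustrative assumptions; adapt both to your own catalog.

```python
# Sketch of an explicit per-call context, passed down instead of
# read from globals. Field names mirror the request-context list.

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LlmRequestContext:
    request_id: str
    principal_id: str                       # or billing_account_id
    model: str                              # metadata used to derive feature_code
    feature_family_code: Optional[str] = None
    budget_id: Optional[str] = None

    def feature_code(self) -> str:
        # Hypothetical derivation: one feature per model family.
        # Replace with whatever mapping your catalog defines.
        return f"llm.{self.model.split('-')[0]}"
```

Because the dataclass is frozen, a context cannot be mutated mid-turn, which keeps per-run state honest when the framework hands you fresh state for each turn.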

quantity_minor and meters[]

For LLM workloads:

  • top-level quantity_minor is the feature-level usage quantity and should still be sent even when you also send meters[]
  • if meters[] is omitted, top-level quantity_minor also applies to the same-named primary meter
  • if meters[] is present, meter-level pricing and settlement follow meters[]

Practical default:

  • use total tokens as top-level quantity_minor
  • send meters[] for input, output, and cached tokens when the model pricing differentiates them

In modern LLM pricing, separate meters[] is usually the right choice.
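The practical default above can be sketched as a small payload builder. The meter codes (`input_tokens`, `output_tokens`, `cached_tokens`) are assumptions; use whatever names your catalog actually defines.

```python
# Builds commit usage following the practical default: total tokens as
# the top-level quantity_minor, plus per-meter breakdown for pricing
# that differentiates token kinds. Meter codes are hypothetical.

def build_commit_usage(usage: dict) -> dict:
    input_t = usage.get("input_tokens", 0)
    output_t = usage.get("output_tokens", 0)
    cached_t = usage.get("cached_tokens", 0)
    return {
        # Feature-level quantity: still sent even alongside meters[].
        "quantity_minor": input_t + output_t + cached_t,
        # Meter-level pricing and settlement follow these entries.
        "meters": [
            {"meter_code": "input_tokens", "quantity_minor": input_t},
            {"meter_code": "output_tokens", "quantity_minor": output_t},
            {"meter_code": "cached_tokens", "quantity_minor": cached_t},
        ],
    }
```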

Streaming

Streaming changes the commit point:

  • authorize before opening the stream
  • do not commit on partial chunks
  • commit only once a final usage-bearing response or terminal event is available
  • if the stream ends before final usage is available, cancel best-effort instead of guessing
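A streaming consumer under those rules might look like the sketch below. The event shape (`delta` chunks, a terminal `done` event carrying usage) is an assumption; map it onto whatever your provider's stream actually emits.

```python
# Streaming sketch: authorize happened before the stream was opened,
# so `grant` is already held. Commit only on a terminal usage-bearing
# event; if the stream ends without one, cancel rather than guess.

def consume_stream(gate, grant, events):
    final_usage = None
    chunks = []
    for event in events:
        if event.get("type") == "delta":
            chunks.append(event["text"])      # never commit on partial chunks
        elif event.get("type") == "done":
            final_usage = event.get("usage")  # terminal usage-bearing event
    if final_usage is None:
        gate.cancel(grant)                    # stream ended early: don't guess
        return None
    gate.commit(grant, quantity_minor=final_usage["total_tokens"])
    return "".join(chunks)
```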

Tool calls

Model calls and tool calls are often different billable resources.

When tools are billable:

  • gate tool execution separately from model execution
  • use separate feature_code values
  • use separate feature_family_code values only if that distinction matters for entitlement, packaging, reporting, or limits
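Keeping model and tool gating apart can be as simple as a lookup from operation kind to feature_code, so each billable resource gets its own authorize -> commit cycle. The codes below are hypothetical examples, not a prescribed naming scheme.

```python
# Illustrative mapping from operation kind to feature_code. Model calls
# and billable tool calls resolve to distinct codes so they are gated
# and accounted separately. All codes here are made-up examples.

FEATURE_CODES = {
    "model": "llm.chat",
    "tool.web_search": "tool.web_search",
    "tool.code_exec": "tool.code_exec",
}


def feature_code_for(kind: str) -> str:
    try:
        return FEATURE_CODES[kind]
    except KeyError:
        # Fail loudly: an unmapped kind means unmetered billable work.
        raise ValueError(f"no feature_code for operation kind: {kind}")
```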

Multi-turn sessions

Sessions are not the billing unit. Individual model and tool operations are.

Good default:

  • one authorize -> commit/cancel cycle per model call
  • one authorize -> commit/cancel cycle per billable tool call
  • fresh gate context per turn or per operation

Retries

LLM frameworks often retry under the hood.

Rules:

  • each logical operation needs one stable idempotency base
  • derive authorize, commit, and cancel keys from that base
  • reuse the same keys on retry of the same logical operation
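One way to follow those rules is to derive all three keys deterministically from the single idempotency base, so a retry of the same logical operation reproduces identical keys. The hashing scheme below is an assumption; any stable derivation works.

```python
# Derives stable authorize/commit/cancel keys from one idempotency
# base per logical operation. Same base -> same keys, so framework
# retries never create duplicate accounting.

import hashlib


def derive_keys(idempotency_base: str) -> dict:
    def key(suffix: str) -> str:
        digest = hashlib.sha256(f"{idempotency_base}:{suffix}".encode())
        return digest.hexdigest()[:32]

    return {
        "authorize_key": key("authorize"),
        "commit_key": key("commit"),
        "cancel_key": key("cancel"),
    }
```

The base itself should identify the logical operation (for example, a per-call UUID minted once, before any retry loop), not the physical attempt.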

Next

Choose the framework page that matches your runtime: