OpenAI SDK Adapter

Use this adapter when your application calls the OpenAI SDK directly.

This adapter is the best fit when your runtime makes direct SDK calls such as:

  • responses.create
  • streaming responses
  • chat.completions.create

Integration shape

The core pattern is direct SDK wrapping:

  • build one shared service client at process startup
  • build or wrap one shared OpenAI client
  • bind request-scoped gate context around each model call
  • let the wrapper enforce authorize -> execute -> commit/cancel

This pattern works well because the SDK response itself carries usage, so the wrapper can commit exact token counts after each call.
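The steps above can be sketched as follows. The gate client's method names (authorize, commit, cancel) and field names are illustrative assumptions, not the adapter's actual API; the OpenAI client is passed in so the sketch stays self-contained.

```python
# Sketch of the authorize -> execute -> commit/cancel wrapper around the
# OpenAI SDK. The gate client's authorize/commit/cancel methods are
# hypothetical -- adapt the names to your service client's real API.

class WrappedClient:
    def __init__(self, gate, openai_client):
        self._gate = gate              # shared service client, built at startup
        self._openai = openai_client   # shared OpenAI client (or wrapper)

    def create_response(self, ctx, **kwargs):
        # 1. Authorize with request-scoped context before calling the model.
        grant = self._gate.authorize(
            request_id=ctx["request_id"],
            principal_id=ctx["principal_id"],
        )
        try:
            # 2. Execute the underlying SDK call.
            response = self._openai.responses.create(**kwargs)
        except Exception:
            # 3a. Cancel the grant if the call fails.
            self._gate.cancel(grant)
            raise
        # 3b. Commit the usage reported on the SDK response itself.
        self._gate.commit(grant, usage=response.usage)
        return response
```

Because the wrapper owns the try/except, app code never sees a grant that was neither committed nor cancelled.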

What to carry per request

For each call, provide:

  • request_id
  • principal_id or billing_account_id
  • vendor or model metadata
  • optional feature_family_code
  • optional budget_id
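One possible shape for these per-request fields is a small frozen dataclass bound at request start. The field names mirror the list above, but the class itself is an illustrative sketch, not part of the adapter.

```python
# Illustrative request-scoped gate context carrying the fields listed
# above. Frozen so it can be passed around safely for the request's lifetime.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class GateContext:
    request_id: str
    principal_id: str                       # or billing_account_id, per your identity model
    model: str                              # vendor / model metadata
    feature_family_code: Optional[str] = None
    budget_id: Optional[str] = None
```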

Streaming note

For streaming:

  • authorize before opening the stream
  • consume the stream normally
  • commit only when the final usage-bearing response is available
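The streaming steps can be sketched as a generator. The gate API here is the same hypothetical one as above, and the assumption that the terminal streamed event carries a `usage` attribute is illustrative; check how your SDK version surfaces usage on streams.

```python
# Streaming sketch: authorize before opening the stream, consume it
# normally, and commit only once a usage-bearing (terminal) event arrives.

def stream_response(gate, openai_client, ctx, **kwargs):
    grant = gate.authorize(request_id=ctx["request_id"],
                           principal_id=ctx["principal_id"])
    final_usage = None
    try:
        for event in openai_client.responses.create(stream=True, **kwargs):
            usage = getattr(event, "usage", None)
            if usage is not None:
                final_usage = usage   # terminal event carries usage
            yield event
    except Exception:
        gate.cancel(grant)
        raise
    if final_usage is not None:
        gate.commit(grant, usage=final_usage)
    else:
        gate.cancel(grant)  # no usage observed: do not commit
```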

Typical usage model

In most cases:

  • top-level quantity_minor should be total tokens
  • meters[] should separate input, output, and cached tokens when pricing differs across them
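A minimal sketch of that mapping, assuming a flat usage dict: the meter codes and the usage field names are illustrative, so check them against your pricing config and the SDK's actual usage object.

```python
# Map SDK usage onto the charge shape described above: quantity_minor
# carries total tokens, meters[] splits input / output / cached tokens.

def usage_to_charge(usage: dict) -> dict:
    cached = usage.get("cached_tokens", 0)
    meters = [
        # Cached tokens are usually priced differently, so carve them
        # out of the input meter when present.
        {"meter_code": "input_tokens",
         "quantity": usage["input_tokens"] - cached},
        {"meter_code": "output_tokens",
         "quantity": usage["output_tokens"]},
    ]
    if cached:
        meters.append({"meter_code": "cached_input_tokens",
                       "quantity": cached})
    return {
        "quantity_minor": usage["input_tokens"] + usage["output_tokens"],
        "meters": meters,
    }
```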

Artifact roles

  • vluna_adapter.*
    • This is the file to copy into your codebase if you already use the direct OpenAI SDK pattern.
    • Expect to make small, targeted changes for feature-code mapping, identity wiring, logging, and local conventions.
  • example.*
    • This is only a demo of how the adapter is invoked from app code.
    • Use it to understand startup wiring, request context binding, streaming shape, and tool-call patterns.

Downloadable artifacts

Python:

TypeScript:

When to choose something else