infer
infer is a function passed to the onCheckpoint callback as a field of CheckpointContext. It runs an inference request bound to the just-saved checkpoint adapter and returns the raw Response. There is no top-level infer export; the SDK exposes it as a callback argument so that the call is automatically scoped to the right job and checkpoint step.
Signature
Parameters
| Field | Type | Notes |
|---|---|---|
messages | ChatMessage[] | Chat history. Discriminated union over system / user / assistant (with optional tool_calls) / tool (with tool_call_id); matches the OpenAI message shape so a tool-calling history can round-trip. |
temperature | number? | Sampling temperature. Backend default if omitted. |
topP | number? | Nucleus sampling. Backend default if omitted. |
maxTokens | number? | Maximum response tokens. Backend default if omitted. |
stream | boolean? | Default true (SSE). Set false for a single JSON body. |
tools | ToolDefinition[]? | Function declarations the model is allowed to call. When set without an explicit toolChoice, the OpenAI-compatible default "auto" applies; the underlying endpoint must be configured for auto-tool extraction or the request returns 400 tool_calling_not_configured. |
toolChoice | ToolChoice? | "auto" / "none" / "required" / { type: "function", function: { name } }. Only "auto" (and the default when tools is present) needs the auto-extraction parser; the rest go through the guided-decoding path. |
responseFormat | ResponseFormat? | OpenAI’s standard structured-output knob: { type: "text" }, { type: "json_object" }, or { type: "json_schema", json_schema: { name, schema, strict? } }. Prefer this when expressible. |
structuredOutputs | StructuredOutputs? | vLLM extension for constraints responseFormat can’t express. When supplied, exactly one of json / regex / choice / grammar / json_object must be set (vLLM 0.20’s StructuredOutputsParams.__post_init__ rejects 0 or 2+ at parse time, before any merge with responseFormat); the TypeScript type encodes this via ExactlyOne. Combining a structuredOutputs constraint with a responseFormat constraint (json_object / json_schema) is also rejected: vLLM ends up with two constraints in the merged sampling params. json_object accepts only true. Empty choice: [] and blank grammar strings are rejected at ingress. Field names are snake_case (json_object, disable_any_whitespace, whitespace_pattern) to match vLLM’s wire format. (vLLM’s wire format also has structural_tag for Llama-style inline tool-call framing; arkor’s curated path is Gemma 4, so the SDK type omits it until broader base-model support lands.) |
signal | AbortSignal? | Aborts the local fetch. Does not stop work on the backend; the model finishes generating but you stop reading. |
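As a rough illustration of how the fields in the table combine, the sketch below declares a hypothetical `InferArgs` type covering a subset of the parameters and builds one non-streaming request. The type name, the simplified `Message` shape, and the `buildArgs` helper are assumptions made for illustration; they are not the SDK's exported definitions.

```typescript
// Hypothetical, simplified stand-ins for the SDK types; only the field names
// and optionality are taken from the parameter table.
type Message = { role: "system" | "user" | "assistant"; content: string };

type InferArgs = {
  messages: Message[];
  temperature?: number;
  topP?: number;
  maxTokens?: number;
  stream?: boolean; // the table documents a default of true (SSE)
  signal?: AbortSignal;
};

// Build a single-shot JSON request that the caller can abort locally.
function buildArgs(prompt: string, signal: AbortSignal): InferArgs {
  return {
    messages: [{ role: "user", content: prompt }],
    temperature: 0,
    maxTokens: 128,
    stream: false, // ask for one JSON body instead of an SSE stream
    signal, // aborting stops the local read, not the backend generation
  };
}

const controller = new AbortController();
const args = buildArgs("Summarise the ticket in one sentence.", controller.signal);
```

Inside onCheckpoint, an object like `args` would be what you hand to infer; calling `controller.abort()` afterwards only stops the local read.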
Tool calling example
Structured-output example
Choosing between responseFormat modes
| Mode | Constraint enforced | Use when |
|---|---|---|
{ type: "text" } | None | Free-form text (the default behaviour). Useful as an explicit override when a parent function passes a value through. |
{ type: "json_object" } | Output parses as JSON | You want valid JSON but cannot pin the keys yet. The body parses, but property names, types, and required keys are not enforced. |
{ type: "json_schema", json_schema: { ..., strict: true } } | Full schema | Properties, types, and required keys are all enforced; properties not declared are rejected. Prefer this whenever you can write a schema, even a loose one. |
responseFormat is OpenAI-compatible and is the right knob for ~all “give me JSON” cases. Reach for structuredOutputs only for constraints responseFormat cannot express.
strict: true requires additionalProperties: false on every object schema. OpenAI’s strict mode is satisfied only when each type: "object" schema (the root, plus every nested object) explicitly sets additionalProperties: false and lists every property in required. Schemas that omit it are rejected by the backend with a 400 invalid_schema. The triage example above (and the cookbook recipe) follow this rule; copy them when you write your own schema.
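The strict-mode rule can be checked locally before a request is sent. The following is a minimal sketch, not part of the SDK: `Schema`, `findStrictViolations`, and `exampleSchema` are hypothetical names, and the check covers only the two conditions stated above (additionalProperties: false and every property listed in required).

```typescript
// Minimal JSON Schema shape for this check; real schemas carry more keywords.
type Schema = {
  type?: string;
  properties?: Record<string, Schema>;
  required?: string[];
  additionalProperties?: boolean;
  items?: Schema;
};

// Returns one message per object schema that breaks the strict-mode rule:
// additionalProperties must be false and every property must be in required.
function findStrictViolations(schema: Schema, path = "$"): string[] {
  const problems: string[] = [];
  if (schema.type === "object") {
    const required = new Set(schema.required ?? []);
    if (schema.additionalProperties !== false) {
      problems.push(`${path}: additionalProperties must be false`);
    }
    for (const [key, child] of Object.entries(schema.properties ?? {})) {
      if (!required.has(key)) problems.push(`${path}.${key}: missing from required`);
      // Nested objects must satisfy the same rule, so recurse into each property.
      problems.push(...findStrictViolations(child, `${path}.${key}`));
    }
  }
  if (schema.items) problems.push(...findStrictViolations(schema.items, `${path}[]`));
  return problems;
}

// Illustrative schema: the nested owner object forgets additionalProperties.
const exampleSchema: Schema = {
  type: "object",
  additionalProperties: false,
  required: ["severity", "owner"],
  properties: {
    severity: { type: "string" },
    owner: {
      type: "object",
      required: ["team"],
      properties: { team: { type: "string" } },
    },
  },
};

const violations = findStrictViolations(exampleSchema);
```

Here `violations` holds a single entry pointing at `$.owner`, which is the kind of nested omission that is easy to miss when only the root object is checked.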
structuredOutputs examples
The vLLM-specific extension. Exactly one of json / regex / choice / grammar / json_object per call: the TypeScript type rejects two at once. Don’t combine with a responseFormat constraint either: that would put two constraints into vLLM’s sampling params and the request is rejected at ingress.
Fixed choice list. Forces the response to one of a small enumerated set. Useful for classifier-style outputs where any prefix is a regression.
json_object: true. The structuredOutputs equivalent of responseFormat: { type: "json_object" }, present for parity with vLLM’s wire format. Only true is accepted; false is rejected at compile time (the type literal) and at ingress (vLLM only flips into JSON-object mode on a truthy value, so false would silently produce an unconstrained generation). Follows the same one-constraint-per-call rule as the others.
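For constraint objects assembled at runtime, where the TypeScript type cannot help, the one-constraint rule can be mirrored with a small guard. This is a sketch under assumptions: `LooseStructuredOutputs` and `validateStructuredOutputs` are hypothetical names, and the returned strings are illustrative rather than the backend's actual error messages.

```typescript
// Constraint keys named in the text above.
const CONSTRAINT_KEYS = ["json", "regex", "choice", "grammar", "json_object"] as const;

// Deliberately loose shape: models an object built at runtime, before it is
// narrowed to the SDK's stricter StructuredOutputs type.
type LooseStructuredOutputs = Partial<{
  json: object;
  regex: string;
  choice: string[];
  grammar: string;
  json_object: boolean;
}>;

// Returns null when the object satisfies the documented rules, else a reason.
function validateStructuredOutputs(so: LooseStructuredOutputs): string | null {
  const set = CONSTRAINT_KEYS.filter((k) => so[k] !== undefined);
  if (set.length !== 1) return `expected exactly one constraint, got ${set.length}`;
  if (so.choice !== undefined && so.choice.length === 0) return "choice must not be empty";
  if (so.grammar !== undefined && so.grammar.trim() === "") return "grammar must not be blank";
  if (so.json_object !== undefined && so.json_object !== true) return "json_object accepts only true";
  return null;
}

// A fixed choice list for a classifier-style output passes the guard.
const ok = validateStructuredOutputs({ choice: ["bug", "feature", "question"] });
// Two constraints at once, or json_object: false, do not.
const tooMany = validateStructuredOutputs({ regex: "\\d+", choice: ["a"] });
const badFlag = validateStructuredOutputs({ json_object: false });
```

The guard does not check for a conflicting responseFormat; that combination would need a separate check at the call site.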
Tool calling round-trip
After the model emits tool_calls, run the tool yourself, append the result as a tool message, and call infer again. Pass the same tools and toolChoice so the second turn sees the same surface.
tool_calls[i].function.arguments is a JSON-encoded string, not a parsed object. JSON.parse it on receipt. tool_call_id on the tool message must match the id from the assistant’s prior tool_calls[i] so the model can attribute the result to the right call.
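The middle step of the round-trip (parse the arguments, run the tool, build the tool message) might look like the sketch below. The local types are simplified stand-ins for the SDK's, and `toolImpls`, `get_weather`, and `runToolCalls` are hypothetical names used only for illustration.

```typescript
// Simplified local shapes following the OpenAI message format described above.
type ToolCall = { id: string; type: "function"; function: { name: string; arguments: string } };
type AssistantMessage = { role: "assistant"; content?: string | null; tool_calls: ToolCall[] };
type ToolMessage = { role: "tool"; tool_call_id: string; content: string };

// Hypothetical local tool registry: tool name -> implementation.
const toolImpls: Record<string, (args: Record<string, unknown>) => unknown> = {
  get_weather: (args) => ({ city: args["city"], tempC: 21 }),
};

// Turn each tool call into the tool message the next infer call should carry.
function runToolCalls(assistant: AssistantMessage): ToolMessage[] {
  return assistant.tool_calls.map((call): ToolMessage => {
    const impl = toolImpls[call.function.name];
    // arguments arrives as a JSON-encoded string, so parse it before use.
    const args = JSON.parse(call.function.arguments) as Record<string, unknown>;
    const result = impl ? impl(args) : { error: `unknown tool ${call.function.name}` };
    // tool_call_id echoes the id of the call so the model can attribute the result.
    return { role: "tool", tool_call_id: call.id, content: JSON.stringify(result) };
  });
}

const assistantTurn: AssistantMessage = {
  role: "assistant",
  tool_calls: [
    { id: "call_1", type: "function", function: { name: "get_weather", arguments: '{"city":"Oslo"}' } },
  ],
};

const toolMessages = runToolCalls(assistantTurn);
```

The second infer call would then receive the original history followed by `assistantTurn` and `toolMessages`, together with the same tools and toolChoice as the first call.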
Type definitions
The supporting types are exported from arkor. Inlined here for reference:
{ role: "assistant" } with neither content nor tool_calls does not type-check; at least one must be present. The [ToolCall, ...ToolCall[]] form encodes the non-empty tool_calls constraint at the type level.
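One way to encode these constraints is sketched below. It is reconstructed from the prose on this page, not copied from the SDK, so the exported definitions may differ in detail; `NonEmptyToolCalls` is a hypothetical alias.

```typescript
// A sketch of the message union described above.
type ToolCall = {
  id: string;
  type: "function";
  function: { name: string; arguments: string }; // arguments is a JSON-encoded string
};
type NonEmptyToolCalls = [ToolCall, ...ToolCall[]];

// An assistant message must carry content, tool_calls, or both.
type AssistantMessage =
  | { role: "assistant"; content: string; tool_calls?: NonEmptyToolCalls }
  | { role: "assistant"; content?: string | null; tool_calls: NonEmptyToolCalls };

type ChatMessage =
  | { role: "system"; content: string }
  | { role: "user"; content: string }
  | AssistantMessage
  | { role: "tool"; tool_call_id: string; content: string };

// A tool-calling history that type-checks under the sketch.
const history: ChatMessage[] = [
  { role: "user", content: "What is the weather in Oslo?" },
  {
    role: "assistant",
    tool_calls: [
      { id: "call_1", type: "function", function: { name: "get_weather", arguments: '{"city":"Oslo"}' } },
    ],
  },
  { role: "tool", tool_call_id: "call_1", content: '{"tempC":21}' },
];

// Neither of these compiles under the sketch above:
// const empty: ChatMessage = { role: "assistant" };
// const noCalls: ChatMessage = { role: "assistant", tool_calls: [] };
```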
Returns
infer returns Promise<Response>: the raw Fetch Response. The SDK does not parse the body; you decide how to consume it:
- stream: true (the default): the body is an SSE event stream in the same shape Studio’s Playground consumes. The SDK does not currently expose a frame parser for this stream; if you need decoded text deltas, copy the small extractInferenceDelta helper from packages/studio-app/src/lib/api.ts or write a parser around eventsource-parser.
- stream: false: the body is a single JSON document, described under Response envelope below.
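If you write your own parser, the core of it could look like the sketch below. It rests on an assumption this page does not confirm: that each frame is a `data:` line carrying an OpenAI-style chunk (`choices[0].delta.content`) and that the stream ends with `data: [DONE]`. `extractDeltas` is a hypothetical helper, not the extractInferenceDelta function mentioned above; check real frames before relying on it.

```typescript
// Assumed frame shape: `data: {"choices":[{"delta":{"content":"..."}}]}`.
function extractDeltas(sseText: string): string[] {
  const deltas: string[] = [];
  for (const line of sseText.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip comments, event names, blank lines
    const payload = line.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    try {
      const chunk = JSON.parse(payload) as {
        choices?: { delta?: { content?: string | null } }[];
      };
      const piece = chunk.choices?.[0]?.delta?.content;
      if (typeof piece === "string" && piece.length > 0) deltas.push(piece);
    } catch {
      // Ignore frames that are not JSON rather than failing the whole stream.
    }
  }
  return deltas;
}

// Sample frames in the assumed shape.
const sample = [
  'data: {"choices":[{"delta":{"content":"Hel"}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"lo"}}]}',
  "",
  "data: [DONE]",
  "",
].join("\n");

const joined = extractDeltas(sample).join("");
```

This version works on text that is already complete. When reading `response.body` incrementally, a line can be split across network chunks, so a real parser has to buffer the trailing partial line between reads.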
Response envelope (stream: false)
Non-streaming responses are an OpenAI-compatible chat-completion object:
- No constraint or responseFormat: { type: "text" }: choices[0].message.content is plain text.
- responseFormat: { type: "json_object" } or type: "json_schema": choices[0].message.content is a string containing the JSON. You call JSON.parse yourself; the SDK does not pre-parse.
- structuredOutputs: { json } or { json_object: true }: same; choices[0].message.content is a JSON string. JSON.parse it.
- structuredOutputs: { choice } / { regex } / { grammar }: choices[0].message.content is a string matching the constraint. Not JSON; do not parse.
- tools request that returned a tool call: choices[0].message.tool_calls is populated; content is omitted or null. Each tool_calls[i].function.arguments is itself a JSON-encoded string.
finish_reason: "tool_calls" is the signal the model wants to call a function rather than emit a final answer; loop with the tool calling round-trip.
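Branching on the envelope could be sketched as follows. `Completion` is a simplified local view of the OpenAI-compatible object, and `Outcome` and `readCompletion` are hypothetical names; the SDK does not ship this helper.

```typescript
// Simplified view of the chat-completion envelope described above.
type Completion = {
  choices: {
    finish_reason: string;
    message: {
      content?: string | null;
      tool_calls?: { id: string; function: { name: string; arguments: string } }[];
    };
  }[];
};

type Outcome =
  | { kind: "tool_calls"; calls: { id: string; name: string; args: unknown }[] }
  | { kind: "text"; content: string };

// Decide whether the model produced a final answer or asked for tool calls.
function readCompletion(body: Completion): Outcome {
  const choice = body.choices[0];
  if (!choice) throw new Error("completion has no choices");
  const calls = choice.message.tool_calls ?? [];
  if (choice.finish_reason === "tool_calls" && calls.length > 0) {
    return {
      kind: "tool_calls",
      // arguments is a JSON-encoded string, so it is parsed here.
      calls: calls.map((c) => ({ id: c.id, name: c.function.name, args: JSON.parse(c.function.arguments) })),
    };
  }
  // content stays a string; JSON.parse it yourself when a JSON constraint was set.
  return { kind: "text", content: choice.message.content ?? "" };
}

const outcome = readCompletion({
  choices: [
    {
      finish_reason: "tool_calls",
      message: {
        content: null,
        tool_calls: [{ id: "call_1", function: { name: "lookup", arguments: '{"q":"arkor"}' } }],
      },
    },
  ],
});
```

With a real response from a stream: false call, the input would come from `(await response.json()) as Completion`.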
Errors
infer does not hand you a non-OK Response. The SDK calls into CloudApiClient.chat, which throws a CloudApiError whenever the backend returns a non-2xx status. By the time control returns from await infer(...), you’ve either got a successful Response or an exception. Wrap each call in try / catch (or use .catch()) and branch on err instanceof CloudApiError to read err.status and err.message. The class is exported from arkor for that purpose.
| Status | When | What to do |
|---|---|---|
400 tool_calling_not_configured | tools set with implicit or explicit toolChoice: "auto", but the inference endpoint is not configured for auto-tool-extraction. | Enable auto-tool-extraction on the endpoint, or fall back to toolChoice: "required" / toolChoice: { type: "function", function: { name } } (these go through the guided-decoding path and do not need the parser). Retrying without changing config will keep failing. |
400 schema-validation error | responseFormat.json_schema.schema or a tools[i].function.parameters is not a valid JSON Schema; structuredOutputs was passed with zero or more than one constraint set; or structuredOutputs carries a constraint and responseFormat already supplies one (two constraints conflict at vLLM). | Fix the schema / pick exactly one constraint / drop the conflicting field. The TS type rejects multiple structuredOutputs keys at compile time; this status is mostly hit for runtime-built constraint objects or raw HTTP callers. |
4xx model rejection | Backend rejected the request (e.g. context length exceeded, unsupported message shape). | err.message carries the upstream message; surface it to the caller. |
5xx upstream | Inference cluster outage / cold start timeout. | The SDK does not retry inference requests automatically (the trainer’s SSE reconnect loop is for the job event stream, not for /v1/inference/chat). Roll your own retry around infer if you want one. |
Exceptions thrown by infer propagate out of onCheckpoint and are caught by the runtime’s reconnect loop, so unhandled errors can lead to silent retries. Always handle inference errors locally.
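A local handler that follows the table above might be sketched like this. The `CloudApiError` class here is a stand-in so the snippet runs on its own (in real code, import the class from arkor), and `shouldRetry` and `callWithRetry` are hypothetical helpers. The retry policy (5xx only, linear backoff) is an illustrative choice, not SDK behaviour.

```typescript
// Local stand-in; only the status and message fields described above are assumed.
class CloudApiError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.status = status;
  }
}

// Retry only on 5xx: 4xx responses need a changed request or configuration.
function shouldRetry(err: unknown): boolean {
  return err instanceof CloudApiError && err.status >= 500;
}

// call would be something like () => infer({ ... }) inside onCheckpoint.
async function callWithRetry<T>(
  call: () => Promise<T>,
  attempts = 3,
  delayMs = 1000,
): Promise<T | null> {
  for (let i = 1; i <= attempts; i++) {
    try {
      return await call();
    } catch (err) {
      if (!shouldRetry(err) || i === attempts) {
        const detail = err instanceof CloudApiError ? `${err.status} ${err.message}` : String(err);
        console.error(`inference failed: ${detail}`);
        return null; // handled locally so nothing escapes onCheckpoint
      }
      // Linear backoff before the next attempt.
      await new Promise((resolve) => setTimeout(resolve, delayMs * i));
    }
  }
  return null;
}
```

Returning null instead of rethrowing keeps the failure inside the callback; the caller can then skip the evaluation for that checkpoint.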
Constraints
- infer lives only on CheckpointContext. There is no equivalent for completed jobs from the SDK side; for that path use the cloud-api directly or trigger the run again. Studio’s Playground is the UI-level route to chat with a completed adapter.
- The call is scoped to { kind: "checkpoint", jobId, step }. You cannot retarget it to a different checkpoint or a different model from inside onCheckpoint.
- The function is not memoized: every call hits the backend.
Use cases
- Sanity check during a run. Compare a checkpoint at step 50 to one at step 100 against a fixed prompt. If the loss curve looks fine but outputs are degraded, you find out before the run finishes.
- Custom early-stopping. Combine with a simple eval prompt: if outputs diverge, abort the run via controller.abort() (see abortSignal) and call trainer.cancel() to stop the backend. See the Early stopping recipe for the full pattern.
- Live preview into your own UI. Send the checkpoint output to Slack, an internal review queue, or your own app’s preview channel.
See also
- onCheckpoint for the callback that hands you infer
- Mid-run eval recipe for an end-to-end pattern
- Early stopping recipe for tying infer output to abortSignal + cancel()
- Trainer control for abortSignal and cancel()