Programmatic runs (no CLI)
arkor dev and arkor start are convenient for iteration, but they are not the only way to run a trainer. The arkor package re-exports runTrainer, and Trainer itself has start / wait / cancel, so you can drive a run from any TypeScript code: a server route, a cron worker, a CI step.
This recipe shows the two shapes that come up first.
Shape 1: runTrainer (the same function arkor start runs)
runTrainer is the function arkor start invokes after building. With no argument it imports src/arkor/index.ts directly; arkor start first runs arkor build and then calls runTrainer on the bundled artifact at .arkor/build/index.mjs. Either way it picks the trainer from the loaded module (preferring arkor, then trainer, then default) and runs start() and wait() for you.
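The export preference described above can be sketched as a small lookup. This is an illustration only: pickTrainerExport is a hypothetical name, not part of the arkor package, and the real runTrainer may validate or unwrap the chosen export further.

```typescript
// Hypothetical helper: mirrors the documented preference order only
// (arkor, then trainer, then default). Not an arkor API.
const PREFERRED_EXPORTS = ["arkor", "trainer", "default"] as const;
type ExportName = (typeof PREFERRED_EXPORTS)[number];

function pickTrainerExport(
  mod: Record<string, unknown>,
): { name: ExportName; value: unknown } | undefined {
  for (const name of PREFERRED_EXPORTS) {
    const value = mod[name];
    if (value !== undefined && value !== null) return { name, value };
  }
  return undefined; // none of the three exports is present
}

// A module that exports both trainer and default resolves to trainer.
const picked = pickTrainerExport({ default: "d", trainer: "t" });
console.log(picked?.name); // prints "trainer"
```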
Use this shape when the trainer already lives in src/arkor/ and you just need to trigger it from non-CLI code: a GitHub Action step, a build step, a one-off script. You inherit all the trainer’s callbacks and abortSignal wiring.
Node version note for .ts entries. The arkor CLI’s bin automatically re-execs Node with --experimental-strip-types when the running Node does not already enable TypeScript stripping. Programmatic callers do not get that. To run a script that calls runTrainer() against a .ts source entry, use Node 23.6+ (where TypeScript stripping is on by default), or pass --experimental-strip-types on Node 22.6+. If you would rather not depend on the experimental flag, point runTrainer at the build output (.arkor/build/index.mjs) instead: the artifact is plain ESM and runs on any supported Node version without flags.
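One way to choose between the two entries at runtime is to ask Node whether it can load .ts files. The sketch below assumes Node's process.features.typescript flag (reported as "strip" or "transform" when type stripping is active). pickEntry is a hypothetical helper, and exactly how the chosen path is handed to runTrainer is left to the SDK reference.

```typescript
// Assumption: process.features.typescript is "strip" or "transform" when this
// Node process can load .ts files, and false or undefined otherwise.
const SOURCE_ENTRY = "src/arkor/index.ts";
const BUILT_ENTRY = ".arkor/build/index.mjs";

function pickEntry(typescriptFeature: unknown): string {
  const canLoadTs =
    typescriptFeature === "strip" || typescriptFeature === "transform";
  return canLoadTs ? SOURCE_ENTRY : BUILT_ENTRY;
}

// Read the flag defensively; older Node type definitions may not declare it.
const feature = (process.features as { typescript?: unknown }).typescript;
console.log(`entry for this Node: ${pickEntry(feature)}`);
```

Falling back to the built entry only works if arkor build has already produced .arkor/build/index.mjs.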
A subtle point that bites in CI: runTrainer() (and trainer.wait()) resolves whether the run ended completed or failed. The SSE stream simply terminates either way; only transport-level errors (abort, reconnect exhausted) reject the promise. A naive try / catch around runTrainer() would let a failed training job exit 0. To make CI fail on a failed run, drive the trainer directly so you can inspect the terminal status:
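A minimal sketch of that CI pattern follows. It uses a stand-in TrainerLike interface rather than the real arkor types: the start and wait method names come from the text, but the return shapes (a jobId from start, a status field from wait) and the exitCodeFor helper are assumptions made for illustration, so check the SDK reference for the actual result shape.

```typescript
// Stand-in for the Trainer surface used here. The method names come from the
// text; the return shapes (jobId, status) are assumptions for illustration.
type TerminalStatus = "completed" | "failed";
interface TrainerLike {
  start(): Promise<{ jobId: string }>;
  wait(): Promise<{ status: TerminalStatus }>;
}

/** Drives one run and maps its outcome to a process exit code. */
async function exitCodeFor(trainer: TrainerLike): Promise<number> {
  try {
    const { jobId } = await trainer.start();
    const { status } = await trainer.wait();
    console.log(`job ${jobId} ended with status ${status}`);
    return status === "completed" ? 0 : 1;
  } catch (err) {
    // Only transport-level errors (abort, reconnect exhausted) land here.
    console.error("run did not reach a terminal status:", err);
    return 2;
  }
}

// Stub that behaves like a run whose training job failed.
const failedRun: TrainerLike = {
  start: async () => ({ jobId: "job-1" }),
  wait: async () => ({ status: "failed" }),
};

exitCodeFor(failedRun).then((code) => {
  console.log(`exit code: ${code}`); // in CI: process.exitCode = code
});
```

In a real CI script the stub would be replaced by the trainer imported from your trainer module, and the resulting code assigned to process.exitCode.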
Use await runTrainer() directly only when you do not need to detect a failed run from the calling code (for example, when an onFailed callback in the trainer already routes the failure to your alerting).
Shape 2: Direct start() / wait() (full control)
When you want to keep the trainer reference around, manage cancellation explicitly, or run multiple trainers from one process, build the Trainer yourself and drive it directly.
start submits the job; wait runs the SSE event stream that drives your callbacks. Calling them yourself is what lets you keep references, log around them, or compose runs together.
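The sketch below shows the direct shape with logging around both calls and an explicit cancel when the event stream breaks. DrivableTrainer is a stand-in interface, not the real arkor type: the method names come from the text, while the return shapes and the choice to cancel on a transport error are assumptions for illustration.

```typescript
// Stand-in interface; return shapes are assumptions, not the arkor types.
interface DrivableTrainer {
  start(): Promise<{ jobId: string }>;
  wait(): Promise<{ status: string }>;
  cancel(): Promise<void>;
}

async function driveRun(
  trainer: DrivableTrainer,
  log: (line: string) => void = console.log,
): Promise<{ jobId: string; status: string }> {
  const { jobId } = await trainer.start(); // submits the job
  log(`submitted ${jobId}`);
  try {
    const { status } = await trainer.wait(); // streams events, fires callbacks
    log(`finished ${jobId}: ${status}`);
    return { jobId, status };
  } catch (err) {
    // The stream broke, but the backend may still be training: stop it explicitly.
    log(`stream error for ${jobId}, cancelling`);
    await trainer.cancel();
    throw err;
  }
}

// Stub trainer so the sketch runs without a backend.
const demoTrainer: DrivableTrainer = {
  start: async () => ({ jobId: "job-42" }),
  wait: async () => ({ status: "completed" }),
  cancel: async () => {},
};

driveRun(demoTrainer).then((result) => console.log(result));
```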
Where this pattern fits
Next.js API route. Trigger a run on demand from your app, return the jobId, and let the frontend poll Studio (or your own status page) for progress.
createTrainer caches the started job, so a single trainer instance can only drive one run; calling start() on it a second time returns the original jobId. In a long-lived Next.js server process, that means the route has to build a fresh trainer per request. Expose a factory from your trainer module:
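A minimal sketch of the factory pattern, written against a stub so it runs without a backend. makeTrainer and handleTrigger are hypothetical names, and the stub only mimics the caching behaviour described above; in a real project the factory body would call createTrainer with your config, and the handler would be wrapped in your framework's route export.

```typescript
// Minimal stand-in for the one method this sketch needs.
interface StartableTrainer {
  start(): Promise<{ jobId: string }>; // return shape is an assumption
}

// Stub that mimics the caching described above: start() is memoized per instance.
let submitted = 0;
function createStubTrainer(): StartableTrainer {
  let started: Promise<{ jobId: string }> | undefined;
  return {
    start() {
      started ??= Promise.resolve({ jobId: `job-${++submitted}` });
      return started;
    },
  };
}

// The factory your trainer module would expose (hypothetical name).
function makeTrainer(): StartableTrainer {
  return createStubTrainer();
}

// Framework-agnostic handler: builds a fresh trainer for every request.
async function handleTrigger(factory: () => StartableTrainer = makeTrainer) {
  const trainer = factory(); // fresh instance, so a fresh run
  const { jobId } = await trainer.start();
  return { status: 202, body: { jobId } };
}

handleTrigger().then(async (first) => {
  const second = await handleTrigger();
  console.log(first.body.jobId, second.body.jobId); // two different job ids
});
```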
CI smoke test. Set dryRun: true in the trainer to validate the trainer config end to end without burning a long GPU run.
A trainer module can read process.env.ARKOR_SMOKE and flip dryRun: true when it is set; the run finishes in a couple of minutes and the CI job fails loudly if anything is wrong with the trainer’s config.
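The environment switch can be sketched as a small pure function whose result is spread into the trainer config. smokeOverrides and baseConfig are hypothetical names. Only the dryRun option and the ARKOR_SMOKE variable come from the text, and treating an empty value as unset is an assumption.

```typescript
type Env = Record<string, string | undefined>;

/** Returns config overrides for a CI smoke run; empty values count as unset. */
function smokeOverrides(env: Env): { dryRun: boolean } {
  const flag = env["ARKOR_SMOKE"];
  return { dryRun: flag !== undefined && flag !== "" };
}

// Hypothetical placeholder for your existing trainer options.
const baseConfig = { label: "placeholder" };
const config = { ...baseConfig, ...smokeOverrides(process.env) };
console.log(config.dryRun ? "smoke run (dryRun)" : "full run");
```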
Multiple trainers from one process. createArkor accepts a single trainer, so multi-trainer projects are programmatic, not declarative. Run them in sequence or in parallel:
wait() calls will trigger their start() if needed.
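Both orderings can be sketched against a stand-in interface. WaitableTrainer and the stub below are illustrative only, and the status field on the wait() result is an assumption; with real trainers each element would come from its own createTrainer call.

```typescript
interface WaitableTrainer {
  wait(): Promise<{ status: string }>; // result shape is an assumption
}

/** Runs trainers one after another; each wait() starts its run if needed. */
async function runInSequence(trainers: WaitableTrainer[]): Promise<string[]> {
  const statuses: string[] = [];
  for (const trainer of trainers) {
    statuses.push((await trainer.wait()).status);
  }
  return statuses;
}

/** Runs all trainers concurrently and resolves once every stream has ended. */
async function runInParallel(trainers: WaitableTrainer[]): Promise<string[]> {
  const results = await Promise.all(trainers.map((t) => t.wait()));
  return results.map((r) => r.status);
}

// Stub trainer that resolves with a fixed status after a short delay.
function stubTrainer(status: string, delayMs: number): WaitableTrainer {
  return {
    wait: () =>
      new Promise((resolve) => setTimeout(() => resolve({ status }), delayMs)),
  };
}

// Result order follows the input order, not the finish order.
runInParallel([stubTrainer("completed", 20), stubTrainer("failed", 5)]).then(
  (statuses) => console.log(statuses),
);
```

Promise.all rejects as soon as any wait() rejects with a transport error; Promise.allSettled is an alternative when every run should be observed regardless.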
What to keep in mind
- runTrainer and direct start / wait share the same lifecycle. Callbacks fire from wait(). If you call start() and skip wait(), no callbacks run, even though the backend keeps training.
- abortSignal and cancel are still separate. See Early stopping for the two-step pattern.
- The auxiliary helpers in SDK § overview are exported for these workflows. readCredentials, writeCredentials, ensureCredentials, requestAnonymousToken, and the state.json helpers are there for code that needs to bootstrap auth or routing without going through the CLI.
- runBuild / runStart / runDev are not exported. The CLI command runners live under cli/commands/ and are intentionally CLI-private. runTrainer is the only public entry to the same flow arkor start uses.