Slack / Discord notifications
Training runs take long enough that nobody actually watches Studio the whole time. The terminal onCompleted and onFailed callbacks are perfect places to fan a status message out to wherever your team already lives.
This recipe uses Slack incoming webhooks; Discord, Microsoft Teams, and arbitrary HTTP endpoints work the same way. Anything you can fetch, you can notify.
The pattern
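A minimal sketch of the pattern. It assumes the trainer accepts onCompleted and onFailed as async callbacks and that onFailed receives the backend's error as a string. The LifecycleCallbacks type, the notifySlack helper, and its injectable fetchImpl parameter are hypothetical. They exist only to keep the sketch self-contained. Consult SDK § Lifecycle callbacks for the real signatures and for how to pass the callbacks to your trainer.

```typescript
// Hypothetical shape of the lifecycle callbacks; the real Arkor SDK types may differ.
type LifecycleCallbacks = {
  onCompleted?: () => Promise<void>;
  onFailed?: (error: string) => Promise<void>;
};

// Minimal fetch-compatible signature so a fake transport can be injected.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number }>;

// Post a message to a Slack incoming webhook. Never rejects: a missing URL,
// a network error, or a non-2xx status is logged and reported as `false`.
async function notifySlack(
  text: string,
  webhookUrl: string | undefined = process.env.SLACK_WEBHOOK_URL,
  fetchImpl: FetchLike = fetch,
): Promise<boolean> {
  if (!webhookUrl) return false; // notifications are off when the secret is absent
  try {
    const res = await fetchImpl(webhookUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ text }),
    });
    if (!res.ok) throw new Error(`Slack webhook returned HTTP ${res.status}`);
    return true;
  } catch (err) {
    console.warn("Slack notification failed:", err);
    return false;
  }
}

// Callbacks to hand to the trainer (how they are wired in is SDK-specific).
const callbacks: LifecycleCallbacks = {
  async onCompleted() {
    await notifySlack(":white_check_mark: Training run completed.");
  },
  async onFailed(error) {
    // `error` is a plain string from the backend, so embed it directly.
    await notifySlack(`<!here> :x: Training run failed: ${error}`);
  },
};
```

Because notifySlack resolves to false instead of rejecting, a Slack outage never escapes the callback. Leaving SLACK_WEBHOOK_URL unset turns notifications off without touching the trainer file.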
The <!here> mention only fires on failure, so successful runs do not page anyone. Adjust the urgency to match how often your team’s training jobs actually fail.
Why the inner try / catch matters
If the webhook request throws (Slack outage, DNS hiccup, or a non-2xx status that your code converts into an exception), the callback rejects. The Arkor runtime catches that rejection and routes it through the SSE reconnect loop (SDK § Lifecycle callbacks). With maxReconnectAttempts at its default of unlimited, a flaky webhook can quietly retry forever, and Last-Event-ID advancing across the retry can swallow the original event.
Treat the webhook as a side effect, not as part of the run’s success criterion. Catch inside; log if you want to know.
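To see the difference, compare a callback that lets the webhook failure escape with one that catches it. postWebhook here is a hypothetical stand-in that always fails, as it might during a Slack outage. Only the guarded version resolves normally, which keeps the failure out of the runtime's reconnect loop.

```typescript
// Hypothetical webhook call that fails the way an outage would.
async function postWebhook(_text: string): Promise<void> {
  throw new Error("HTTP 503 from webhook");
}

// Unsafe: the rejection escapes, so the runtime sees a failed callback.
async function onFailedUnsafe(error: string): Promise<void> {
  await postWebhook(`Run failed: ${error}`);
}

// Safe: the webhook is a side effect. Catch inside, log, resolve normally.
async function onFailedSafe(error: string): Promise<void> {
  try {
    await postWebhook(`Run failed: ${error}`);
  } catch (err) {
    console.warn("notification dropped:", err);
  }
}
```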
Variations
Per-step progress pings. Combine with onLog to post a one-line progress message every N steps.
Guard the ping with process.env.NOTIFY_PROGRESS === "1" if you only want it for important runs.
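A sketch of the progress ping. It assumes onLog receives an event object that carries the current step number. LogEvent, makeProgressPinger, and the notify parameter are hypothetical names; wire notify to whichever error-swallowing webhook helper you already use. The handler remembers the last step it posted, since several log lines may report the same step.

```typescript
// Hypothetical onLog event shape; check what the SDK actually passes.
type LogEvent = { step?: number };

// Any async sender, e.g. an error-swallowing webhook helper.
type Notify = (text: string) => Promise<unknown>;

// Build an onLog handler that posts one progress line every `everyN` steps.
// Nothing is posted unless `enabled` (by default: NOTIFY_PROGRESS === "1").
function makeProgressPinger(
  everyN: number,
  notify: Notify,
  enabled: boolean = process.env.NOTIFY_PROGRESS === "1",
): (event: LogEvent) => Promise<void> {
  let lastPosted = -1; // several log lines can report the same step
  return async (event) => {
    const step = event.step;
    if (!enabled || step === undefined || step === 0) return;
    if (step % everyN !== 0 || step === lastPosted) return;
    lastPosted = step;
    try {
      await notify(`Training progress: step ${step}`);
    } catch {
      // Progress pings are best effort; never let them reject onLog.
    }
  };
}
```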
Mid-run sample sharing. Combine with the Mid-run evaluation recipe: post each checkpoint sample to a review channel so colleagues can react to it while the run continues.
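For sample sharing, most of the work is formatting the message. This sketch only builds the Slack text. CheckpointSample is a hypothetical shape (the Mid-run evaluation recipe defines what a sample actually contains). The length limits are arbitrary choices for readability, not Slack limits. Send the result through the same error-swallowing helper you use for the lifecycle callbacks.

```typescript
// Hypothetical sample shape; the Mid-run evaluation recipe defines the real one.
type CheckpointSample = { step: number; prompt: string; output: string };

// Three backticks open and close a code block in Slack mrkdwn.
const FENCE = "\u0060".repeat(3);

// Cut text to at most `max` characters (UTF-16 code units), marking the cut.
function truncate(text: string, max: number): string {
  return text.length <= max ? text : text.slice(0, max - 1) + "…";
}

// Build the Slack message text for one checkpoint sample.
function formatSampleMessage(sample: CheckpointSample, maxOutputChars = 1500): string {
  return [
    `*Checkpoint sample at step ${sample.step}*`,
    `Prompt: ${truncate(sample.prompt, 300)}`,
    FENCE + truncate(sample.output, maxOutputChars) + FENCE,
  ].join("\n");
}
```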
Other side effects such as capture(), a Datadog event, or a database insert follow the same shape. Put the side effect behind an async helper that swallows its own errors, and call it from the lifecycle callbacks. The trainer file does not need any extra orchestration.
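One way to express that shape is a small higher-order wrapper. bestEffort, recordAnalytics, and insertRunRow are hypothetical. The two destinations stand in for real analytics, metrics, or database clients, and the first one deliberately fails to show that the callback still resolves.

```typescript
// Wrap any async side effect so it can never reject. Failures are logged
// with a label and swallowed; the wrapper always resolves to undefined.
function bestEffort<Args extends unknown[]>(
  label: string,
  fn: (...args: Args) => Promise<unknown>,
): (...args: Args) => Promise<void> {
  return async (...args: Args) => {
    try {
      await fn(...args);
    } catch (err) {
      console.warn(`[${label}] side effect failed:`, err);
    }
  };
}

// Hypothetical destinations; replace with your real clients.
const recordAnalytics = bestEffort("analytics", async (event: string) => {
  throw new Error(`analytics backend unreachable while sending ${event}`);
});
const insertRunRow = bestEffort("db", async (runStatus: string) => {
  console.log(`would insert run row with status=${runStatus}`);
});

// A lifecycle callback just calls the wrappers; neither can reject.
async function onFailed(error: string): Promise<void> {
  await Promise.all([recordAnalytics(`run_failed: ${error}`), insertRunRow("failed")]);
}
```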
What to keep in mind
- Inner try / catch is mandatory. Notifications are nice to have; an outage in your webhook should never trigger silent retries of your training event stream.
- Keep secrets out of the trainer file. The example reads SLACK_WEBHOOK_URL from process.env so the webhook URL does not land in git. Same idea for any token-based destination.
- Remember that error is a string. onFailed’s error argument is the string the backend sent (SDK § Lifecycle callbacks), not an Error instance. Embed it directly; do not call .message on it.