# Backoff supervisor
The framework’s default supervisor strategy restarts a child up to 10 times per minute. For transient failures, that can mean hammering a broken dependency (a broker that’s reconnecting, a DB that’s recovering) with restart after restart, each crashing identically.
`BackoffSupervisor` is the alternative. It wraps a single child actor and reschedules its restart with an exponential backoff (200 ms, 400 ms, 800 ms, …, clamped at a max), plus jitter so a herd of clients doesn’t synchronize.
## A minimal example

```ts
import { ActorSystem, Props, Actor, BackoffSupervisor } from 'actor-ts';

class Flaky extends Actor<{ kind: 'do-it' }> {
  override preStart(): void {
    if (Math.random() < 0.7) throw new Error('upstream not ready');
  }

  override onReceive(msg: { kind: 'do-it' }): void {
    this.log.info('ok');
  }
}

const system = ActorSystem.create('demo');

const supervisor = system.actorOf(
  BackoffSupervisor.props({
    childProps: Props.create(() => new Flaky()),
    minBackoff: 200,
    maxBackoff: 10_000,
    randomFactor: 0.2,
  }),
  'flaky-supervisor',
);

// Send messages to the supervisor — they're forwarded to the
// current child, or stashed during a backoff window.
supervisor.tell({ kind: 'do-it' });
```

The supervisor:
- Spawns a `Flaky` child under `stoppingStrategy` (so a crash means a clean stop, not a default Restart).
- Death-watches the child.
- On `Terminated`, schedules a one-shot timer to spawn a fresh child after `policy.delayFor(restartCount)` ms.
- Buffers messages arriving during the backoff window.
When a child finally starts successfully, the buffered messages are flushed to it (with original sender refs preserved for ask-style replies).
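So ask-style flows keep working across a restart. As a hedged sketch (this page only shows `tell`, so treat the `ask` helper below as an assumption about the API):

```ts
// Hypothetical: an ask() helper isn't shown on this page. If one exists,
// a message asked during a backoff window is stashed and still answered
// once the respawned child handles it.
const reply = await supervisor.ask({ kind: 'do-it' });
```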
## The mechanism

Five steps, in execution order:
```
┌───────────────────────────────────────────────────────┐
│ BackoffSupervisor                                      │
│                                                        │
│   spawn child              crash                       │
│  ─────────────►   ───────►  Terminated                 │
│                                │                       │
│        ┌─── schedule next spawn after                  │
│        │    backoff.delayFor(n) ms                     │
│        ▼                                               │
│  ─── messages buffered ─── (stash or drop)             │
│                                                        │
│  spawn child #2 → drain stash → ...                    │
└───────────────────────────────────────────────────────┘
```

The framework names successive children `child-1`, `child-2`, `child-3`, … so old terminations don’t collide with new spawns.
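For intuition, here is the delay computation the diagram labels `backoff.delayFor(n)`, as a sketch matching the description above (doubling from the minimum, clamping at the maximum, then applying `randomFactor` jitter). It is illustrative only, not the framework’s actual source:

```ts
// Illustrative only: matches the exponential-plus-jitter behavior
// described above, not necessarily the framework's exact source.
function delayFor(
  restartCount: number,
  minBackoff: number, // e.g. 200
  maxBackoff: number, // e.g. 10_000
  randomFactor = 0.2,
): number {
  const base = Math.min(maxBackoff, minBackoff * 2 ** restartCount); // 200, 400, 800, ...
  const jitter = 1 + randomFactor * (2 * Math.random() - 1);         // uniform in [0.8, 1.2]
  return Math.round(base * jitter);
}
```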
## Configuration

The `BackoffOptions<T>` shape:
```ts
interface BackoffOptions<T> {
  childProps: Props<T>;
  childName?: string;
  minBackoff: number;
  maxBackoff: number;
  randomFactor?: number;          // default 0.2
  policy?: BackoffPolicy;
  resetCounter?: ResetCounter;    // default 'after-min-stable'
  forward?: ForwardStrategy;      // default 'stash'
  triggerOn?: TerminationTrigger; // default 'any'
  maxStashSize?: number;          // default 1000
  drainGraceMs?: number;          // default min(50, minBackoff)
  forwardDuringGrace?: boolean;   // default true
  clock?: () => number;
}
```

The most interesting fields:
### triggerOn

| Value | When to respawn |
|---|---|
| `'any'` (default) | Respawn on every termination — both crashes and clean stops. |
| `'failure'` | Respawn only on crashes. A clean `context.stopSelf()` means “this child is done”; the supervisor stops itself afterwards. |
| `'stop'` | Respawn only on clean stops (e.g. a transient connection actor that periodically tears itself down). Crashes propagate up. |
`'failure'` is the right choice if you’re modelling “restart on unexpected death”: a clean self-stop is a deliberate decision the supervisor should honor. `'any'` matches Akka’s v1 behavior.
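In config form, reusing the `Flaky` child and imports from the minimal example above:

```ts
BackoffSupervisor.props({
  childProps: Props.create(() => new Flaky()),
  minBackoff: 200,
  maxBackoff: 10_000,
  triggerOn: 'failure', // respawn on crashes only; a clean stopSelf() ends the pair
});
```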
### forward — what to do with messages while the child is dead

```ts
forward: 'stash', // buffer up to maxStashSize, drain after respawn
// or
forward: 'drop',  // discard silently (debug-logged)
```

Stashing preserves sender refs so ask-replies continue to work after the respawn: a message asked while the child was down still gets its reply once the new child handles it.
Dropping is the right call for transient pings that aren’t worth keeping (telemetry, heartbeats), where stale messages are worse than lost ones.
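For instance, again reusing `Flaky` as a stand-in for a heartbeat-style child:

```ts
BackoffSupervisor.props({
  childProps: Props.create(() => new Flaky()), // stand-in for a heartbeat child
  minBackoff: 200,
  maxBackoff: 5_000,
  forward: 'drop', // stale heartbeats are worse than lost ones
});
```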
### resetCounter

```ts
resetCounter: 'after-min-stable',                 // reset when child alive >= minBackoff (default)
resetCounter: 'never',                            // never reset (counter grows monotonically)
resetCounter: { kind: 'after-time', ms: 60_000 }, // reset after 60s alive
```

Without resetting, a child that fails after a long stable period gets the same long backoff as one that just crashed, which is usually wrong: the long run of success suggests the failure is fresh. `'after-min-stable'` resets the count once the child has been alive for at least `minBackoff`, so a failure after a long-running success restarts with a normal short backoff.
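A worked timeline under the default, with illustrative numbers (jitter omitted): `minBackoff: 200` gives delays of 200 ms, 400 ms, 800 ms across three rapid crashes; if the fourth child then stays alive longer than 200 ms, the counter resets and the next crash backs off 200 ms again. The `after-time` form delays that reset:

```ts
// Reusing the Flaky child from the minimal example. The child must stay
// up a full minute before the restart counter resets; useful when it
// starts quickly but real recovery takes longer.
BackoffSupervisor.props({
  childProps: Props.create(() => new Flaky()),
  minBackoff: 200,
  maxBackoff: 10_000,
  resetCounter: { kind: 'after-time', ms: 60_000 },
});
```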
### drainGraceMs + forwardDuringGrace

After a respawn, the supervisor waits up to `drainGraceMs` (default `min(50, minBackoff)` ms) before draining the stash to the new child. This protects against children that crash in `preStart`:
- If the child dies during the grace window, the stash is held back for the next incarnation — stashed messages aren’t lost to dead-letters when the child keeps crashing on startup.
`forwardDuringGrace: true` (default) sends new messages immediately during the grace window; `forwardDuringGrace: false` stashes them until the grace expires. The default trades a tiny risk of dead-lettering during a `preStart` crash for lower latency on the happy path.
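The strict variant, in config form (reusing `Flaky` from the minimal example):

```ts
BackoffSupervisor.props({
  childProps: Props.create(() => new Flaky()),
  minBackoff: 200,
  maxBackoff: 10_000,
  drainGraceMs: 100,         // hold the stash 100 ms after each respawn
  forwardDuringGrace: false, // stash new messages too, so nothing dead-letters
});
```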
## Custom backoff policy

```ts
import { BackoffSupervisor, linearBackoff } from 'actor-ts';

BackoffSupervisor.props({
  childProps: ...,
  minBackoff: 500,
  maxBackoff: 10_000,
  policy: linearBackoff({ minMs: 500, maxMs: 10_000, stepMs: 500 }),
});
```

Override the default exponential backoff with any `BackoffPolicy`: linear, fibonacci, or custom. `minBackoff` / `maxBackoff` are still required (they’re advisory caps; the framework uses them for the `resetCounter` heuristic), but the policy controls the actual delay computation.
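A hand-rolled policy as a sketch. This assumes `BackoffPolicy` is the `{ delayFor(restartCount) }` shape implied by `policy.delayFor(restartCount)` in “The mechanism”:

```ts
import type { BackoffPolicy } from 'actor-ts';

// Assumes BackoffPolicy is { delayFor(restartCount: number): number },
// as implied by policy.delayFor(restartCount) in "The mechanism".
function fibonacciBackoff(minMs: number, maxMs: number): BackoffPolicy {
  return {
    delayFor(restartCount: number): number {
      let [a, b] = [1, 1];
      for (let i = 0; i < restartCount; i++) [a, b] = [b, a + b];
      return Math.min(maxMs, minMs * a); // minMs, minMs, 2x, 3x, 5x, ...
    },
  };
}
```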
## When to reach for BackoffSupervisor

Three good fits:
- Broker connections (Kafka, NATS, AMQP), where a transient broker outage means the actor’s `connect()` fails for a few seconds before recovering. The default `defaultStrategy` would restart aggressively; backoff smooths it out (sketched below).
- Database actors that hold a connection pool: when the DB hiccups, the actor crashes, and backoff buys time before re-establishing the pool.
- Third-party API actors with rate-limit-aware retries: when a vendor returns 429, the actor crashes; backoff waits before retrying.
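A sketch of the first fit. `BrokerConnection` and `connectSync` are illustrative stand-ins, not actor-ts APIs:

```ts
import { Actor, ActorSystem, Props, BackoffSupervisor } from 'actor-ts';

// Illustrative stand-in for a broker client; not part of actor-ts.
declare function connectSync(url: string): { publish(topic: string, data: string): void };

class BrokerConnection extends Actor<{ kind: 'publish'; topic: string; data: string }> {
  private conn!: ReturnType<typeof connectSync>;

  override preStart(): void {
    // Throws while the broker is unreachable, so the supervisor backs off
    // between connection attempts instead of hammering it.
    this.conn = connectSync('nats://localhost:4222');
  }

  override onReceive(msg: { kind: 'publish'; topic: string; data: string }): void {
    this.conn.publish(msg.topic, msg.data);
  }
}

const system = ActorSystem.create('broker-demo');
const broker = system.actorOf(
  BackoffSupervisor.props({
    childProps: Props.create(() => new BrokerConnection()),
    minBackoff: 1_000,
    maxBackoff: 30_000,
  }),
  'broker',
);

// Publishes sent during an outage are stashed and drained after reconnect.
broker.tell({ kind: 'publish', topic: 'orders', data: '{"id":42}' });
```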
## When NOT to use it

### Compared to plain supervision
`OneForOneStrategy(decider, { maxRetries, withinTimeRangeMs })` caps restarts at N per window but doesn’t delay between them: the framework restarts immediately after each crash. `BackoffSupervisor` adds the delay-between-restarts piece plus a message-buffering layer. The two are complementary:

- For non-transient bugs, plain supervision with a low `maxRetries` is fine (give up after a few attempts and let the failure escalate).
- For transient infrastructure issues, backoff supervision is worth the extra moving parts.
You can combine them: wrap a `BackoffSupervisor`’s own strategy with a `OneForOneStrategy(..., { maxRetries: 10 })` to say “back off between restarts, but give up entirely after 10 attempts.”
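In sketch form, reusing `Flaky` from the minimal example. `supervisorStrategy` is a hypothetical option name (this page confirms the idea but not the exact hook), and `decider` stands in for whatever decider your plain supervision uses:

```ts
import { BackoffSupervisor, OneForOneStrategy, Props } from 'actor-ts';

// Hypothetical: `supervisorStrategy` is an assumed option name, and
// `decider` stands in for your plain-supervision decider.
declare const decider: Parameters<typeof OneForOneStrategy>[0];

BackoffSupervisor.props({
  childProps: Props.create(() => new Flaky()),
  minBackoff: 200,
  maxBackoff: 10_000,
  supervisorStrategy: OneForOneStrategy(decider, { maxRetries: 10, withinTimeRangeMs: 60_000 }),
});
```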
## Where to next

- Backoff policy: the `exponentialBackoff` / `linearBackoff` primitives that produce the policy value.
- Supervision: the plain-supervision baseline this builds on.
- Circuit breaker: for backing off before a call fails (not after).
- Retry: per-call retry with similar backoff math, but outside the actor world.
The `BackoffSupervisor` API reference covers all options.