
Supervision

When an actor’s onReceive throws, whether synchronously or via a rejected Promise, the failure doesn’t crash the process. Instead it travels up to the actor’s parent, which decides what to do via its supervisor strategy. There are four possible outcomes, and you pick one per error class.

This is the “let it crash” philosophy actor-ts inherits from Erlang and Akka: handling every error inline at the call site is brittle. A supervisor a level up has a wider view — it knows whether the crashed actor is replaceable (restart it), holds critical state (escalate to a higher supervisor), or should give up entirely (stop and trigger compensation).

When a child throws, the supervisor’s decider returns one of:

| Directive | What it does |
| --- | --- |
| Restart | Throw the broken instance away and build a fresh one from the same Props factory. The mailbox is retained; the new instance picks up the next message. |
| Resume | Keep the actor’s state, skip the failing message, and continue with the next one in the mailbox. |
| Stop | Stop the actor permanently. Its children are stopped first, and further messages go to dead letters. |
| Escalate | Re-throw the error at the supervisor’s own parent. The supervisor itself is then usually restarted. |

Restart is the default — the framework’s defaultStrategy returns Restart for every error. Use the other three when restart isn’t the right semantic for your domain.
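To see what each directive does to an actor’s state and to its mailbox, here is a toy, single-threaded model in plain TypeScript. It does not use actor-ts: the Directive object, the Counter class and the run function are hypothetical stand-ins, and escalation is only flagged, not routed to a real parent.

```typescript
// Hypothetical string-valued mirror of the four directives (not imported from actor-ts).
const Directive = { Restart: 'Restart', Resume: 'Resume', Stop: 'Stop', Escalate: 'Escalate' } as const;
type Directive = (typeof Directive)[keyof typeof Directive];

type Msg = { kind: 'inc' } | { kind: 'fail' };

// A toy "actor" whose only state is an instance field.
class Counter {
  count = 0;
  handle(msg: Msg): void {
    if (msg.kind === 'fail') throw new Error('boom');
    this.count += 1;
  }
}

type Outcome = { count: number; stopped: boolean; escalated: boolean };

// Drain the mailbox in order, applying `directive` whenever handle() throws.
function run(mailbox: readonly Msg[], directive: Directive): Outcome {
  let instance = new Counter(); // built by the "Props factory"
  for (const msg of mailbox) {
    try {
      instance.handle(msg);
    } catch {
      switch (directive) {
        case Directive.Restart:
          instance = new Counter(); // fresh instance: state lost, mailbox kept
          break;
        case Directive.Resume:
          break; // keep state, skip the failing message
        case Directive.Stop:
          // Remaining messages would go to dead letters.
          return { count: instance.count, stopped: true, escalated: false };
        case Directive.Escalate:
          // The supervisor's own parent would decide (not modeled here).
          return { count: instance.count, stopped: false, escalated: true };
      }
    }
  }
  return { count: instance.count, stopped: false, escalated: false };
}

const mailbox: Msg[] = [{ kind: 'inc' }, { kind: 'inc' }, { kind: 'fail' }, { kind: 'inc' }];
console.log(run(mailbox, Directive.Restart).count); // 1
console.log(run(mailbox, Directive.Resume).count);  // 3
console.log(run(mailbox, Directive.Stop).count);    // 2
```

Restart ends at 1 because the two earlier increments lived on the discarded instance, while the message after the failure is still processed. Resume ends at 3 because the state survives and only the failing message is skipped.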

```typescript
import { Actor, Props, OneForOneStrategy, Directive } from 'actor-ts';

type WorkerMsg = { kind: 'do-it' } | { kind: 'fail' };

class Worker extends Actor<WorkerMsg> {
  override onReceive(msg: WorkerMsg): void {
    if (msg.kind === 'fail') throw new Error('boom');
    this.log.info('did the work');
  }
}

class Boss extends Actor<{ kind: 'spawn-worker' }> {
  // Custom supervisor strategy for this Boss's children: always restart,
  // but allow at most 5 restarts per minute before giving up on the child.
  override supervisorStrategy = new OneForOneStrategy(
    () => Directive.Restart,
    { maxRetries: 5, withinTimeRangeMs: 60_000 },
  );

  override onReceive(_msg: { kind: 'spawn-worker' }): void {
    const worker = this.context.spawn(Props.create(() => new Worker()));
    worker.tell({ kind: 'do-it' });
    worker.tell({ kind: 'fail' });  // <- throws inside Worker
    worker.tell({ kind: 'do-it' }); // <- handled by a fresh instance after the restart
  }
}
```

The Boss’s supervisor strategy catches the Worker’s throw: the Boss sees the failure, applies the Restart directive, and the Worker processes the next message on a fresh instance. Three messages produce three log lines: two “did the work” lines from the Worker and, between them, the Error: boom failure, which shows up in the Boss’s log rather than as an uncaught exception.

Two strategy scopes — they control whether the directive applies to just the failing child or to all of the parent’s children:

OneForOneStrategy — restart/stop/etc. just the failing child. The siblings keep running. This is the default. Use when children are independent of each other: one user-session crashing shouldn’t affect the other sessions.

AllForOneStrategy — apply the directive to every child when any one fails. Use when children share state or coordinate tightly — a small cluster of actors that must restart together, e.g. a producer + consumer pair that talk over an internal channel.

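```typescript
import { Actor, AllForOneStrategy, Directive } from 'actor-ts';
class Pipeline extends Actor<{ kind: 'start' }> {
  // Any child failing restarts all children (at most 3 restarts per 30 s).
  override supervisorStrategy = new AllForOneStrategy(
    () => Directive.Restart,
    { maxRetries: 3, withinTimeRangeMs: 30_000 },
  );
  override onReceive(_msg: { kind: 'start' }): void {} // spawns producer + consumer
}
```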

The vast majority of actor-ts code uses OneForOneStrategy. Reach for all-for-one only when you’ve explicitly decided the children’s states are coupled.
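The only difference between the two scopes is which children the chosen directive lands on. A tiny model makes that concrete; Scope and affectedChildren are illustrative names, not actor-ts API.

```typescript
// Hypothetical model of the two strategy scopes; not actor-ts code.
type Scope = 'one-for-one' | 'all-for-one';

// Returns the children the supervisor's directive is applied to when `failed` throws.
function affectedChildren(scope: Scope, children: readonly string[], failed: string): string[] {
  if (!children.includes(failed)) throw new Error(`unknown child: ${failed}`);
  return scope === 'one-for-one' ? [failed] : [...children];
}

const children = ['producer', 'consumer', 'metrics'];
console.log(affectedChildren('one-for-one', children, 'consumer').join(', ')); // consumer
console.log(affectedChildren('all-for-one', children, 'consumer').join(', ')); // producer, consumer, metrics
```

Note that under all-for-one the unrelated metrics child is restarted too, which is exactly why the scope should be reserved for children whose states are genuinely coupled.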

The decider receives the error and returns a directive — so you can have different responses per error class:

```typescript
import { Actor, decideBy, Directive, OneForOneStrategy } from 'actor-ts';

class TransientNetworkError extends Error {}
class CorruptedStateError extends Error {}
class UnknownProblem extends Error {}

class Gateway extends Actor<{ kind: 'fetch'; url: string }> {
  override supervisorStrategy = new OneForOneStrategy(
    decideBy(
      [
        { match: TransientNetworkError, then: Directive.Resume }, // skip the bad message
        { match: CorruptedStateError, then: Directive.Restart },  // clean reboot
        { match: UnknownProblem, then: Directive.Escalate },      // ask the grandparent
      ],
      Directive.Restart, // fallback when nothing matched
    ),
  );

  override onReceive(_msg: { kind: 'fetch'; url: string }): void {
    // spawn or forward to child workers here
  }
}
```

decideBy is a helper that builds a decider from a list of { match, then } entries (an error class and the directive to apply to it) plus a fallback directive. You can also hand-write the decider as a plain function, (err: Error) => Directive, if you need more sophisticated logic.
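A hand-written decider can branch on data carried by the error, which a flat class-to-directive table cannot express. The sketch below is self-contained so it runs without actor-ts: the Directive object is a hypothetical string mirror of the real enum, and UpstreamError with its status rules is an invented example. In real code you would pass the function to new OneForOneStrategy(decider).

```typescript
// Hypothetical string-valued mirror of actor-ts's Directive values.
const Directive = { Restart: 'Restart', Resume: 'Resume', Stop: 'Stop', Escalate: 'Escalate' } as const;
type Directive = (typeof Directive)[keyof typeof Directive];

// Hypothetical error type carrying an HTTP-like status code.
class UpstreamError extends Error {
  status: number;
  constructor(status: number, message: string) {
    super(message);
    this.name = 'UpstreamError';
    this.status = status;
  }
}

// Hand-written decider: inspects the error's contents, not just its class.
function decider(err: Error): Directive {
  if (err instanceof UpstreamError) {
    // 429 and 5xx look transient: skip this message and keep the actor's state.
    if (err.status === 429 || err.status >= 500) return Directive.Resume;
    // Other statuses suggest a misconfigured child; retrying won't help.
    return Directive.Stop;
  }
  // A TypeError often means the instance's state is corrupted: start clean.
  if (err instanceof TypeError) return Directive.Restart;
  // Anything else: let the grandparent decide.
  return Directive.Escalate;
}

console.log(decider(new UpstreamError(503, 'unavailable')));      // Resume
console.log(decider(new UpstreamError(404, 'no such resource'))); // Stop
console.log(decider(new TypeError('x is undefined')));            // Restart
console.log(decider(new Error('???')));                           // Escalate
```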

Restart semantics — what’s lost, what’s kept


When Restart fires:

  1. The framework calls preRestart(reason, message?) on the about-to-be-thrown-away instance. Default: stops all children, calls postStop. Override to release resources held outside the actor (file handles, open sockets, broker connections).
  2. The instance is dropped. All instance fields (this.count, this.handle, …) are lost.
  3. A new instance is built from the same Props.create factory.
  4. The framework calls postRestart(reason) on the fresh instance. Default: calls preStart. Override to re-acquire resources.
  5. The mailbox is retained. The next message (the one after the failed one) is processed on the new instance.
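The ordering of those five steps can be traced with a toy model in plain TypeScript (no actor-ts imports). The hook names mirror the ones above and their bodies follow the documented defaults; the Instance class, its fake socket handle and the restart function are hypothetical, and children and the mailbox are not modeled.

```typescript
// Toy model of the restart sequence; not actor-ts internals.
const calls: string[] = [];

class Instance {
  static nextId = 1;
  readonly id = Instance.nextId++;
  handle: string | null = null; // stands in for a socket or file handle

  preStart(): void {
    this.handle = `socket-${this.id}`; // acquire the resource
    calls.push(`preStart #${this.id}`);
  }
  postStop(): void {
    calls.push(`postStop #${this.id}`);
    this.handle = null; // release the resource
  }
  // Documented default: stop children (not modeled), then postStop.
  preRestart(reason: Error): void {
    calls.push(`preRestart #${this.id}: ${reason.message}`);
    this.postStop();
  }
  // Documented default: call preStart.
  postRestart(_reason: Error): void {
    calls.push(`postRestart #${this.id}`);
    this.preStart();
  }
}

function restart(old: Instance, factory: () => Instance, reason: Error): Instance {
  old.preRestart(reason);    // step 1: clean up the doomed instance
  const fresh = factory();   // steps 2-3: drop it, build a new one from the same factory
  fresh.postRestart(reason); // step 4: re-acquire resources
  return fresh;              // step 5: the (unmodeled) mailbox continues on `fresh`
}

const factory = () => new Instance();
let current = factory();
current.preStart();
current = restart(current, factory, new Error('boom'));
console.log(calls.join('\n'));
// preStart #1
// preRestart #1: boom
// postStop #1
// postRestart #2
// preStart #2
console.log(current.handle); // socket-2
```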

The failed message itself is dropped by default: it is not put back into the mailbox, so the new instance never sees it. Override preRestart if you need different semantics (for example, to push the failed message onto a dead-letter queue for inspection).

If you want state preserved across restart, the actor needs to persist that state somewhere external — typically a journal via PersistentActor, or a shared DistributedData entry. Restart explicitly does NOT preserve in-memory state; that’s the whole point — “let it crash” trusts the recovery path more than the per-message guard.

Every strategy has two numeric knobs that control “give up” behavior:

  • maxRetries — how many restart attempts the supervisor will tolerate before escalating. -1 = unlimited.
  • withinTimeRangeMs — a sliding time window in milliseconds for counting retries. 0 = no window (counts are never reset, so maxRetries is a process-lifetime cap).

```typescript
import { Directive, OneForOneStrategy } from 'actor-ts';

const strategy = new OneForOneStrategy(
  () => Directive.Restart,
  { maxRetries: 10, withinTimeRangeMs: 60_000 }, // up to 10 restarts per minute
);
```

With these settings, if a child fails 11 times within one minute, the first 10 failures each trigger a restart and the 11th escalates instead: the supervisor itself throws the error at its parent. This protects against infinite restart loops, such as a child that keeps crashing because its state is permanently broken.
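The bookkeeping behind maxRetries and withinTimeRangeMs can be sketched as a sliding window over failure timestamps. This is a minimal illustration under that assumption, not actor-ts’s implementation: RetryBudget and onFailure are hypothetical names, and exceeding the budget returns 'escalate' to match the behaviour described on this page.

```typescript
// Hypothetical sketch of the give-up logic; not actor-ts's implementation.
class RetryBudget {
  private restarts: number[] = []; // timestamps (ms) of failures that led to a restart
  private readonly maxRetries: number;
  private readonly withinTimeRangeMs: number;

  constructor(maxRetries: number, withinTimeRangeMs: number) {
    this.maxRetries = maxRetries;
    this.withinTimeRangeMs = withinTimeRangeMs;
  }

  // Record a failure at time `now` (ms) and return what the supervisor does next.
  onFailure(now: number): 'restart' | 'escalate' {
    if (this.maxRetries === -1) return 'restart'; // -1 = unlimited
    if (this.withinTimeRangeMs > 0) {
      // Forget restarts that have slid out of the window; 0 means never forget.
      this.restarts = this.restarts.filter((t) => now - t < this.withinTimeRangeMs);
    }
    if (this.restarts.length >= this.maxRetries) return 'escalate';
    this.restarts.push(now);
    return 'restart';
  }
}

const budget = new RetryBudget(10, 60_000);
// Eleven failures, one second apart: ten restarts, then escalation.
const outcomes = Array.from({ length: 11 }, (_, i) => budget.onFailure(i * 1_000));
console.log(outcomes.filter((o) => o === 'restart').length); // 10
console.log(outcomes[10]); // escalate
// Seventy seconds in, every earlier restart has left the window.
console.log(budget.onFailure(70_000)); // restart
```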

For exponential-backoff retries, use the BackoffSupervisor pattern. It wraps a child with a backoff timer so that successive restarts are progressively delayed, instead of happening immediately until a fixed cap is hit.
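The delay schedule such a backoff produces can be illustrated with a one-line formula: the base delay multiplied by a factor per attempt, capped at a maximum. The helper and its parameters (minMs, maxMs, factor) are hypothetical and are not BackoffSupervisor’s actual options.

```typescript
// Hypothetical helper (not BackoffSupervisor's API): delay before restart
// number `attempt` (0-based), multiplied by `factor` each time and capped at maxMs.
function backoffDelayMs(attempt: number, minMs: number, maxMs: number, factor = 2): number {
  return Math.min(maxMs, minMs * factor ** attempt);
}

const delays = [0, 1, 2, 3, 4, 5, 6].map((n) => backoffDelayMs(n, 100, 5_000));
console.log(delays.join(', ')); // 100, 200, 400, 800, 1600, 3200, 5000
```

Production backoff schemes often add random jitter on top, so that many children failing at once don’t all restart in lockstep; it is left out here to keep the output deterministic.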

The framework exports three ready-made strategies for common cases:

| Strategy | Behaviour |
| --- | --- |
| defaultStrategy | Restart on every error, capped at 10 restarts per minute. This is the framework default if you don’t override it. |
| stoppingStrategy | Stop the failing child immediately, with no restart. Useful when the parent’s job is to spawn replacements on demand. |
| escalatingStrategy | Always escalate to the grandparent: the child gives up, and the parent kicks the can up the chain. |

Use these for actors where the standard behaviour fits; build a custom OneForOneStrategy or AllForOneStrategy otherwise.

Escalation walks up the parent tree:

```
/user
└── /boss       <- escalates from here when /worker keeps failing beyond maxRetries
    └── /worker <- error thrown here
```

If /boss’s strategy returns Escalate, the error is re-thrown to /user, whose strategy then decides what happens to /boss. /user is the root user guardian; if its strategy escalates too, the system enters a fatal-error state, usually followed by system.terminate().

This means “uncaught errors” happen only if every level escalates, whether by choice or because its retry limit was exceeded, all the way to the root. Stops are a normal, expected outcome; escalation to the root is the “we don’t know what to do” signal.
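The walk up the tree can be modeled in a few lines. Each node carries the strategy it applies to its children; resolution climbs until some ancestor returns something other than Escalate, and that decision is applied to the child directly below it. ActorNode, resolve and the paths are hypothetical illustrations, not actor-ts types.

```typescript
type Decision = 'Restart' | 'Resume' | 'Stop' | 'Escalate';

// Hypothetical model of one actor; `decide` is the strategy it applies to its children.
interface ActorNode {
  path: string;
  parent: ActorNode | null;
  decide: (err: Error) => Decision;
}

interface Resolution {
  supervisor: string | null; // who made the final decision (null: escalated past the root)
  appliesTo: string;         // the child that decision is applied to
  decision: Decision;
}

// Walk up from the failing actor until some ancestor's strategy handles the error.
function resolve(failing: ActorNode, err: Error): Resolution {
  let child = failing;
  let supervisor = failing.parent;
  while (supervisor !== null) {
    const decision = supervisor.decide(err);
    if (decision !== 'Escalate') return { supervisor: supervisor.path, appliesTo: child.path, decision };
    child = supervisor; // the escalating supervisor is now the failed child
    supervisor = supervisor.parent;
  }
  return { supervisor: null, appliesTo: child.path, decision: 'Escalate' }; // fatal
}

const user: ActorNode = { path: '/user', parent: null, decide: () => 'Restart' };
const boss: ActorNode = { path: '/user/boss', parent: user, decide: () => 'Escalate' };
const worker: ActorNode = { path: '/user/boss/worker', parent: boss, decide: () => 'Restart' };

const r = resolve(worker, new Error('boom'));
console.log(r.supervisor, r.appliesTo, r.decision); // /user /user/boss Restart

// If the root escalates as well, nobody handles the error.
const strictRoot: ActorNode = { path: '/user', parent: null, decide: () => 'Escalate' };
const boss2: ActorNode = { ...boss, parent: strictRoot };
const worker2: ActorNode = { ...worker, parent: boss2 };
console.log(resolve(worker2, new Error('boom')).supervisor); // null
```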

Actors spawned via system.actorOf(...) have the root user guardian as their parent. Its strategy is defaultStrategy — restart on failure, capped at 10/minute. Override by passing a strategy in Props:

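```typescript
import { Props, stoppingStrategy } from 'actor-ts';

// `system` is your ActorSystem; MyTopActor is your own Actor subclass.
const ref = system.actorOf(
  Props.create(() => new MyTopActor())
    .withSupervisorStrategy(stoppingStrategy),
);
```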

Don’t confuse this with the strategy the actor uses for its own children. They are two different things: the strategy that handles this actor’s failures is set on Props (as above), while the strategy this actor applies to ITS children is set with override supervisorStrategy on the class.

See also:

  • Actor — the base class whose preRestart / postRestart hooks you override.
  • BackoffSupervisor — exponential-backoff variant for transient failures.
  • Death watch — observing when an actor stops (vs catching when it throws).
  • CircuitBreaker — when the failure is in a downstream call and you want to back off before the call fails.
  • Coordinated shutdown — graceful-shutdown when the whole system needs to come down.

The Supervision module API reference documents every directive, strategy class, and helper discussed here.