Supervision
When an actor’s onReceive throws — synchronously or via a rejected
Promise — the failure doesn’t crash the process. It travels up to
the actor’s parent, which decides what to do via its supervisor
strategy. Four outcomes; you pick one per error class.
This is the “let it crash” philosophy actor-ts inherits from Erlang and Akka: handling every error inline at the call site is brittle. A supervisor a level up has a wider view — it knows whether the crashed actor is replaceable (restart it), holds critical state (escalate to a higher supervisor), or should give up entirely (stop and trigger compensation).
The four directives
When a child throws, the supervisor’s decider returns one of:

| Directive | What it does |
|---|---|
| Restart | Throw the broken instance away. Build a fresh one from the same Props factory. Mailbox is retained; the new instance picks up the next message. |
| Resume | Keep the actor’s state. Skip the failing message. Continue with the next one in the mailbox. |
| Stop | Stop the actor permanently. Children are stopped first. Further messages go to dead letters. |
| Escalate | Re-throw the error at the supervisor’s own parent. The supervisor itself usually then gets restarted. |

Restart is the default — the framework’s defaultStrategy returns
Restart for every error. Use the other three when restart isn’t
the right semantic for your domain.
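
To make the difference between the directives concrete, here is a small self-contained simulation. It does not use actor-ts at all: `ToyDirective` and `runToyActor` are hypothetical stand-ins, and the "actor" is just a running total that "throws" on negative numbers. It is a sketch of the semantics in the table above, not of how the framework implements them.

```typescript
// Local stand-in for the four directives so the sketch runs on its own.
// In real code you would import Directive from 'actor-ts' instead.
type ToyDirective = 'restart' | 'resume' | 'stop' | 'escalate';

interface ToyRun {
  total: number;         // the toy actor's state when the run ends
  deadLetters: number[]; // messages that arrived after the actor stopped
  escalated: boolean;    // true if the failure was handed further up
}

// Toy actor: adds each message to a running total; a negative number "throws".
function runToyActor(messages: number[], directive: ToyDirective): ToyRun {
  let total = 0;
  let stopped = false;
  let escalated = false;
  const deadLetters: number[] = [];

  for (const msg of messages) {
    if (stopped) {
      deadLetters.push(msg); // a stopped actor no longer receives anything
    } else if (msg >= 0) {
      total += msg; // normal processing
    } else if (directive === 'restart') {
      total = 0; // fresh instance: in-memory state is gone, mailbox is kept
    } else if (directive === 'stop') {
      stopped = true;
    } else if (directive === 'escalate') {
      escalated = true;
      break; // what happens next is up to the supervisor's parent
    }
    // 'resume': nothing to do; state is kept and the failing message is skipped
  }
  return { total, deadLetters, escalated };
}

const toyMailbox = [1, 2, -1, 3];
const resumedRun = runToyActor(toyMailbox, 'resume');   // total 6
const restartedRun = runToyActor(toyMailbox, 'restart'); // total 3
const stoppedRun = runToyActor(toyMailbox, 'stop');     // total 3, deadLetters [3]
```

With Resume the total survives the failure (1 + 2 + 3), with Restart only the message after the failure is counted, and with Stop that last message ends up in dead letters.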
A minimal example

```typescript
import { Actor, ActorSystem, Props, OneForOneStrategy, Directive } from 'actor-ts';

class Worker extends Actor<{ kind: 'do-it' } | { kind: 'fail' }> {
  override onReceive(msg: { kind: 'do-it' } | { kind: 'fail' }): void {
    if (msg.kind === 'fail') throw new Error('boom');
    this.log.info('did the work');
  }
}

class Boss extends Actor<{ kind: 'spawn-worker' }> {
  // Custom supervisor strategy for this Boss's children: always restart,
  // but cap at 5 restarts per minute. Beyond that, the failure escalates.
  override supervisorStrategy = new OneForOneStrategy(
    (err) => Directive.Restart,
    { maxRetries: 5, withinTimeRangeMs: 60_000 },
  );

  override onReceive(msg: { kind: 'spawn-worker' }): void {
    const worker = this.context.spawn(Props.create(() => new Worker()));
    worker.tell({ kind: 'do-it' });
    worker.tell({ kind: 'fail' });  // <- throws
    worker.tell({ kind: 'do-it' }); // <- new instance, after restart
  }
}
```

The Boss’s supervisor strategy catches the Worker’s throw, applies the Restart directive, and the Worker processes the next message on a fresh instance. Three messages produce three log lines: two “did the work” entries and, between them, the Error: boom, which shows up in the Boss’s log rather than as an uncaught exception.
One-for-one vs all-for-one
Two strategy scopes control whether the directive applies to just the failing child or to all of the parent’s children:
- OneForOneStrategy: restart/stop/etc. just the failing child. The siblings keep running. This is the default. Use when children are independent of each other: one user session crashing shouldn’t affect the other sessions.
- AllForOneStrategy: apply the directive to every child when any one fails. Use when children share state or coordinate tightly, such as a small cluster of actors that must restart together (e.g. a producer + consumer pair that talk over an internal channel).

```typescript
import { AllForOneStrategy, Directive } from 'actor-ts';

// Inside an Actor subclass:
override supervisorStrategy = new AllForOneStrategy(
  () => Directive.Restart,
  { maxRetries: 3, withinTimeRangeMs: 30_000 },
);
```

The vast majority of actor-ts code uses OneForOneStrategy. Reach for all-for-one only when you’ve explicitly decided the children’s states are coupled.
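
The only thing the scope changes is which children receive the directive. As a minimal illustration (the `SupervisionScope` type and `affectedChildren` function are hypothetical names for this sketch, not actor-ts exports):

```typescript
type SupervisionScope = 'one-for-one' | 'all-for-one';

// Returns the names of the children a directive would be applied to.
function affectedChildren(
  scope: SupervisionScope,
  failing: string,
  allChildren: string[],
): string[] {
  return scope === 'one-for-one'
    ? allChildren.filter((child) => child === failing) // just the one that threw
    : [...allChildren];                                // the failing child and its siblings
}

const pipelineChildren = ['producer', 'consumer', 'metrics'];
const oneForOneTargets = affectedChildren('one-for-one', 'consumer', pipelineChildren);
const allForOneTargets = affectedChildren('all-for-one', 'consumer', pipelineChildren);
// oneForOneTargets contains only 'consumer'; allForOneTargets contains all three.
```

Note that in the all-for-one case the unrelated `metrics` child is restarted too, which is the usual reason to keep coupled children under a dedicated parent of their own.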
Per-error deciders
The decider receives the error and returns a directive, so you can have different responses per error class:

```typescript
import { decideBy, Directive, OneForOneStrategy } from 'actor-ts';

class TransientNetworkError extends Error {}
class CorruptedStateError extends Error {}
class UnknownProblem extends Error {}

// Inside an Actor subclass:
override supervisorStrategy = new OneForOneStrategy(
  decideBy(
    [
      { match: TransientNetworkError, then: Directive.Resume },   // skip the bad message
      { match: CorruptedStateError,   then: Directive.Restart },  // clean reboot
      { match: UnknownProblem,        then: Directive.Escalate }, // ask grandparent
    ],
    Directive.Restart, // fallback when none matched
  ),
);
```

decideBy is a helper that builds a decider from a list of { match, then } entries, each mapping an error class to a directive, plus a fallback. You can also hand-write the decider as a plain function, (err: Error) => Directive, if you need more sophisticated logic.
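
As an illustration of when a hand-written decider is worth it, the sketch below branches on data carried by the error rather than on its class alone. To keep it runnable on its own, `SketchDirective` is a local stand-in for the library’s Directive, and `HttpError` and its `status` field are hypothetical; the mapping of status codes to directives is an example policy, not a recommendation from the framework.

```typescript
// Local stand-in; in real code Directive comes from 'actor-ts'.
type SketchDirective = 'Restart' | 'Resume' | 'Stop' | 'Escalate';

// Hypothetical error type that carries extra data the decider can inspect.
class HttpError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

class CorruptedStateError extends Error {}

// Hand-written decider with the shape (err: Error) => Directive.
function decideFailure(err: Error): SketchDirective {
  if (err instanceof HttpError) {
    // 5xx: the downstream service is struggling, so skip this message and carry on.
    // Anything else suggests the request itself was wrong, so stop the child.
    return err.status >= 500 ? 'Resume' : 'Stop';
  }
  if (err instanceof CorruptedStateError) {
    return 'Restart'; // throw the broken state away
  }
  return 'Escalate'; // unknown failure: let the level above decide
}
```

A class-to-directive table cannot express the `status >= 500` split without introducing one error class per case, which is where a plain function becomes the simpler option.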
Restart semantics — what’s lost, what’s kept
When Restart fires:
- The framework calls preRestart(reason, message?) on the about-to-be-thrown-away instance. Default: stops all children, calls postStop. Override to release resources held outside the actor (file handles, open sockets, broker connections).
- The instance is dropped. All instance fields (this.count, this.handle, …) are lost.
- A new instance is built from the same Props.create factory.
- The framework calls postRestart(reason) on the fresh instance. Default: calls preStart. Override to re-acquire resources.
- The mailbox is retained. The next message (the one after the failed one) is processed on the new instance.

The failed message itself is dropped by default — it stays out
of the mailbox. Override preRestart if you need different
semantics (e.g. push the failed message into a dead-letter queue
for inspection).
If you want state preserved across restart, the actor needs to
persist that state somewhere external — typically a journal via
PersistentActor, or a
shared DistributedData entry. Restart explicitly does NOT
preserve in-memory state; that’s the whole point — “let it crash”
trusts the recovery path more than the per-message guard.
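
The hook order and the loss of in-memory state can be traced with a toy model. `ToyCounter` and `restartToy` below are hypothetical and independent of actor-ts; they only mimic the default hook wiring described in this section (preRestart calls postStop, postRestart calls preStart) and leave out children and the mailbox.

```typescript
// Toy actor with the four lifecycle hooks named in this section.
// It is NOT the actor-ts Actor class.
class ToyCounter {
  count = 0; // in-memory state, lost on restart
  private readonly trace: string[];

  constructor(trace: string[]) {
    this.trace = trace;
  }

  preStart(): void {
    this.trace.push('preStart');
  }

  postStop(): void {
    this.trace.push('postStop');
  }

  // Default behaviour as described above: (stop children, then) call postStop.
  preRestart(reason: Error): void {
    this.trace.push(`preRestart(${reason.message})`);
    this.postStop();
  }

  // Default behaviour as described above: call preStart.
  postRestart(reason: Error): void {
    this.trace.push(`postRestart(${reason.message})`);
    this.preStart();
  }
}

// Hypothetical restart routine: old instance out, new instance from the same factory.
function restartToy(old: ToyCounter, factory: () => ToyCounter, reason: Error): ToyCounter {
  old.preRestart(reason);
  const fresh = factory(); // same factory, brand-new fields
  fresh.postRestart(reason);
  return fresh;
}

const lifecycleTrace: string[] = [];
const counterFactory = () => new ToyCounter(lifecycleTrace);

const firstInstance = counterFactory();
firstInstance.preStart();
firstInstance.count = 41;

const secondInstance = restartToy(firstInstance, counterFactory, new Error('boom'));
// lifecycleTrace: preStart, preRestart(boom), postStop, postRestart(boom), preStart
// secondInstance.count is 0: the 41 lived only in the discarded instance.
```

Anything that must outlive the restart has to be written somewhere the factory can read it back from, which is the role the journal or shared entry plays in the paragraph above.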
Restart limits + the time window
Every strategy has two numeric knobs that control “give up” behavior:
- maxRetries: how many restart attempts the supervisor will tolerate before escalating. -1 = unlimited.
- withinTimeRangeMs: a sliding time window in milliseconds for counting retries. 0 = no window (counts are never reset, so maxRetries is a process-lifetime cap).

```typescript
new OneForOneStrategy(
  () => Directive.Restart,
  { maxRetries: 10, withinTimeRangeMs: 60_000 }, // up to 10 restarts/minute
);
```

If a child fails 11 times within one minute, the first 10 failures trigger restarts and the 11th escalates instead: the supervisor itself throws the error at its parent. This protects against infinite restart loops (a child that keeps crashing on a permanently broken state).
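
One plausible way to read the two knobs together is a list of recent restart timestamps that is pruned as the window slides. The `RestartLimiter` class below is a hypothetical sketch of that reading, written without actor-ts; the framework’s actual bookkeeping may differ (for example in how it treats a failure that lands exactly on the window boundary).

```typescript
type LimitOutcome = 'restart' | 'escalate';

class RestartLimiter {
  private readonly maxRetries: number;
  private readonly withinTimeRangeMs: number;
  private restarts: number[] = []; // timestamps (ms) of restarts still being counted

  constructor(maxRetries: number, withinTimeRangeMs: number) {
    this.maxRetries = maxRetries;
    this.withinTimeRangeMs = withinTimeRangeMs;
  }

  // Called once per failure; the clock is passed in so the logic is easy to test.
  onFailure(nowMs: number): LimitOutcome {
    if (this.maxRetries < 0) return 'restart'; // -1 = unlimited

    if (this.withinTimeRangeMs > 0) {
      // Sliding window: forget restarts that are older than the window.
      const cutoff = nowMs - this.withinTimeRangeMs;
      this.restarts = this.restarts.filter((t) => t > cutoff);
    }
    // With a window of 0 nothing is ever forgotten, so the cap is permanent.

    if (this.restarts.length >= this.maxRetries) return 'escalate';
    this.restarts.push(nowMs);
    return 'restart';
  }
}

const limiter = new RestartLimiter(10, 60_000);
const limiterOutcomes: LimitOutcome[] = [];
for (let i = 0; i < 11; i++) {
  limiterOutcomes.push(limiter.onFailure(i * 1_000)); // one failure per second
}
// The first 10 failures restart; the 11th, still inside the minute, escalates.

const afterQuietPeriod = limiter.onFailure(200_000);
// By now the earlier restarts have slid out of the window, so this one restarts again.
```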
For exponential-backoff retries, use the BackoffSupervisor pattern — it wraps a child with a backoff timer so successive restarts get progressively delayed, rather than instant + capped.
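
The arithmetic behind “progressively delayed” is simple enough to show in isolation. The helper below is a generic exponential-backoff calculation, not the BackoffSupervisor API: `backoffDelayMs` and its `minMs` / `maxMs` parameters are assumptions made for this sketch, so check the BackoffSupervisor page for the real option names.

```typescript
// Hypothetical helper, not part of actor-ts: delay before restart number `attempt`
// (0-based). The delay doubles each time and is clamped at maxMs.
function backoffDelayMs(attempt: number, minMs: number, maxMs: number): number {
  return Math.min(maxMs, minMs * 2 ** attempt);
}

const backoffDelays = [0, 1, 2, 3, 4, 5].map((n) => backoffDelayMs(n, 200, 5_000));
// → [200, 400, 800, 1600, 3200, 5000]
```

Production backoff schemes commonly add a random jitter to each delay so that many children failing at once do not all retry at the same instant; that is left out here to keep the numbers deterministic.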
Built-in strategies
The framework exports three ready-made strategies for common cases:

| Strategy | Behaviour |
|---|---|
| defaultStrategy | Restart everything, cap 10/minute. The framework default if you don’t override. |
| stoppingStrategy | Stop the failing child immediately, no restart. Useful when the parent’s job is to spawn replacements on demand. |
| escalatingStrategy | Always escalate to the grandparent. The child gives up; the parent kicks the can up the chain. |

Use these for actors where the standard behaviour fits; build a
custom OneForOneStrategy or AllForOneStrategy otherwise.
The escalation chain
Escalation walks up the parent tree:

```
/user
 │
 └── /boss          <- escalates here when Worker raises beyond maxRetries
      │
      └── /worker   <- thrown here
```

If /boss’s strategy returns Escalate, the error re-throws at /user’s strategy. /user is the root user-guardian; if its strategy escalates, the system enters a fatal-error state, usually followed by system.terminate().
This means “uncaught errors” only happen if every level explicitly chooses to escalate, all the way to the root. Stops are a normal, expected outcome; escalation-to-root is the “we don’t know what to do” signal.
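
The walk up the tree can be modelled as a loop over the chain of supervisors. This is a conceptual sketch only: `SupervisorLevel` and `resolveFailure` are hypothetical names, the chain is passed in as a plain array, and nothing here reflects how actor-ts routes failures internally.

```typescript
type ChainDirective = 'restart' | 'resume' | 'stop' | 'escalate';

interface SupervisorLevel {
  path: string;                           // e.g. '/boss'
  decide: (err: Error) => ChainDirective; // this supervisor's decider
}

type Resolution =
  | { handledBy: string; directive: ChainDirective }
  | 'fatal';

// supervisors[0] is the failing actor's parent; the last entry is the root guardian.
function resolveFailure(err: Error, supervisors: SupervisorLevel[]): Resolution {
  for (const level of supervisors) {
    const directive = level.decide(err);
    if (directive !== 'escalate') {
      // This level made a decision; it applies to its child on the failing path.
      return { handledBy: level.path, directive };
    }
  }
  return 'fatal'; // every level, including the root, chose to escalate
}

const escalationChain: SupervisorLevel[] = [
  { path: '/boss', decide: () => 'escalate' }, // Boss passes the problem up
  { path: '/user', decide: () => 'restart' },  // root guardian restarts its child
];

const chainOutcome = resolveFailure(new Error('boom'), escalationChain);
// chainOutcome: { handledBy: '/user', directive: 'restart' }
```

In this example the restart decided by /user applies to /boss, which matches the table’s note that the escalating supervisor itself usually gets restarted.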
Top-level actors
Actors spawned via system.actorOf(...) have the root user guardian as their parent. Its strategy is defaultStrategy: restart on failure, capped at 10/minute. Override by passing a strategy in Props:

```typescript
const ref = system.actorOf(
  Props.create(() => new MyTopActor())
    .withSupervisorStrategy(stoppingStrategy),
);
```

…or by giving the actor its own children-strategy. These are two different things: the strategy handling this actor’s failures (set on Props) vs. the strategy this actor uses for ITS children (set as override supervisorStrategy on the class).
Common pitfalls
Where to next
- Actor: the base class whose preRestart / postRestart hooks you override.
- BackoffSupervisor: exponential-backoff variant for transient failures.
- Death watch: observing when an actor stops (vs catching when it throws).
- CircuitBreaker: when the failure is in a downstream call and you want to back off before the call fails.
- Coordinated shutdown: graceful shutdown when the whole system needs to come down.

The Supervision module API reference documents every directive, strategy class, and helper discussed here.