# Coordinated shutdown
system.terminate() stops the actor system, but a production app
usually has work to do before that: drain in-flight HTTP requests,
tell the cluster you’re leaving, flush a journal, close broker
connections. And those steps have an order — leave the cluster
before you stop the sharding region; stop the HTTP server
before the actors that handle requests.
Coordinated shutdown is the DSL for that. You register tasks
against named phases; the framework runs them in dependency
order, one phase after the next, each with a timeout cap. Calling
run() from any trigger (SIGTERM, K8s PreStop hook, an admin
endpoint) executes the whole pipeline once.
## The minimal example

```ts
import {
  ActorSystem,
  CoordinatedShutdownId,
  Phases,
  type Reason,
} from 'actor-ts';

const system = ActorSystem.create('my-app');
const cs = system.extension(CoordinatedShutdownId);

cs.addTask(Phases.ServiceUnbind, 'close-http', async (reason) => {
  await httpServer.close();
});

cs.addTask(Phases.ServiceRequestsDone, 'drain-in-flight', async () => {
  await waitForInFlightRequests(/* up to 10s */);
});

cs.installProcessHooks(); // SIGTERM/SIGINT → cs.run(ProcessTerminateReason)
```

Three things happen when SIGTERM lands:
1. The runtime calls `cs.run(new ProcessTerminateReason('SIGTERM'))`.
2. The phases run in canonical order. Inside each phase, all registered tasks run in parallel; the phase waits for them all (or for their timeouts).
3. The pipeline finishes with the built-in `actor-system-terminate` task, which calls `system.terminate()` for you.
Everything in between — HTTP unbinding, cluster leave, journal flush — is whatever you added.
## The 12 canonical phases

Listed in execution order:
| # | Phase name | Typical tasks |
|---|---|---|
| 1 | before-service-unbind | Last-chance announcements before the server stops accepting connections. |
| 2 | service-unbind | Stop the HTTP server / gRPC server / WebSocket listener from accepting new connections. |
| 3 | service-requests-done | Wait for in-flight requests to finish; abort the rest. |
| 4 | service-stop | Close client connections, release sockets. |
| 5 | before-cluster-shutdown | Optional pre-cluster-leave hooks. |
| 6 | cluster-sharding-shutdown-region | Tell the sharding region to hand off entities. |
| 7 | cluster-leave | Issue a Cluster.leave() — gossip leaving status. |
| 8 | cluster-exiting | Wait for the cluster to acknowledge the leave. |
| 9 | cluster-exiting-done | Confirm cluster transition is complete. |
| 10 | cluster-shutdown | Tear down cluster transports. |
| 11 | before-actor-system-terminate | Last-chance app-level cleanup (flush journals, close brokers). |
| 12 | actor-system-terminate | The built-in system.terminate() task. |
You don’t have to use every phase. Empty phases are no-ops; only phases with registered tasks do anything. In a single-node app without clustering, only phases 1-4 and 11-12 see tasks.
The `Phases` constant exports the canonical names — prefer it over
string literals for autocomplete:

```ts
import { Phases } from 'actor-ts';

cs.addTask(Phases.ServiceUnbind, ...); // ✓ typed
cs.addTask('service-unbind', ...);     // ✗ stringly-typed, no auto-complete
```

## Adding custom phases

For app-specific work that doesn’t fit a canonical phase, declare your own:

```ts
cs.addPhase({
  name: 'flush-metrics',
  timeoutMs: 3_000,
  dependsOn: [Phases.BeforeActorSystemTerminate],
  recover: true,
});

cs.addTask('flush-metrics', 'push-prometheus', async () => {
  await metricsRegistry.flush();
});
```

The `dependsOn` field is what makes the order DAG-shaped rather
than linear — your phase runs after before-actor-system-terminate
but before actor-system-terminate (because the latter has the
former in its own implicit chain).
The framework does a topological sort, so cycles fail loudly at
registration time (`Error: cycle in phase dependencies`).
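To illustrate the ordering mechanics, here is a minimal depth-first topological sort with cycle detection — a self-contained sketch of the idea, not the library’s actual implementation:

```ts
type PhaseDef = { name: string; dependsOn: string[] };

// Depth-first topological sort; throws if the dependency graph has a cycle.
function sortPhases(phases: PhaseDef[]): string[] {
  const byName = new Map(phases.map((p) => [p.name, p]));
  const done = new Set<string>();
  const visiting = new Set<string>();
  const order: string[] = [];

  const visit = (name: string) => {
    if (done.has(name)) return;
    if (visiting.has(name)) {
      throw new Error('cycle in phase dependencies');
    }
    visiting.add(name);
    for (const dep of byName.get(name)?.dependsOn ?? []) visit(dep);
    visiting.delete(name);
    done.add(name);
    order.push(name); // dependencies are pushed before dependents
  };

  for (const p of phases) visit(p.name);
  return order;
}
```

With this sketch, `sortPhases([{ name: 'b', dependsOn: ['a'] }, { name: 'a', dependsOn: [] }])` yields `['a', 'b']`, and two phases that depend on each other throw immediately.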
## Task semantics

Every task is a function from a `Reason` to `void | Promise<void>`:

```ts
type ShutdownTask = (reason: Reason) => Promise<void> | void;
```

The reason lets a task branch on why shutdown was triggered:

```ts
cs.addTask(Phases.ClusterLeave, 'gossip-leave', async (reason) => {
  if (reason instanceof ClusterDowningReason) {
    // We were downed — don't bother gossiping a leave.
    return;
  }
  await cluster.leave();
});
```

Built-in `Reason` classes:
| Class | When |
|---|---|
| ProcessTerminateReason(signal) | SIGTERM/SIGINT via installProcessHooks. |
| ActorSystemTerminateReason | User called system.terminate() directly. |
| ClusterLeavingReason | Cluster initiated a graceful leave. |
| ClusterDowningReason | Cluster forced this node out. |
| UnknownReason | Trigger not specified. |
You can subclass Reason for app-specific triggers
(AdminEndpointReason, HotReloadReason, etc.).
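A subclass might look like the following — a sketch that defines a minimal stand-in for the library’s `Reason` base class (its real constructor signature may differ), just to show the shape:

```ts
// Stand-in for the library's Reason base class, assumed subclassable.
class Reason {
  constructor(readonly description: string) {}
}

// Hypothetical app-specific trigger: shutdown requested over an admin endpoint.
class AdminEndpointReason extends Reason {
  constructor(readonly requestedBy: string) {
    super(`admin endpoint (requested by ${requestedBy})`);
  }
}

const reason = new AdminEndpointReason('ops@example.com');
// Tasks can branch on it with instanceof, same as the built-ins.
const isAdmin = reason instanceof AdminEndpointReason;
```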
## Parallelism within a phase

All tasks in a phase run concurrently — they’re started together
and the phase waits for the last one (or its timeout). If you have
ordering requirements within a phase (task B must wait for task
A), put them in different phases with a dependsOn.
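The phase-level concurrency is essentially a `Promise.all` over the registered tasks. A self-contained sketch of the semantics (not the library’s code) — note how the two tasks interleave instead of running back-to-back:

```ts
type Task = () => Promise<void>;

// All tasks start immediately; the phase resolves when the slowest finishes.
async function runPhase(tasks: Task[]): Promise<void> {
  await Promise.all(tasks.map((t) => t()));
}

const events: string[] = [];
const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

const demo = runPhase([
  async () => { events.push('a:start'); await sleep(20); events.push('a:end'); },
  async () => { events.push('b:start'); await sleep(5);  events.push('b:end'); },
]).then(() => events);
// events ends up as a:start, b:start, b:end, a:end — concurrent, not sequential
```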
## Timeouts

Each phase has a `timeoutMs` (default 5 s); each task is wrapped in
a timeout race. A task that doesn’t finish in time is logged and
either:
- Recovered from (`recover: true` — the default). The phase continues, and the next phase starts.
- Halts the pipeline (`recover: false`). Subsequent phases are not run; shutdown stops mid-flight.
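The timeout wrapping amounts to racing each task against a timer, with the `recover` flag deciding whether a loss halts the pipeline. A simplified sketch of those semantics, with hypothetical helper names:

```ts
type Task = (reason: unknown) => Promise<void> | void;

class PhaseTimeoutError extends Error {}

// Race a task against the phase's timeout cap.
async function withTimeout(task: Task, reason: unknown, timeoutMs: number): Promise<void> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new PhaseTimeoutError('task timed out')), timeoutMs);
  });
  try {
    await Promise.race([Promise.resolve(task(reason)), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// recover: true → log and continue; recover: false → rethrow, halting the pipeline.
async function runTask(task: Task, reason: unknown, timeoutMs: number, recover: boolean) {
  try {
    await withTimeout(task, reason, timeoutMs);
  } catch (err) {
    console.error('shutdown task failed:', err);
    if (!recover) throw err;
  }
}
```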
Override per phase:

```ts
cs.setPhaseTimeout(Phases.ServiceRequestsDone, 30_000); // 30s drain budget
```

Or define your own phase with the desired `timeoutMs` / `recover`:

```ts
cs.addPhase({
  name: 'aggressive-cleanup',
  timeoutMs: 1_000, // strict cap
  dependsOn: [Phases.BeforeActorSystemTerminate],
  recover: false, // failure → halt
});
```

## SIGTERM / SIGINT hooks

```ts
cs.installProcessHooks();
// Or: cs.installProcessHooks(['SIGTERM', 'SIGINT', 'SIGUSR2']);
```

This attaches handlers that call `cs.run(new ProcessTerminateReason(signal))`.
Calling twice is harmless (idempotent). Tests usually skip the
hooks; production always wires them up.
removeProcessHooks() undoes them — useful for tests that
instantiate a system, run, and tear down inside a single process.
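Mechanically, idempotent install/remove comes down to tracking the attached handlers. A self-contained sketch using Node’s signal API (the function names mirror the library’s but this is not its code):

```ts
const signals: NodeJS.Signals[] = ['SIGTERM', 'SIGINT'];
const handlers = new Map<NodeJS.Signals, () => void>();

// Install once; a second call is a no-op.
function installProcessHooks(onSignal: (sig: NodeJS.Signals) => void): void {
  if (handlers.size > 0) return; // already installed → idempotent
  for (const sig of signals) {
    const handler = () => onSignal(sig);
    handlers.set(sig, handler);
    process.on(sig, handler);
  }
}

// Detach again — for tests that create and tear down a system in-process.
function removeProcessHooks(): void {
  for (const [sig, handler] of handlers) process.off(sig, handler);
  handlers.clear();
}
```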
## K8s PreStop integration

In Kubernetes, the pod-shutdown sequence is:

1. K8s marks the pod as terminating and starts the grace-period clock.
2. K8s runs the PreStop hook (if configured), then sends SIGTERM.
3. When the grace period expires, K8s sends SIGKILL.

The standard recipe:

```ts
// On SIGTERM, run coordinated shutdown:
cs.installProcessHooks();

// PreStop hook script (in your container image):
// #!/bin/sh
// sleep 10  # give upstream LBs time to drain this pod
// exit 0
```

The sleep in PreStop gives the load balancer time to drop this
pod from rotation before the actor system starts shutting down
— so in-flight HTTP requests don’t see “I’m draining, go away.”
See Operations — Kubernetes for the full deployment manifest.
## Multi-trigger safety

`cs.run()` is idempotent — calling it multiple times returns the
same in-flight promise. Three independent triggers (SIGTERM, an
admin endpoint, and a cluster downing) all calling run doesn’t
re-run the pipeline. The first call starts it; subsequent calls
await the same completion.
This matters because in production you often have multiple shutdown paths:
```ts
// SIGTERM path:
cs.installProcessHooks();

// Admin-endpoint path:
app.post('/shutdown', async (req, res) => {
  await cs.run(new AdminEndpointReason());
  res.send('ok');
});

// Cluster downing path is auto-wired by the cluster extension.
```

All three end up running the same shutdown sequence once.
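The idempotency boils down to memoizing the in-flight promise — a self-contained sketch of the pattern, not the library’s code:

```ts
class ShutdownRunner {
  private inFlight: Promise<void> | null = null;

  constructor(private readonly pipeline: () => Promise<void>) {}

  // First call starts the pipeline; later calls return the same promise.
  run(): Promise<void> {
    if (this.inFlight === null) {
      this.inFlight = this.pipeline();
    }
    return this.inFlight;
  }
}

let started = 0;
const runner = new ShutdownRunner(async () => { started += 1; });
const p1 = runner.run();
const p2 = runner.run(); // same promise — the pipeline is not restarted
```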
## What runs after cs.run() completes

By the time the promise resolves:
- Every task in every phase has either succeeded or timed out.
- The built-in `actor-system-terminate` task has called `system.terminate()`, which has stopped every actor and closed the dispatcher and scheduler.
- The process is free to exit (`process.exit(0)`). Nothing is left to do.
A common shell of a production main:
```ts
async function main() {
  const system = ActorSystem.create('my-app');
  const cs = system.extension(CoordinatedShutdownId);

  // Register tasks...

  cs.installProcessHooks();

  // Block until shutdown completes (e.g. via SIGTERM).
  await new Promise(() => {}); // never resolves; the hooks drive shutdown
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

When SIGTERM arrives, the hook fires `cs.run(...)`, the pipeline
runs, the system terminates, and Node exits cleanly because there
are no more handles keeping the loop alive.
## Common pitfalls

## Where to next
Section titled “Where to next”- Actor system —
the
terminate()that runs at the end of the pipeline. - Cluster overview — the
cluster phases (
cluster-leave,cluster-exiting, …) wire themselves up automatically when the cluster extension is active. - Kubernetes deployment — the full PreStop + SIGTERM + grace-period recipe.
- Persistence — Migration — rolling shutdown for journal migrations.
The `CoordinatedShutdown` API reference covers `addTask`, `addPhase`, `run`, and the full phase constant set.