Cluster overview
A cluster is a group of ActorSystems — typically one per node —
that know about each other. Nodes gossip their membership state,
detect each other’s failures, and route messages across the network.
Once joined, code on any node can tell an actor on any other node;
the cluster transport hides the wire.
Two ways to think about it:
- From the application’s perspective: one logical actor system spread across N nodes. An `ActorRef` can point at an actor on any node; the same code that worked single-node still works.
- From the runtime’s perspective: N independent systems exchanging gossip + heartbeats, each tracking who’s alive and routing messages through a chosen transport.
A minimal example
```ts
import { ActorSystem, Cluster } from 'actor-ts';

const system = ActorSystem.create('my-app');

const cluster = await Cluster.join(system, {
  host: '0.0.0.0',
  port: 2552,
  seeds: ['node-a:2552', 'node-b:2552'],
  roles: ['compute'],
});

// Once joined:
cluster.state;                    // → ClusterState
cluster.upMembers();              // → ReadonlyArray<Member> of up-status nodes
cluster.subscribe(MemberUp, ...); // listen for membership events
```

Three settings are doing most of the work:
| Setting | Purpose |
|---|---|
| `host` + `port` | This node’s external address. Peers contact it here. |
| `seeds` | Other nodes’ addresses. At startup, this node contacts them to join the existing cluster. An empty list means “I’m the first one” (auto-promotes to leader). |
| `roles` | Tags this node carries. Routers and shard regions can filter on them (`role: 'compute'` skips nodes without the tag). |
After `Cluster.join` resolves, the node is in the cluster (though possibly still in the `joining` state for a few seconds until convergence).
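If startup code should wait until this node is fully routable, one option is to watch for its own `MemberUp` event. A sketch, assuming the event carries the member as `event.member` and that the cluster exposes its own identity as `cluster.selfMember` (neither accessor is confirmed by this page):

```ts
// Block until this node itself transitions to up.
// `event.member.address` and `cluster.selfMember` are assumed shapes.
await new Promise<void>((resolve) => {
  cluster.subscribe(MemberUp, (event) => {
    if (event.member.address === cluster.selfMember.address) resolve();
  });
});
```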
The membership states
Every member goes through a small state machine:
- `joining` — just announced itself. Other peers know about it but it isn’t routable yet.
- `weakly-up` (optional) — gossip-visible to peers but the leader hasn’t confirmed it. Useful for partition-tolerant joins; see Weakly-up.
- `up` — fully in the cluster, routable. This is the steady-state status.
- `unreachable` — the failure detector flagged this peer as not heartbeating. Still officially a member, but routing avoids it. Transient; flips back to reachable if heartbeats resume.
- `leaving` / `exiting` — the member is gracefully leaving (via `cluster.leave()`).
- `removed` — formally evicted. Kept as a tombstone for a TTL (default 24 h) so stale gossip from a slow peer can’t accidentally resurrect it.
The cluster’s events stream surfaces every transition — subscribe to `MemberUp` / `MemberRemoved` / `UnreachableMember` etc. and react.
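For example, a node might log transitions and update a local routing view. The subscribe calls use the event classes named above; the handler signature and payload shape (`event.member`) are assumptions:

```ts
// React to membership transitions (payload shape assumed).
cluster.subscribe(MemberUp, (event) => {
  console.log(`up: ${event.member.address}, roles: ${event.member.roles}`);
});
cluster.subscribe(UnreachableMember, (event) => {
  console.warn(`unreachable: ${event.member.address}; stop routing to it`);
});
cluster.subscribe(MemberRemoved, (event) => {
  console.log(`removed: ${event.member.address}`);
});
```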
What gossip does
Gossip is how members agree on who’s in the cluster. Every `gossipIntervalMs` (default 500 ms), each member picks a random reachable peer and exchanges its view of the cluster. Over a few rounds, every peer converges on the same state — without any central coordinator.
The protocol carries:
- Membership table — every member’s address, status, roles, and version vector.
- Reachability observations — “I haven’t heard from X recently.”
Two peers exchanging gossip merge their tables by picking the higher version for each member. This is what makes convergence work without a leader-elected coordinator: every concurrent update is eventually seen by every peer.
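To make the merge rule concrete, here is a minimal sketch with a single numeric version standing in for the real version vector (the actual protocol also merges reachability observations):

```ts
// Simplified member view; the real table carries version vectors,
// roles, and reachability data.
interface MemberView {
  address: string;
  status: 'joining' | 'weakly-up' | 'up' | 'leaving' | 'exiting' | 'removed';
  version: number; // stand-in for a version vector entry
}

function mergeGossip(
  mine: Map<string, MemberView>,
  theirs: Map<string, MemberView>,
): Map<string, MemberView> {
  const merged = new Map(mine);
  for (const [addr, remote] of theirs) {
    const local = merged.get(addr);
    // Keep whichever side carries the higher version for this member.
    if (!local || remote.version > local.version) {
      merged.set(addr, remote);
    }
  }
  return merged;
}
```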
For the deep dive, see Joining and seeds.
What the failure detector does
The framework uses a phi accrual failure detector. Each member maintains a per-peer heartbeat history; if a peer’s heartbeats stop arriving within a statistical window, the detector flags it as suspect.
- `unreachableAfterMs` — once suspicion exceeds the threshold for this long, the member is marked `unreachable`.
- `downAfterMs` — if the suspicion persists this long, the member is downed (split-brain resolution).
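As a rough illustration, assuming these knobs sit on the join config alongside `gossipIntervalMs` (their exact placement isn’t shown on this page):

```ts
// Hypothetical placement of the timing knobs named above.
const cluster = await Cluster.join(system, {
  host: '0.0.0.0',
  port: 2552,
  seeds: ['node-a:2552'],
  gossipIntervalMs: 500,     // gossip round cadence (the default)
  unreachableAfterMs: 5_000, // sustained suspicion → marked unreachable
  downAfterMs: 30_000,       // persistent suspicion → downed
});
```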
The detector is adaptive: it accommodates variable network latency rather than relying on a fixed timeout. See Failure detector for the phi math.
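The phi value is roughly “how many orders of magnitude of certainty that the peer is gone,” computed from the observed heartbeat inter-arrival distribution. A self-contained sketch of the usual normal-approximation formula (this is the general phi accrual technique, not necessarily this framework’s exact internals):

```ts
// mean/stdDev come from the per-peer heartbeat inter-arrival history.
function phi(msSinceLastHeartbeat: number, meanMs: number, stdDevMs: number): number {
  const y = (msSinceLastHeartbeat - meanMs) / stdDevMs;
  // Logistic approximation to the normal CDF tail.
  const e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
  const pStillComing =
    msSinceLastHeartbeat > meanMs ? e / (1 + e) : 1 - 1 / (1 + e);
  // phi = -log10 P(a heartbeat would arrive this late or later).
  return -Math.log10(pStillComing);
}

// phi ≈ 1 → ~10% chance the peer is fine and the heartbeat is merely late;
// phi ≈ 3 → ~0.1%. Crossing the threshold marks the peer suspect.
```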
Split-brain and downing
When the network partitions, two halves of the cluster may both remain operational but lose contact with each other. Without intervention, both sides keep running and accept conflicting writes — the classic split-brain problem.
actor-ts ships several downing strategies:
| Strategy | What it does |
|---|---|
| `KeepMajority` | The side with more nodes wins; the smaller side downs itself. |
| `KeepOldest` | The side containing the oldest member wins. |
| `KeepReferee` | A designated referee node’s view wins. |
| `KeepQuorum` | Requires a configured quorum size; a side that falls below the quorum downs itself. |
See Downing strategies for the full set. Pick deliberately — the default (no downing strategy) requires manual intervention during a partition.
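Wiring a strategy in might look like this; treat the `downing` option and the `KeepMajority` import as assumptions about the config surface (the Downing strategies page has the real wiring):

```ts
// Hypothetical: assuming strategies are passed via a `downing` option.
import { ActorSystem, Cluster, KeepMajority } from 'actor-ts';

const system = ActorSystem.create('my-app');
const cluster = await Cluster.join(system, {
  host: '0.0.0.0',
  port: 2552,
  seeds: ['node-a:2552', 'node-b:2552'],
  downing: KeepMajority(), // minority side downs itself after a partition
});
```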
Cross-node messaging
Once two nodes share a cluster, `ref.tell(msg)` to a foreign-node ref just works:
```ts
const remote = await system
  .actorSelection('actor-ts://my-app@10.0.0.5:2552/user/api/sessions/user-42')
  .resolveOne();

remote.tell({ kind: 'whatever' });
// → serialized, sent over the transport, delivered to the foreign actor's mailbox
```

The cluster transport serializes the message (JSON by default), includes the routing path, and the destination’s transport delivers it. Reply-to refs serialize cleanly — the receiving node re-attaches a remote-routable handle so replies flow back over the same transport.
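Since reply-to refs serialize cleanly, request/reply across nodes is just a message carrying a ref. A sketch of the message shapes involved, assuming the reply target travels in a `replyTo` field (a convention for illustration, not a documented API) and that `ActorRef` is exported as a type:

```ts
// Illustrative cross-node request/reply message shapes.
import type { ActorRef } from 'actor-ts';

interface GetSession {
  kind: 'get-session';
  replyTo: ActorRef; // serialized, then re-attached as a remote-routable handle
}

interface SessionState {
  kind: 'session-state';
  data: unknown;
}

// Sender side: include your own ref so the remote actor can answer.
//   remote.tell({ kind: 'get-session', replyTo: selfRef });
// Receiver side, inside the session actor's handler:
//   msg.replyTo.tell({ kind: 'session-state', data });
// The reply flows back over the same transport.
```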
See Refs across nodes for the wire-format details.
What sits on top
The cluster module is the foundation; everything interesting about distributed actor systems comes from the extensions that build on it:
| Extension | What it adds |
|---|---|
| Sharding | One actor per “entity key,” distributed across nodes, with automatic rebalancing on membership changes. |
| Singleton | One actor cluster-wide. Re-spawned elsewhere if the host node leaves. |
| DistributedPubSub | Topic-based fan-out across the cluster. |
| DistributedData | CRDT-based shared state with eventual consistency. |
| Cluster router | Routes messages across the cluster’s up members at a well-known path. |
| Receptionist | Service registry — actors register, others look up by key. |
None of these are enabled by default; you reach for them when needed. This page covers the foundation they all share — once you understand membership, gossip, and failure detection, the extensions follow.
Single-node mode
A “cluster” of one node is valid. Calling `Cluster.join` with no seeds (or seeds that are unreachable) gives you a singleton cluster — the local node auto-promotes to leader, becomes `up`, and every extension that depends on the cluster (sharding, singleton, pubsub) works as if it were in a larger cluster.
This means cluster code can be developed and tested with a single node; you don’t need a Docker Compose setup to get started. Add more nodes later by giving them seeds pointing at the first.
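In practice, local development can look like this (same API as the minimal example above):

```ts
// Single-node cluster for development: no seeds, so this node
// auto-promotes to leader and becomes up on its own.
const system = ActorSystem.create('my-app');
const cluster = await Cluster.join(system, {
  host: '127.0.0.1',
  port: 2552,
  seeds: [], // "I'm the first one"
});
// Sharding, singleton, and pubsub now run against this single node.
```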
When to use the cluster
Three primary motivations:
- Scale-out: more actors than one node’s memory or CPU can handle. Sharding distributes them.
- Fault tolerance: if one node dies, the work moves to another. Singleton and sharding handle the failover.
- Geographic distribution: actors near their users or data, coordinating with the rest of the cluster over the WAN.
For a single-process app, you don’t need the cluster module. For a multi-process app where each process holds independent state, you don’t need it either — just use process boundaries. Reach for it when you genuinely need shared logical state across multiple machines.
Where to next
- Joining and seeds — the join protocol, seed-node configuration, what happens if all seeds are unreachable.
- Sharding overview — one-actor-per-entity-key with automatic rebalancing.
- Singleton overview — exactly-one cluster-wide.
- Distributed data overview — shared state via CRDTs.
- Failure detector — the phi-accrual mechanism.
- Downing strategies — resolving split-brain.
The Cluster API reference covers
the join/leave/subscribe surface.