Weakly-up
In a healthy cluster, a joining node transitions from joining to
up over a few gossip rounds — once the leader sees it. But when
the cluster is partitioned, the leader’s view doesn’t include
the partitioned side; a node joining the minority side waits
indefinitely.
Weakly-up is a transient state that breaks this deadlock:
after a configured delay, a joiner that hasn’t reached up is
auto-promoted to weakly-up. It’s gossip-visible to its
partition; the cluster can route to it without the leader’s
involvement — but with some restrictions.
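The promotion rule described above can be sketched as a pure function. This is a minimal model for illustration, not the library's internals: `JoinerView`, `nextStatus`, and the field names are hypothetical, and only `weaklyUpAfterMs` (where 0 means disabled) comes from this page.

```typescript
// Hypothetical model of the promotion rule; not actor-ts internals.
type JoinStatus = 'joining' | 'weakly-up' | 'up';

interface JoinerView {
  status: JoinStatus;
  joinedAtMs: number;       // when the node entered `joining`
  leaderConfirmed: boolean; // has the leader's view converged on this node?
}

function nextStatus(
  view: JoinerView,
  nowMs: number,
  weaklyUpAfterMs: number,
): JoinStatus {
  // Leader confirmation always wins, from either joining or weakly-up.
  if (view.leaderConfirmed) return 'up';
  // 0 (or less) means the feature is disabled: keep waiting for the leader.
  if (weaklyUpAfterMs <= 0) return view.status;
  const waitedMs = nowMs - view.joinedAtMs;
  // Auto-promote only joiners that have waited out the configured delay.
  if (view.status === 'joining' && waitedMs >= weaklyUpAfterMs) {
    return 'weakly-up';
  }
  return view.status;
}
```

Keeping the rule a function of the current view and the clock makes the two exits from joining explicit: leader confirmation, or the delay elapsing.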
```text
joining ──── (delay elapses, leader still hasn't confirmed) ────► weakly-up
   │                                                                  │
   └──── (leader visible, gossip converges) ────► up ◄────────────────┘
                                                  (eventually also confirmed)
```

When this matters
In normal operation, the transition joining → up happens within
a second or two — you never see weakly-up. It comes up only
during:
- Cold-start with a partition — multiple nodes booting simultaneously across a partial network.
- A leader-side outage during join — the leader is unreachable but the joiner can reach other members.
- Stretched clusters with high RTT — the gossip-to-leader round trip is slow enough to exceed a configured threshold.
Without weakly-up, none of these scenarios make progress; the
joiner is stuck in joining forever (or until the leader
appears).
Enabling weakly-up
```typescript
await Cluster.join(system, {
  host,
  port,
  seeds,
  weaklyUpAfterMs: 3_000, // auto-promote after 3s in joining
});
```

The default is 0 (disabled). Pick a value high enough that
normal joining → up is the common path, but low enough that a
stalled join progresses within reasonable time.
3 to 10 seconds is typical. A lower value risks promoting nodes during routine slow gossip rounds; a higher value makes recovery from a stalled join sluggish.
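One way to apply this guidance is to validate the value before passing it to the join call. The helper below is a sketch under stated assumptions: `resolveWeaklyUpAfterMs` and `WeaklyUpSetting` are hypothetical names, and the 3 to 10 second band is the rule of thumb from this page rather than a limit enforced by the library.

```typescript
// Hypothetical helper; only the `weaklyUpAfterMs` option itself is from the docs.
interface WeaklyUpSetting {
  weaklyUpAfterMs: number;
  warning?: string;
}

function resolveWeaklyUpAfterMs(raw: string | undefined): WeaklyUpSetting {
  // Unset or empty input keeps the documented default of 0 (disabled).
  if (raw === undefined || raw.trim() === '') return { weaklyUpAfterMs: 0 };

  const ms = Number(raw);
  if (!Number.isFinite(ms) || ms < 0) {
    throw new Error(`weaklyUpAfterMs must be a non-negative number, got "${raw}"`);
  }
  if (ms === 0) return { weaklyUpAfterMs: 0 };

  // Outside the typical band the value is still accepted, but flagged.
  if (ms < 3_000) {
    return { weaklyUpAfterMs: ms, warning: 'below 3s: may promote during slow gossip rounds' };
  }
  if (ms > 10_000) {
    return { weaklyUpAfterMs: ms, warning: 'above 10s: stalled joins recover slowly' };
  }
  return { weaklyUpAfterMs: ms };
}
```

The raw string could come from an environment variable or a config file; log the `warning` at startup and pass the resolved `weaklyUpAfterMs` in the join options.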
What weakly-up members can and can’t do
| Capability | weakly-up |
|---|---|
| Receive tell from other peers in the same partition | ✓ |
| Subscribe to cluster events | ✓ |
| Be a routee in cluster-router pools | ✗ |
| Host sharding entities | ✗ |
| Win a singleton election | ✗ |
The split: passive participation works, active
responsibilities don’t. A weakly-up member can still serve
HTTP requests landing on it, but cluster-managed responsibilities
wait for full up confirmation.
This is conservative on purpose — a weakly-up member might actually be on the minority side of a partition (it’s just not confirmed either way yet). Letting it host a singleton would risk dual-leadership.
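The passive/active split in the table can be expressed as a small gate. This is an illustrative sketch only: the `Capability` labels and `isAllowed` are hypothetical names, and treating a fully up member as allowed to do everything is an assumption inferred from "wait for full up confirmation".

```typescript
// Hypothetical capability gate mirroring the table above.
type MemberStatus = 'weakly-up' | 'up';

type Capability =
  | 'receive-tell'
  | 'subscribe-events'
  | 'router-routee'
  | 'host-shards'
  | 'singleton-election';

// Passive participation: the only rows marked ✓ for weakly-up.
const PASSIVE: ReadonlySet<Capability> = new Set<Capability>([
  'receive-tell',
  'subscribe-events',
]);

function isAllowed(status: MemberStatus, capability: Capability): boolean {
  // Assumption: a confirmed `up` member may take on every responsibility.
  if (status === 'up') return true;
  // weakly-up: passive participation only, no cluster-managed roles.
  return PASSIVE.has(capability);
}
```

A single allow-list keeps the conservative default visible: any capability not explicitly listed as passive is denied until the member is confirmed up.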
The full state path
```text
fresh node    joining    weakly-up    up    leaving    exiting    removed
  (init)         │           │        │        │          │          │
                 └─────┬─────┘        │        └────┬─────┘          │
                       │              │             │                │
                       └──────────────┘             └────────────────┘
                        (most common)                (graceful exit)
```

weakly-up is transient: once the leader becomes reachable
and gossip converges, the member transitions to up. It can also
go straight from weakly-up to leaving or removed if it’s
stopped without ever reaching up.
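The state path can be written down as a transition table, which is handy in tests that assert on observed membership changes. This sketch encodes only the edges mentioned on this page; the `Status` type, `TRANSITIONS`, and `canTransition` are hypothetical names. The leaving → exiting → removed chain is read off the diagram's left-to-right order, which is an assumption, and other edges (for example those caused by downing) are deliberately left out.

```typescript
// Hypothetical transition table built from the state path described above.
type Status = 'joining' | 'weakly-up' | 'up' | 'leaving' | 'exiting' | 'removed';

const TRANSITIONS: Record<Status, readonly Status[]> = {
  joining: ['weakly-up', 'up'],             // promoted after the delay, or confirmed directly
  'weakly-up': ['up', 'leaving', 'removed'], // confirmed, or stopped before reaching up
  up: ['leaving'],
  leaving: ['exiting'],                      // assumption: order taken from the diagram
  exiting: ['removed'],
  removed: [],                               // terminal
};

function canTransition(from: Status, to: Status): boolean {
  return TRANSITIONS[from].includes(to);
}
```

For example, `canTransition('weakly-up', 'removed')` is true because a weakly-up member can be stopped without ever reaching up, while `canTransition('up', 'weakly-up')` is false because weakly-up is only entered from joining.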
Observing the transition
```typescript
import { MemberWeaklyUp, MemberUp } from 'actor-ts';

cluster.subscribe(MemberWeaklyUp, (evt) => {
  console.log(`${evt.member.address} promoted to weakly-up`);
});

cluster.subscribe(MemberUp, (evt) => {
  console.log(`${evt.member.address} reached full up`);
});
```

In dashboards or monitoring, count MemberWeaklyUp events: a
non-zero rate in steady state means partitions are happening (or
the threshold is set too low for your network).
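Counting events over a recent window is enough to spot a non-zero steady-state rate. The class below is a minimal, library-independent sketch: `SlidingWindowCounter` is a hypothetical name, and timestamps are passed in explicitly so the logic stays testable.

```typescript
// Hypothetical sliding-window counter for MemberWeaklyUp events.
class SlidingWindowCounter {
  private readonly windowMs: number;
  private timestamps: number[] = [];

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Call once per observed event, e.g. with Date.now().
  record(nowMs: number): void {
    this.timestamps.push(nowMs);
    this.prune(nowMs);
  }

  // Number of events seen within the last `windowMs` milliseconds.
  count(nowMs: number): number {
    this.prune(nowMs);
    return this.timestamps.length;
  }

  // Drop events that are at or before the start of the window.
  private prune(nowMs: number): void {
    const cutoff = nowMs - this.windowMs;
    for (;;) {
      const oldest = this.timestamps[0];
      if (oldest === undefined || oldest > cutoff) break;
      this.timestamps.shift();
    }
  }
}
```

To wire it up, call `record(Date.now())` inside the MemberWeaklyUp subscription callback and export `count(Date.now())` as a gauge; a value that stays above zero long after startup points at recurring partitions or a threshold set too low.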
When to enable, when not
Enable when:
- Cold-start partition tolerance matters (multi-AZ deployments, CI multi-node tests with imperfect networking).
- Your application has work that doesn’t require full cluster membership (e.g., an HTTP API that can serve cached reads even before the cluster fully forms).
Don’t enable when:
- Your app’s correctness depends on “every node in the cluster agrees on membership before doing anything.” Stay strict; let joins wait for full convergence.
- The cluster is small and stable, and you’d rather see “join stuck” alerts than silent half-membership.
Pitfalls
Where to next
- Cluster overview: the full membership state machine.
- Joining and seeds: what happens before weakly-up.
- Failure detector: the threshold to keep weaklyUpAfterMs above.
- Downing strategies: the complementary mechanism for partition recovery.