Rebalance
When the cluster’s membership changes — a node joins or leaves — shards need to move to keep the workload distributed. The coordinator drives the process: it picks shards to relocate, tells the source region to hand off, and waits for confirmation before re-allocating.
```
coordinator                source region            destination region
     │                          │                          │
     ├── HandOff(shardId) ─────►│                          │
     │                          │                          │
     │      (drain entities + buffer messages)             │
     │                          │                          │
     │◄──── HandOffComplete ────┤                          │
     │                          │                          │
     ├── allocate new home ───────────────────────────────►│
     │                          │                          │
     │                          │        (region accepts shard,
     │                          │         spawns entities as
     │                          │         messages arrive)
```

This is intentionally conservative: buffered messages wait for `HandOffComplete` before forwarding to the new owner, so nothing races past a half-moved entity.
What triggers rebalance
Two paths:

- Membership-driven — `MemberUp` (a new node), `MemberRemoved` (a leaving node), or any cluster transition that changes the candidate set. The coordinator re-runs the allocation strategy for every owned shard.
- Strategy-driven — every `rebalanceIntervalMs` (default 2s), the coordinator asks the allocation strategy for its rebalance recommendations. `LeastShardAllocationStrategy` returns shards to drain off busy nodes; `HashAllocationStrategy` returns shards whose hash-target moved.
Both feed into the same handoff protocol.
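To make the strategy-driven path concrete, here is a minimal sketch of how a least-shard strategy might pick shards to drain. The function name, signature, and halve-the-gap heuristic are illustrative assumptions, not the framework's actual API:

```typescript
// Hypothetical sketch of a least-shard rebalance pick; the real
// LeastShardAllocationStrategy may differ in details.
type NodeId = string;
type ShardId = string;

// Returns shards to drain from the busiest node when it holds more than
// `threshold` shards above the least-loaded node.
function rebalancePick(
  allocations: Map<NodeId, ShardId[]>,
  threshold = 1,
): ShardId[] {
  const entries = [...allocations.entries()];
  if (entries.length < 2) return [];
  entries.sort((a, b) => b[1].length - a[1].length); // busiest first
  const busiest = entries[0][1];
  const quietest = entries[entries.length - 1][1];
  const imbalance = busiest.length - quietest.length;
  if (imbalance <= threshold) return [];
  // Drain enough shards to roughly halve the gap (assumed heuristic).
  return busiest.slice(0, Math.floor(imbalance / 2));
}
```

Every `rebalanceIntervalMs`, the coordinator would feed the picked shards into the handoff protocol below.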
The handoff sequence
When the coordinator decides shard `X` should move from node A to node B:

1. `HandOff(X)` is sent to node A's region.
2. Node A's region marks shard `X` as handing-off. It still receives messages for entities in `X`, but buffers them instead of forwarding to entity actors.
3. Each entity in `X` is sent the framework's stop signal. They run `postStop`, persist any final state, and die.
4. Once all entities in `X` are gone, node A sends `HandOffComplete(X)` to the coordinator.
5. The coordinator runs `allocate(X, ...)` to pick a new home. Suppose it picks node B.
6. The coordinator publishes the new owner via gossip. Buffered messages on node A start forwarding to node B's region.
7. Node B's region receives messages for entities in `X` and spawns entities on demand (just like first-time allocation).
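The source region's part of this sequence (mark, buffer, flush) can be sketched as a tiny state machine. The class and method names here are illustrative, not the framework's API:

```typescript
// Minimal sketch of a region's per-shard handoff behavior, under assumed
// names; real regions also manage entity actors, gossip, etc.
type Message = { entityId: string; payload: unknown };

class RegionShard {
  private handingOff = false;
  private buffer: Message[] = [];
  private delivered: Message[] = [];

  beginHandOff() {
    this.handingOff = true; // step 2: mark shard, start buffering
  }

  receive(m: Message) {
    if (this.handingOff) this.buffer.push(m); // delayed, not dropped
    else this.delivered.push(m);              // normal path: to entity actor
  }

  // Steps 5–6: the new owner is known; flush in arrival order.
  completeHandOff(forwardToNewOwner: (m: Message) => void) {
    this.handingOff = false;
    for (const m of this.buffer) forwardToNewOwner(m);
    this.buffer = [];
  }
}
```

The key invariant: between `beginHandOff` and `completeHandOff`, nothing reaches the (stopping) entities, and the flush preserves arrival order.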
The whole thing usually takes sub-second to a few seconds,
depending on how many entities are in the shard and how long their
postStop work takes.
Configuration
```typescript
sharding.start({
  // ...
  rebalanceIntervalMs: 2_000, // strategy-driven rebalance every 2s
  handOffTimeoutMs: 10_000,   // give up + force-reallocate after 10s
});
```

| Knob | Default | What |
|---|---|---|
| `rebalanceIntervalMs` | `2_000` | How often the strategy is consulted for rebalance recommendations. |
| `handOffTimeoutMs` | `10_000` | If the source region doesn't send `HandOffComplete` within this window, the coordinator gives up waiting and force-reallocates. |
What buffered messages mean
While shard `X` is handing-off:

- Messages targeting entities in `X` arrive at node A.
- Node A's region buffers them instead of forwarding to the (now-stopped) entity actors.
- Once handoff completes and `X` has a new owner, buffered messages forward to the new region.
- The new region spawns entities and processes the messages in order.
This means a message sent during handoff is delayed, not dropped. The cost: latency spikes during rebalance windows.
When entities have state
For persistent entities (`PersistentActor`):

- `postStop` on the source side finalizes any pending persist.
- The new entity instance on the destination replays the journal on startup.
- The buffered message arrives at a fully-recovered entity.
For non-persistent entities, state is lost between incarnations — the rebalance is equivalent to a restart with the buffered message acting as the first command.
If state matters across rebalance, use persistence. This isn’t optional.
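The persist-then-replay cycle is why state survives the move. A toy sketch, with an in-memory journal and a counter entity standing in for `PersistentActor` (all names here are illustrative, not the framework's API):

```typescript
// Event-sourced counter: the "new incarnation" on the destination node
// rebuilds its state by replaying the journal written on the source node.
type Event = { delta: number };

class Journal {
  private events = new Map<string, Event[]>();
  persist(entityId: string, e: Event) {
    const log = this.events.get(entityId) ?? [];
    log.push(e);
    this.events.set(entityId, log);
  }
  replay(entityId: string): Event[] {
    return this.events.get(entityId) ?? [];
  }
}

class CounterEntity {
  count = 0;
  constructor(private id: string, private journal: Journal) {
    // Startup recovery: replay everything persisted by past incarnations.
    for (const e of journal.replay(id)) this.count += e.delta;
  }
  increment() {
    this.journal.persist(this.id, { delta: 1 }); // persist before applying
    this.count += 1;
  }
}
```

Creating a second `CounterEntity` with the same id and journal models the rebalance: the destination instance recovers the count before any buffered message is processed. A non-persistent entity would start from zero instead.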
Force-reallocation
If `HandOffComplete` doesn't arrive within `handOffTimeoutMs`:
- The coordinator logs a warning (“handoff timed out”).
- It force-reallocates the shard to its new owner.
- Buffered messages forward as if handoff had completed normally.
- The old entities may still be alive on node A — they receive messages locally too, leading to split-brain entities until the orphans die.
This is rare but possible. Causes:
- A `postStop` that hangs (slow journal write, blocked external call).
- A network partition during handoff.
Mitigation:
- Keep `postStop` short and non-blocking.
- Tune `handOffTimeoutMs` to your worst-case persist time plus a buffer.
- For the split-brain risk, consider a lease on the sharding coordinator (see sharding with-lease).
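The coordinator's wait-or-give-up decision amounts to racing the handoff acknowledgment against a timer. A hedged sketch, with an assumed helper name rather than the framework's internals:

```typescript
// Hypothetical sketch of the handOffTimeoutMs race. "forced" is the path
// where old entities may still be alive on the source node.
async function awaitHandOff(
  handOffComplete: Promise<void>,
  handOffTimeoutMs: number,
): Promise<"clean" | "forced"> {
  const timeout = new Promise<"forced">((resolve) =>
    setTimeout(() => resolve("forced"), handOffTimeoutMs),
  );
  // Note: the losing timer is left to fire harmlessly; a real coordinator
  // would clear it.
  return Promise.race([
    handOffComplete.then(() => "clean" as const),
    timeout,
  ]);
}
```

On `"forced"`, the coordinator proceeds to reallocate and forward buffered messages anyway, which is exactly where the split-brain window opens.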
Rebalance vs scaling
- Node added → coordinator allocates some shards to it
- Node removed → coordinator re-homes its shards
- Node ↑ in load → `LeastShardAllocationStrategy` drains shards off
- Node ↓ in load → no rebalance (only the busy direction triggers)

Rebalance is work-shedding, not work-acquiring. An idle node in `LeastShardAllocationStrategy`'s view receives new shards as they're allocated, but doesn't get existing shards moved into it just because it's quiet.
For an "always-balanced" effect, restart the coordinator periodically (which re-runs allocation from scratch), or configure `rebalanceThreshold: 1` to react to any imbalance.
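Assuming `rebalanceThreshold` sits alongside the other `sharding.start` options shown earlier (its exact placement is not confirmed by this page), the "react to any imbalance" setup might look like:

```typescript
sharding.start({
  // ...
  rebalanceIntervalMs: 2_000,
  rebalanceThreshold: 1, // react to any imbalance (option placement assumed)
});
```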
Where to next
- Sharding overview — the broader picture.
- Allocation strategy — what decides the new homes.
- Remember entities — affects what gets re-spawned after handoff.
- Sharding with lease — split-brain protection for the coordinator.