Gossip cadence

Cluster membership propagates via gossip: every gossipIntervalMs milliseconds, each member picks a random reachable peer and exchanges its membership view. After a few rounds, every member holds the same view and the cluster has converged.

Cluster.join(system, {
  host, port, seeds,
  gossipIntervalMs: 1_000, // default
});

The default is 1 second. Most clusters never need to change it.

| Lower (e.g. 250 ms) | Default (1 s) | Higher (e.g. 5 s) |
|---|---|---|
| Faster convergence | Balanced | Slower convergence |
| More gossip messages | Modest traffic | Less traffic |
| Faster failover detection | Standard | Slower failover detection |

Concretely:

  • A 5-node cluster at 1 s gossip → typical convergence after a join: 2-3 seconds.
  • At 250 ms gossip: ~700 ms.
  • At 5 s: ~10-15 seconds.
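The figures above work out to roughly two to three gossip rounds multiplied by the interval. A toy simulation can illustrate why only a few rounds are needed. The sketch below is not the library's implementation: `simulateJoin`, the seeded random generator, and the sequential round model are all assumptions made for illustration. It models one new node joining an already-converged cluster through a seed node and counts the rounds until every member knows every other member.

```typescript
// Toy model of convergence after a join. All names are hypothetical;
// the real gossip protocol is concurrent and carries more state.

// Small seeded PRNG (mulberry32 variant) so runs are reproducible.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function simulateJoin(
  clusterSize: number,
  seed = 1,
  maxRounds = 1000,
): { rounds: number; converged: boolean } {
  const rng = mulberry32(seed);
  const newcomer = clusterSize - 1; // the last node is the one joining

  // knows[i][j] is true when node i has node j in its membership view.
  const knows: boolean[][] = [];
  for (let i = 0; i < clusterSize; i++) {
    const row: boolean[] = [];
    for (let j = 0; j < clusterSize; j++) {
      // Existing members start out knowing each other, but not the newcomer.
      row.push(i < newcomer && j < newcomer);
    }
    knows.push(row);
  }
  // The newcomer starts out knowing only itself and the seed (node 0).
  knows[newcomer][newcomer] = true;
  knows[newcomer][0] = true;

  const isConverged = () => knows.every((row) => row.every((k) => k));

  let rounds = 0;
  while (!isConverged() && rounds < maxRounds) {
    rounds++;
    for (let i = 0; i < clusterSize; i++) {
      // Pick one random peer from the members this node knows about.
      const peers: number[] = [];
      for (let j = 0; j < clusterSize; j++) {
        if (j !== i && knows[i][j]) peers.push(j);
      }
      if (peers.length === 0) continue;
      const peer = peers[Math.floor(rng() * peers.length)];
      // Exchange views: both sides end up with the union.
      for (let m = 0; m < clusterSize; m++) {
        const merged = knows[i][m] || knows[peer][m];
        knows[i][m] = merged;
        knows[peer][m] = merged;
      }
    }
  }
  return { rounds, converged: isConverged() };
}

// Rough convergence estimate: rounds needed times the gossip interval.
const joinResult = simulateJoin(5);
console.log(`rounds: ${joinResult.rounds}, ~${joinResult.rounds * 1000} ms at 1 s gossip`);
```

The round count varies with the random seed and the cluster size, so treat the result as an order-of-magnitude illustration rather than a prediction for a real cluster.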

Reduce below 1 second when:

  • Latency-sensitive failover: singleton or sharding patterns where a leader change should propagate in under a second. 500 ms is reasonable; 250 ms is aggressive.
  • Small cluster (≤5 nodes): gossip volume stays manageable.
  • Quiet network: nothing else competes for bandwidth, so a faster cadence costs little.

Increase above 1 second when:

  • Large cluster (20+ nodes): aggregate gossip volume grows O(N²) in the worst case, because each of the N members sends a view whose size also grows with N. A 5 s interval slows convergence proportionally but cuts gossip traffic to a fifth of the default.
  • Bandwidth-constrained network: cross-region or WAN-linked clusters where chatty gossip is wasteful.
  • Stable cluster: membership rarely changes, so faster gossip doesn’t help.

Per-node gossip bandwidth, in bytes per second, is roughly:

gossip_size × (1000 / gossipIntervalMs) × peers_per_round

Per-cluster:

N × gossip_size × (1000 / gossipIntervalMs) × peers_per_round

Where:

  • gossip_size is the size of one gossip message: roughly 100-500 bytes per member in the view, so it grows with cluster size. In a 10-node cluster, gossip messages run ~1-5 KB.
  • peers_per_round is 1 (one random peer per tick).
  • N = cluster size.
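The formulas above can be sketched as a few small functions. This is an illustrative estimate only: the function names and the `bytesPerMember` parameter are assumptions, not part of the library's API, and the model counts one outgoing message per member per tick, ignoring replies and transport overhead.

```typescript
// Back-of-the-envelope gossip bandwidth estimate (hypothetical helpers).

// Size of one gossip message: the view holds one entry per member.
function gossipMessageBytes(clusterSize: number, bytesPerMember: number): number {
  return clusterSize * bytesPerMember;
}

// Bytes per second sent by a single member.
function perNodeBytesPerSecond(
  messageBytes: number,
  gossipIntervalMs: number,
  peersPerRound = 1, // one random peer per tick, as described above
): number {
  const ticksPerSecond = 1000 / gossipIntervalMs; // ms to per-second rate
  return messageBytes * ticksPerSecond * peersPerRound;
}

// Aggregate bytes per second across the whole cluster.
function clusterBytesPerSecond(
  clusterSize: number,
  messageBytes: number,
  gossipIntervalMs: number,
  peersPerRound = 1,
): number {
  return clusterSize * perNodeBytesPerSecond(messageBytes, gossipIntervalMs, peersPerRound);
}

// Example: 10 nodes, an assumed 150 bytes per member, default 1 s interval.
const exampleMessage = gossipMessageBytes(10, 150); // 1500 bytes
console.log(perNodeBytesPerSecond(exampleMessage, 1000)); // 1500 bytes/s per node
console.log(clusterBytesPerSecond(10, exampleMessage, 1000)); // 15000 bytes/s aggregate
```

Because the message size itself scales with cluster size, doubling the node count roughly quadruples the aggregate figure at a fixed interval.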

For a 50-node cluster at the default 1 s gossip, taking ~5 KB per message (the low end of the range: 50 members × ~100 bytes):

50 × ~5 KB × 1/s = 250 KB/s aggregate

Negligible on a LAN. On a WAN, round-trip time adds to every gossip exchange: at 10 ms RTT, each round spends about 10 ms in flight, and the cost grows with cross-region latency, so convergence is slower than the interval alone suggests.

Gossip cadence affects several other operations:

| Operation | How gossip affects it |
|---|---|
| Convergence after join | Directly: slower gossip means a new member takes longer to be seen as up. |
| Failure-detector unreachable detection | Indirectly: gossip carries last-seen times. |
| Sharding rebalance | Coordinator decisions ride on gossip. |
| DistributedPubSub topic propagation | The topic→node map gossips at the cluster rate. |
| Receptionist service registry | Same: gossip carries registrations. |

A longer gossipIntervalMs slows all of these. This is usually fine in stable clusters but problematic for fast-changing workloads.

| Cluster size or profile | Network | Recommended gossipIntervalMs |
|---|---|---|
| 3-5 nodes | LAN | 250-500 ms |
| 5-15 nodes | LAN | 1 s (default) |
| 15-50 nodes | LAN | 1-2 s |
| 50+ nodes | LAN | 2-5 s |
| Cross-region | WAN | 2-5 s |
| Latency-sensitive | LAN | 250-500 ms |
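The recommendations above can be encoded as a small lookup, for example to sanity-check a configuration at startup. This is a sketch under stated assumptions: `recommendGossipIntervalMs` and its types are hypothetical and not part of the library. The table's size ranges overlap at 5, 15, and 50 nodes, so the boundary handling here is one possible reading, and letting WAN take priority over latency sensitivity is likewise an assumption.

```typescript
// Hypothetical helper that mirrors the recommendation table.
type Network = "LAN" | "WAN";

interface ClusterProfile {
  nodes: number;
  network: Network;
  latencySensitive?: boolean;
}

interface IntervalRange {
  minMs: number;
  maxMs: number;
}

function recommendGossipIntervalMs(profile: ClusterProfile): IntervalRange {
  // Assumption: a WAN link dominates every other consideration.
  if (profile.network === "WAN") return { minMs: 2000, maxMs: 5000 };
  // Latency-sensitive LAN clusters get the fast range regardless of size.
  if (profile.latencySensitive) return { minMs: 250, maxMs: 500 };
  // Assumption: overlapping boundaries resolve toward the smaller bracket
  // at 5 and 15 nodes, and toward the larger bracket at 50 nodes.
  if (profile.nodes <= 5) return { minMs: 250, maxMs: 500 };
  if (profile.nodes <= 15) return { minMs: 1000, maxMs: 1000 };
  if (profile.nodes < 50) return { minMs: 1000, maxMs: 2000 };
  return { minMs: 2000, maxMs: 5000 };
}

// Example: a 12-node LAN cluster lands on the default.
const suggested = recommendGossipIntervalMs({ nodes: 12, network: "LAN" });
console.log(`${suggested.minMs}-${suggested.maxMs} ms`);
```

The returned range is a starting point; the traffic and convergence trade-offs described earlier still decide where within the range a given cluster should sit.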