# Sharding with lease
The sharding coordinator runs as a cluster singleton — only the leader’s coordinator is active. With a downing strategy + healthy network, that’s enough.
During a network partition + buggy downing config, both halves might briefly think they’re leader → two coordinators → conflicting shard allocations → entities possibly spawned on both sides.
The single-writer lease prevents this:
```ts
import { ClusterSharding, KubernetesLease } from 'actor-ts';

const sharding = ClusterSharding.get(system, cluster).start<Cmd>({
  typeName: 'order',
  entityProps: Props.create(() => new OrderEntity()),
  extractEntityId: (msg) => msg.id,
  lease: new KubernetesLease({
    name: 'order-sharding-coordinator',
    owner: process.env.POD_NAME!,
    ttlMs: 30_000,
    namespace: process.env.K8S_NAMESPACE!,
  }),
});
```

Now even if two nodes both think they’re leader, only one holds the lease. Only the lease-holder’s coordinator processes shard allocations.
## How it works

```
Two nodes A + B both think they're cluster leader
(partition + insufficient downing config).
    │
    │ both attempt lease.acquire()
    ▼
Lease backend (K8s Lease) — atomic CAS
    │
    ├── A's acquire succeeds → A is coordinator
    └── B's acquire fails    → B's coordinator stays passive
            │
            │ When A loses lease (TTL expiry, crash):
            │ B's renewing acquire fires → B becomes coordinator
            ▼
Coordinator transitions cleanly; allocations resume.
```

The lease backend (K8s API server, etcd) provides the atomic exactly-one-holder guarantee — it’s the source of truth beyond gossip.
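The exactly-one-holder CAS can be sketched in isolation. The `FakeLeaseBackend` below is a hypothetical in-memory stand-in for the real backend (K8s Lease, etcd), not part of the actor-ts API — it shows why two simultaneous acquirers can never both win:

```ts
// Hypothetical in-memory stand-in for an atomic lease backend. Real backends
// perform this compare-and-set server-side; here a plain field suffices
// because each acquire() call runs to completion on Node's event loop.
class FakeLeaseBackend {
  private holder: string | null = null;

  // Atomic compare-and-set: succeed only if the lease is free (or already ours).
  acquire(owner: string): boolean {
    if (this.holder !== null && this.holder !== owner) return false;
    this.holder = owner;
    return true;
  }

  release(owner: string): void {
    if (this.holder === owner) this.holder = null;
  }
}

// Partition scenario: nodes A and B both think they're leader and race.
const backend = new FakeLeaseBackend();
const aWins = backend.acquire('node-a');
const bWins = backend.acquire('node-b');
console.log(aWins, bWins); // true false — A is coordinator, B stays passive

// A loses the lease (crash / TTL expiry) → B's retrying acquire now succeeds.
backend.release('node-a');
console.log(backend.acquire('node-b')); // true — B takes over
```

The same shape holds for any backend that offers an atomic conditional write; the TTL (not shown here) is what turns a crashed holder's lease into a free one.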
## Configuration

```ts
interface StartSettings<TMsg> {
  // ... base sharding settings ...
  lease?: Lease;
  acquireRetryIntervalMs?: number; // default 5000
}
```

| Field | Purpose |
|---|---|
| `lease` | The `Lease` instance — typically `KubernetesLease`. |
| `acquireRetryIntervalMs` | Retry cadence on failed acquire. |
Same `Lease` abstraction as singleton with lease — see Coordination for the interface.
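What `acquireRetryIntervalMs` governs can be sketched as a polling loop. This is a hypothetical illustration, not the actor-ts internals; it assumes only that the lease exposes a boolean-returning async `acquire`:

```ts
// Hypothetical sketch: a passive coordinator polling until the lease frees up.
// `acquire` stands in for Lease.acquire(); the real loop lives inside actor-ts.
async function acquireWithRetry(
  acquire: () => Promise<boolean>,
  retryIntervalMs: number, // StartSettings.acquireRetryIntervalMs, default 5000
): Promise<number> {
  let attempts = 0;
  for (;;) {
    attempts++;
    if (await acquire()) return attempts; // lease held — coordinator activates
    await new Promise((resolve) => setTimeout(resolve, retryIntervalMs));
  }
}

// Demo: the lease becomes free on the third attempt.
let calls = 0;
acquireWithRetry(async () => ++calls >= 3, 10).then((attempts) =>
  console.log(`acquired after ${attempts} attempts`), // acquired after 3 attempts
);
```

A shorter interval speeds up failover slightly at the cost of more load on the lease backend; the default of 5 s is small relative to a 30 s TTL.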
## What’s protected

The lease gates coordinator state writes:
- Shard allocations — assigning shards to regions.
- Rebalance decisions — moving shards between regions.
- Handoff coordination — orchestrating shard handoffs.
The lease doesn’t gate:
- Per-region entity hosting (regions are tied to actual cluster members, not the coordinator).
- Entity messaging (messages route via the coordinator’s last-known allocation; the lease isn’t in the message path).
So the worst-case during a partition is slightly stale shard assignments — entities continue running, just no new allocations until lease ownership stabilizes.
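That split in responsibilities can be sketched with a toy router (hypothetical names, not the actor-ts internals): delivery reads the cached allocation table, and only unseen shard IDs need the lease-gated coordinator:

```ts
// Hypothetical sketch of a shard region's routing view. Delivery reads the
// last-known allocation; only unseen shard IDs need the (lease-gated) coordinator.
class ShardRouter {
  private allocations = new Map<string, string>(); // shardId → region
  readonly pending: string[] = []; // shards awaiting coordinator allocation
  private coordinatorUp = true;

  setCoordinatorUp(up: boolean): void { this.coordinatorUp = up; }

  route(shardId: string): string {
    const region = this.allocations.get(shardId);
    if (region) return region; // lease not in this path — works mid-partition
    if (this.coordinatorUp) {
      // Coordinator allocates (grossly simplified: everything to region-1).
      this.allocations.set(shardId, 'region-1');
      return 'region-1';
    }
    this.pending.push(shardId); // buffered until lease ownership stabilizes
    return 'queued';
  }
}

const router = new ShardRouter();
console.log(router.route('shard-7')); // region-1 (freshly allocated)
router.setCoordinatorUp(false);       // lease lost / partition in progress
console.log(router.route('shard-7')); // region-1 (cached — still routable)
console.log(router.route('shard-9')); // queued   (needs the coordinator)
```

The point of the sketch: losing the coordinator degrades *allocation*, not *delivery* — which is exactly the "slightly stale shard assignments" worst case described above.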
## When to enable

Three good fits:
- Production multi-region clusters where partitions are plausible.
- Financial / inventory entities where dual-allocation would cause real damage.
- Compliance requiring “no possibility of split-brain in any single-tenant production system.”
For typical single-region K8s deployments with a downing strategy, the lease is paranoid-safe — it adds operational complexity in exchange for protection against rare edge cases.
## Operational considerations

### Lease backend availability

```
K8s API server outage
  → no replica can acquire
  → no shard allocations
```

The lease backend becomes a SPOF. For most setups, K8s API availability is much higher than the cluster itself — but plan for the rare case.
### Failover latency

```
Old coordinator loses lease (TTL expiry: 30s)
  → new coordinator acquires (typically sub-second after TTL)
  → new coordinator rebuilds state from gossip + journal
```

The failover window is the lease TTL — ~30 s typical. During that window:
- No new shard allocations happen.
- Existing entities continue receiving messages.
- New entity IDs that need allocation queue up; processed after failover.
Acceptable for most workloads.
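As a back-of-envelope for sizing the TTL, the timeline above sums to a worst-case window. The breakdown below is an assumption for illustration — actual rebuild time depends on journal size and cluster state:

```ts
// Worst-case allocation-unavailability window, per the failover timeline above.
// The non-TTL components are rough illustrative assumptions, not measurements.
function failoverWindowMs(opts: {
  ttlMs: number;          // lease TTL — the old holder's lease must expire first
  acquireDelayMs: number; // new holder's acquire after expiry (typically sub-second)
  rebuildMs: number;      // state rebuild from gossip + journal
}): number {
  return opts.ttlMs + opts.acquireDelayMs + opts.rebuildMs;
}

const window = failoverWindowMs({ ttlMs: 30_000, acquireDelayMs: 500, rebuildMs: 1_000 });
console.log(`~${window / 1000}s worst case`); // ~31.5s worst case
```

Shrinking the TTL shrinks this window, but a TTL close to the renewal interval risks spurious lease loss under API-server latency spikes.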
## Combining with regular downing

```ts
{
  downingProvider: new KeepMajority(),
  // + the sharding lease
  lease: ...,
}
```

Both layers active. Downing handles normal cluster convergence; the lease guarantees the coordinator-uniqueness invariant.
## Reading the protection level

| Setup | Coordinator-uniqueness guarantee |
|---|---|
| No downing, no lease | Best-effort. Partitions cause dual coordinators. |
| Downing strategy only | Strong on stable networks. |
| Downing + lease | Paranoid-safe. Both invariants enforced. |
For singleton + sharding production setups in critical-data scenarios, both is the recommended pattern.
## Where to next

- Sharding overview — the foundation.
- Singleton with lease — the same pattern for singletons.
- Coordination overview — the lease abstraction.
- KubernetesLease — the production backend.
- Downing strategies — the complementary partition resolver.