
Single-writer lease

Replicated event sourcing trades single-writer consistency for availability: multiple replicas can write concurrently, and the conflict resolver merges the divergent results.

For some workloads, conflicts shouldn't happen at all; they represent bugs or domain violations. But giving up multi-region availability would be a step back.

The single-writer lease is the middle ground:

  • At any moment, exactly one replica holds the lease.
  • The lease-holder writes events normally.
  • Other replicas read but don't write (until they acquire the lease).
  • If the lease-holder fails, another replica acquires it.

This effectively turns replicated ES into a failover-capable single-writer system, with replicated ES's recovery semantics underneath.
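The Lease contract itself is small. A plausible shape, inferred from how it is used in the examples below (acquire on start, a local liveness check per command, a loss callback); treat this as a sketch rather than the library's exact signatures:

interface Lease {
  // Try to take the lease; resolves true if this replica is now the holder.
  acquire(): Promise<boolean>;
  // Give the lease up voluntarily (e.g. on shutdown).
  release(): Promise<void>;
  // Local, non-blocking check: do we still believe we hold the lease?
  checkAlive(): boolean;
  // Register a callback invoked when the lease is lost (TTL expiry, revocation).
  onLost(callback: () => void): void;
}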

import { ReplicatedEventSourcedActor, KubernetesLease, Lease } from 'actor-ts';

class Account extends ReplicatedEventSourcedActor<Cmd, Event, State> {
  readonly persistenceId = `account-${this.userId}`;
  readonly replicaId = process.env.REPLICA_ID!;
  readonly conflictResolver = ...;

  // Opt in to lease:
  readonly lease: Lease = new KubernetesLease({
    name: `account-${this.userId}-writer`,
    owner: process.env.REPLICA_ID!,
    ttlMs: 30_000,
    namespace: 'default',
  });
}
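Because the lease name embeds the entity id (`account-${this.userId}-writer`), writership is decided per entity, not per node: one replica can hold the writer lease for some accounts while another replica writes others.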

The actor lifecycle (sketched below):

  1. On preStart, attempts to acquire the lease.
  2. On success → becomes the writer.
  3. On failure → starts in read-only mode.
  4. On lease loss (onLost fires) → drops back to read-only; another replica eventually acquires.
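Conceptually, the wiring looks like this, continuing the Account example; isWriter is a hypothetical flag shown only to make the transitions explicit (the library tracks writer status internally):

class Account extends ReplicatedEventSourcedActor<Cmd, Event, State> {
  // Hypothetical flag for illustration.
  private isWriter = false;

  override async preStart(): Promise<void> {
    // Race against the other replicas; at most one acquire() returns true.
    this.isWriter = await this.lease.acquire();

    // If the lease is later lost (TTL expiry, revocation), drop to read-only.
    this.lease.onLost(() => {
      this.isWriter = false;
    });
  }
}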

Use the lease when you want active-active failover but single-writer consistency:

  • Financial transactions — balance changes must serialize.
  • Stock / inventory — concurrent decrement could overshoot.
  • Workflow state machines — transitions can’t be concurrent.

Without the lease, you'd need a resolver that correctly merges concurrent withdrawals: possible but error-prone, as the sketch below illustrates. With the lease, such conflicts simply don't arise.
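To make the hazard concrete, here is a hypothetical merge of two concurrent withdrawals accepted during a partition (event shapes are illustrative only):

// Balance starts at 100; during a partition, replicas A and B each
// accept an 80 withdrawal against their local copy of the state.
const eventsA = [{ kind: 'withdrawn', amount: 80 }]; // accepted by A
const eventsB = [{ kind: 'withdrawn', amount: 80 }]; // accepted by B

// A naive resolver that keeps both events replays to an overdraft:
const merged = [...eventsA, ...eventsB];
const balance = merged.reduce((b, e) => b - e.amount, 100); // 100 - 80 - 80 = -60

With the lease, the writer-side guard in onCommand is all you need: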

override async onCommand(state: State, cmd: Cmd): Promise<void> {
  if (!this.lease.checkAlive()) {
    // I'm not the writer — reject or forward
    cmd.replyTo.tell({ kind: 'not-writer', currentWriter: ... });
    return;
  }
  // I am the writer — proceed normally
  this.persist(event, () => {});
}

A read-only replica still:

  • Replays the journal (sees the writer’s events).
  • Maintains state (read-side queries work).
  • Reports state to readers.

But rejects writes — callers see “this replica isn’t the writer; ask elsewhere.”

For a client expecting writes to be routed transparently, this is harsh. The common pattern is a proxy actor that watches lease ownership and routes writes to the current writer.
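A minimal sketch of such a proxy, under stated assumptions: the watchHolder subscription and the sendTo transport are hypothetical stand-ins for whatever discovery and messaging you actually have:

// Hypothetical types for illustration.
type ReplicaId = string;
type WriteCmd = { replyTo: { tell(msg: unknown): void } };

class WriterProxy {
  private currentWriter: ReplicaId | null = null;

  constructor(
    // Assumed: the lease backend can stream holder changes.
    watchHolder: (cb: (holder: ReplicaId | null) => void) => void,
    // Assumed: resolves a replica id to a transport we can send commands on.
    private readonly sendTo: (replica: ReplicaId, cmd: WriteCmd) => void,
  ) {
    watchHolder((holder) => {
      this.currentWriter = holder;
    });
  }

  route(cmd: WriteCmd): void {
    if (this.currentWriter === null) {
      // Failover in progress: no writer yet. Reject so the caller can retry.
      cmd.replyTo.tell({ kind: 'no-writer-retry' });
      return;
    }
    this.sendTo(this.currentWriter, cmd);
  }
}

During failover, the proxy simply observes the holder change: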

Writer A holds the lease.
│  A crashes (or its lease TTL expires).
The lease becomes available.
│  Replicas B, C, D race to acquire it.
│  Only one wins (lease acquisition is atomic).
The new writer (say B) starts writing.
│  A recovers, sees the lease is held, and runs as read-only.
Stable.

Failover window: the lease TTL (typically 15-30 s). A shorter TTL means faster failover but more renewal traffic.

class Account extends ReplicatedEventSourcedActor<...> {
  readonly lease = ...;
  readonly conflictResolver = ...; // ← still required
}

The resolver is still mandatory. Why?

  • During the failover window, both the old and new writer might briefly write: the old one before it notices its lease is gone, the new one after acquiring. The resolver handles those rare concurrent events.
  • During a network partition between a replica and the lease backend, the replica may believe it still holds the lease and keep writing, while another replica has actually acquired it. The resolver reconciles the histories when the partition heals.

The lease reduces conflict frequency to near zero but doesn't eliminate conflicts entirely. Always have a resolver; a defensive fallback is sketched below.
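A deliberately simple resolver is often enough here. A sketch, assuming a versioned-state shape; last-writer-wins is shown purely for illustration and discards the losing side's effects:

// Hypothetical versioned-state shape; the library's resolver interface may differ.
type Versioned<S> = { state: S; timestamp: number; replicaId: string };

function lastWriterWins<S>(a: Versioned<S>, b: Versioned<S>): Versioned<S> {
  // With the lease, this runs only in the rare failover/partition cases.
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  // Tie-break deterministically so every replica converges to the same pick.
  return a.replicaId > b.replicaId ? a : b;
}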

The lease implementations are the same as for cluster-singleton leases (see Coordination):

  • InMemoryLease — for tests.
  • KubernetesLease — for production on Kubernetes.
  • Custom — implement Lease against your own coordination backend (etcd, Consul); a minimal sketch follows.
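For illustration, here is a minimal in-process implementation of the Lease shape sketched earlier. It is suitable for single-process tests only, since a real backend needs server-side atomic compare-and-set and TTL enforcement (the real InMemoryLease may differ):

// Module-level registry standing in for a coordination backend.
const holders = new Map<string, { owner: string; expiresAt: number }>();

class LocalLease implements Lease {
  private lostCallback: (() => void) | null = null;

  constructor(
    private readonly name: string,
    private readonly owner: string,
    private readonly ttlMs: number,
  ) {}

  async acquire(): Promise<boolean> {
    const now = Date.now();
    const current = holders.get(this.name);
    // Take the lease only if it is free, expired, or already ours.
    if (current && current.expiresAt > now && current.owner !== this.owner) {
      return false;
    }
    holders.set(this.name, { owner: this.owner, expiresAt: now + this.ttlMs });
    return true;
  }

  async release(): Promise<void> {
    if (holders.get(this.name)?.owner === this.owner) {
      holders.delete(this.name);
      this.lostCallback?.();
    }
  }

  checkAlive(): boolean {
    const current = holders.get(this.name);
    return current !== undefined
      && current.owner === this.owner
      && current.expiresAt > Date.now();
  }

  onLost(callback: () => void): void {
    // A real implementation fires this from its renewal loop when renewal fails.
    this.lostCallback = callback;
  }
}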

Adding the lease:

  • Lease acquisition — one network call to the lease backend (a Kubernetes Lease patch, etc.). Sub-second.
  • Renewal — one call every ttl / 3 (typically ~10 s). Cheap.
  • Conflict frequency — drops to near zero; the resolver runs rarely.

The lease itself doesn't slow normal writes: they proceed locally, without a round trip to the lease backend per call. The per-command check is lease.checkAlive(), which is local and sub-microsecond.

Plain replicated ES:

  • Multiple concurrent writers (one per replica).
  • Conflict-resolver runs on every concurrent write.
  • No coordination required; tolerates partitions.
  • “Eventually consistent.”

With the lease:

  • One writer at a time (cluster-wide).
  • Conflicts are rare (only during failover / partition).
  • Coordination via the lease backend.
  • “Strongly consistent except during failover.”

Choose based on your consistency versus availability requirements.