
Upgrade strategies

Two kinds of production upgrade:

  • Code-only upgrade: new binary, same schemas. Rolling deployment; old and new versions coexist briefly.
  • Schema-breaking upgrade: new shapes for events / state / messages. Migration first, then rolling deployment.

Pick the kind, follow the pattern. Mixing them naively breaks production — old nodes can’t read new schemas or vice versa.

Code-only upgrades are the common case: bug fixes, refactors, and behavior tweaks that don't change persisted-data shapes.

1. Build the new binary (tag v1.2.3).
2. Deploy via rolling update.
3. K8s replaces pods one at a time.
4. Each pod: SIGTERM → coordinated-shutdown → drain → new pod spawns → cluster-rejoin.
5. Done.

The cluster’s gossip + sharding rebalance + coordinated-shutdown handle the choreography. Total downtime: zero (if configured right; see Kubernetes deployment).

Requirements:

  • Replicas ≥ 2. Single-replica clusters can’t drain cleanly.
  • Coordinated shutdown configured with sane phase timeouts.
  • Health checks correctly gate readiness.
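
As a sketch of the readiness-gating requirement, assuming you can surface cluster membership as a boolean (the /ready path, port, and `joined` flag here are illustrative, not framework API):

import { createServer } from 'node:http';

// Report 503 until this node has (re)joined the cluster, so Kubernetes
// keeps traffic off pods that are still rejoining after a rolling restart.
let joined = false;
// ... flip `joined` from your cluster-membership event hook.

createServer((req, res) => {
  if (req.url === '/ready') {
    res.writeHead(joined ? 200 : 503).end();
  } else {
    res.writeHead(404).end();
  }
}).listen(8080);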

An upgrade is schema-breaking whenever it changes:

  • Event shapes in a journal.
  • State shapes in a durable-state store.
  • Message shapes that nodes might send each other during the rolling window.
  • Configuration keys that move between major versions.

The pattern: make the change additive, then upgrade.

Old code wrote:

type DepositedV1 = { kind: 'deposited'; amount: number };

New code wants:

type DepositedV2 = { kind: 'deposited'; amount: number; currency: string };

Step 1: deploy intermediate code that accepts both shapes.

class Account extends PersistentActor<...> {
  override eventAdapter() {
    return new DefaultAdapter<DepositedV2>({
      currentVersion: 2,
      defaults: { currency: 'USD' },
    });
  }
}

This step:

  • Writes V2 events under the new shape.
  • Reads V1 events with currency defaulted to USD.
  • Works in old + new clusters because old code reads its own shape and ignores envelope wrapping.

Roll this out via standard rolling deployment.
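
With the adapter in place, the event handler only ever sees the V2 shape. A minimal sketch of why the default matters (AccountState and applyDeposited are illustrative names, not framework API):

type AccountState = { balances: Record<string, number> };

// A V1 event replayed through the adapter arrives as
// { kind: 'deposited', amount: 50, currency: 'USD' }, so the handler
// can rely on `currency` unconditionally.
function applyDeposited(state: AccountState, event: DepositedV2): AccountState {
  const prev = state.balances[event.currency] ?? 0;
  return {
    balances: { ...state.balances, [event.currency]: prev + event.amount },
  };
}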

Step 2 (optional, later): drop the DefaultAdapter once all old events have aged out or been snapshotted. Usually you keep it indefinitely for safety.

See migration recipes for the per-pattern walkthrough.

For renames, restructures, and removed fields, the mechanics need more steps:

1. Deploy code that reads the old shape and writes the new one (`MigratingAdapter`).
2. Roll out fully. All new events are now in the new shape.
3. Deploy code that reads only the new shape, dropping the migrating step.
4. Optional: run a bulk migration to rewrite still-extant old events into the new shape if you want to drop the adapter complexity.

See MigratingAdapter for the implementation.
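
For a concrete picture of step 1, here is a sketch of a field rename (`sum` → `amount`). The options object is modeled on the DefaultAdapter example above and is an assumption, not the exact MigratingAdapter API:

type WithdrewV1 = { kind: 'withdrew'; sum: number };
type WithdrewV2 = { kind: 'withdrew'; amount: number };

// Old events are rewritten into the new shape as they are read;
// everything persisted from now on is written as V2.
const withdrewAdapter = new MigratingAdapter<WithdrewV1, WithdrewV2>({
  currentVersion: 2,
  migrate: (old) => ({ kind: 'withdrew', amount: old.sum }),
});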

Message shapes need the same care. Suppose:

// v1 message: { kind: 'request' }
// v2 message: { kind: 'request', traceId: string }

During a rolling deployment, old nodes might send v1 to new nodes (or vice versa). The new code must tolerate both versions of incoming messages.

Strategy:

  1. Add the new field as optional in the message type.
  2. New code can handle messages missing the field (default it).
  3. Deploy. Old → new messages arrive without the field and get defaulted; new → old messages carry the field, which old code ignores.

Once everything’s on v2, the field can become required in a later deployment.
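
A minimal sketch of the optional-field strategy (defaulting to a fresh randomUUID is illustrative):

import { randomUUID } from 'node:crypto';

// The v2 field stays optional while old and new nodes coexist.
type Request = { kind: 'request'; traceId?: string };

function onRequest(msg: Request): void {
  // v1 senders omit traceId; default it on receipt.
  const traceId = msg.traceId ?? randomUUID();
  // ... handle the request, propagating traceId downstream.
}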

# v1 → v2: renamed config key
actor-ts.cluster.gossip-interval = 1s # v1
actor-ts.cluster.gossip-interval-ms = 1000 # v2 (renamed)

The framework’s config system doesn’t auto-migrate renamed keys. Two strategies:

  • Read both in the code that loads config; honor either name until you can require the new one.
  • Run migration scripts that rewrite application.conf to the new key names.

Easier: avoid renaming config keys. When you must, deprecate the old name and warn at startup for one release before removing it.
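
A sketch of the dual-read strategy; `raw` stands in for however your config loader exposes flat key/value pairs:

function gossipIntervalMs(raw: Record<string, string>): number {
  const v2 = raw['actor-ts.cluster.gossip-interval-ms'];
  if (v2 !== undefined) return Number(v2);

  const v1 = raw['actor-ts.cluster.gossip-interval'];
  if (v1 !== undefined) {
    console.warn('actor-ts.cluster.gossip-interval is deprecated; use gossip-interval-ms');
    // v1 values use duration syntax like "1s" or "500ms".
    return v1.endsWith('ms') ? Number(v1.slice(0, -2)) : Number(v1.slice(0, -1)) * 1000;
  }
  return 1000; // fall back to the documented default
}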

Sometimes the schema-break is bad enough that an online migration is genuinely impossible — different storage backend, fundamental restructuring. Then plan downtime:

1. Announce maintenance window.
2. Coordinated shutdown of the entire cluster.
3. Run offline migration scripts (sometimes hours).
4. Bring up the new version.
5. Smoke-test before user traffic.

Frequency: ideally never. But occasionally inevitable.
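
The migration scripts in step 3 are storage-specific; purely as an illustration, a bulk rewrite over a journal might look like this (the Journal interface is an assumption, not the framework's storage API):

interface JournalRow { seq: number; payload: string }

interface Journal {
  scan(): AsyncIterable<JournalRow>;
  rewrite(seq: number, payload: string): Promise<void>;
}

// Rewrites every old-shape 'deposited' event in place, applying the
// same default the DefaultAdapter would have applied at read time.
async function migrateDeposited(journal: Journal): Promise<void> {
  for await (const row of journal.scan()) {
    const event = JSON.parse(row.payload);
    if (event.kind === 'deposited' && event.currency === undefined) {
      event.currency = 'USD';
      await journal.rewrite(row.seq, JSON.stringify(event));
    }
  }
}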

Always have a rollback plan:

  • Code-only: the previous binary still works against the same schemas. Roll back via a standard Kubernetes rollback.
  • Schema-breaking: the previous code reads the new shapes (because step 1 was additive). Rollback is safe.
  • Non-additive change: harder, because the rollback target must also understand the new shape. Avoid it; if unavoidable, use feature flags to gate the new code path while keeping the old one reachable.
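
A sketch of the feature-flag escape hatch, reusing the DepositedV1/V2 types from above (the flag plumbing is hypothetical):

// Both write paths stay in the binary, so "rolling back" the schema
// change is a flag flip rather than a redeploy of old code.
function makeDeposited(
  amount: number,
  flags: { writeV2: boolean },
): DepositedV1 | DepositedV2 {
  return flags.writeV2
    ? { kind: 'deposited', amount, currency: 'USD' }
    : { kind: 'deposited', amount };
}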