# Upgrade strategies
Two kinds of production upgrade:
| Kind | Pattern |
|---|---|
| Code-only upgrade | New binary, same schemas. Rolling deployment — old + new versions coexist briefly. |
| Schema-breaking upgrade | New shapes for events / state / messages. Migration first, then rolling deployment. |
Pick the kind, follow the pattern. Mixing them naively breaks production — old nodes can’t read new schemas or vice versa.
## Code-only upgrades
The common case. Bug fixes, refactors, behavior tweaks without changing persisted-data shapes.
1. Build the new binary (tag v1.2.3).
2. Deploy via rolling update.
3. K8s replaces pods one at a time.
4. Each pod: SIGTERM → coordinated-shutdown → drain → new pod spawns → cluster-rejoin.
5. Done.

The cluster’s gossip + sharding rebalance + coordinated-shutdown handle the choreography. Total downtime: zero (if configured right; see Kubernetes deployment).
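Step 4 depends on the process honoring SIGTERM and running coordinated shutdown before exiting. A minimal sketch, assuming the running actor system exposes a terminate-style entry point (the exact method name may differ; see Coordinated shutdown):

```ts
// `system` is the actor system created at startup.
// On SIGTERM: leave the cluster, hand off shards, drain in-flight work, exit.
process.on('SIGTERM', async () => {
  await system.terminate(); // assumed coordinated-shutdown entry point
  process.exit(0);
});
```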
Requirements:
- Replicas ≥ 2. Single-replica clusters can’t drain cleanly.
- Coordinated shutdown configured with sane phase timeouts.
- Health checks correctly gate readiness.
## Schema-breaking upgrades
An upgrade is schema-breaking whenever it changes:
- Event shapes in a journal.
- State shapes in a durable-state store.
- Message shapes that nodes might send each other during the rolling window.
- Configuration keys that move between major versions.
The pattern: make the change additive, then upgrade.
### Pattern — additive event shapes
Old code wrote:

```ts
type DepositedV1 = { kind: 'deposited'; amount: number };
```

New code wants:

```ts
type DepositedV2 = { kind: 'deposited'; amount: number; currency: string };
```

Step 1: deploy intermediate code that accepts both shapes.

```ts
class Account extends PersistentActor<...> {
  override eventAdapter() {
    return new DefaultAdapter<DepositedV2>({
      currentVersion: 2,
      defaults: { currency: 'USD' },
    });
  }
}
```

This step:
- Writes V2 events under the new shape.
- Reads V1 events with `currency` defaulted to USD.
- Works in old + new clusters because old code reads its own shape and ignores envelope wrapping.
Roll this out via standard rolling deployment.
Step 2 (optional, later): drop the `DefaultAdapter` once all old events are aged out or snapshotted. Usually keep it indefinitely for safety.
See migration recipes for the per-pattern walkthrough.
### Pattern — non-additive schema changes
For renames, restructures, and removed fields, the mechanics need more steps:
1. Deploy code that reads the old shape and writes the new one (`MigratingAdapter`).
2. Roll out fully. All new events are now written in the new shape.
3. Deploy code that reads the new shape only (no longer supports the old one). This drops the migrating step.
4. Optional: run a bulk migration to rewrite still-extant old events into the new shape if you'd like to drop the adapter complexity.

See MigratingAdapter for the implementation.
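Following the shape of the `DefaultAdapter` example above, step 1 might look like the sketch below. The `WithdrewV1`/`WithdrewV2` shapes and the `migrations` option are illustrative assumptions; check the MigratingAdapter docs for the real constructor options.

```ts
// Hypothetical rename: the old event nests the amount, the new one flattens it.
type WithdrewV1 = { kind: 'withdrew'; details: { amount: number } };
type WithdrewV2 = { kind: 'withdrew'; amount: number };

class Account extends PersistentActor<...> {
  override eventAdapter() {
    // Reads v1 events through the migration, writes v2 going forward.
    return new MigratingAdapter<WithdrewV2>({
      currentVersion: 2,
      migrations: {
        1: (old: WithdrewV1): WithdrewV2 => ({ kind: 'withdrew', amount: old.details.amount }),
      },
    });
  }
}
```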
### Inter-actor message changes
```ts
// v1 message: { kind: 'request' }
// v2 message: { kind: 'request', traceId: string }
```

During a rolling deployment, old nodes might send v1 to new nodes (or vice versa). The new code must tolerate both versions of incoming messages.
Strategy:
- Add the new field as optional in the message type.
- New code can handle messages missing the field (default it).
- Deploy. Old → new sends without the field, works. New → old sends with the field, old ignores it.
Once everything’s on v2, the field can become required in a later deployment.
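A minimal sketch of that tolerance; the `Request` type and `handleRequest` name are illustrative, not framework API:

```ts
// During the rolling window traceId is optional: v1 senders simply omit it.
type Request = { kind: 'request'; traceId?: string };

function handleRequest(msg: Request): void {
  // Default the field when the sender is still on v1.
  const traceId = msg.traceId ?? 'untraced';
  console.log(`handling request ${traceId}`);
}
```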
### Configuration changes
```
# v1 → v2: renamed config key
actor-ts.cluster.gossip-interval = 1s       # v1
actor-ts.cluster.gossip-interval-ms = 1000  # v2 (renamed)
```

The framework’s config system doesn’t auto-migrate renamed keys. Two strategies:
- Read both in the code that loads config; honor either name until you can require the new one (sketched after this list).
- Run migration scripts that rewrite `application.conf` to the new key names.
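A minimal sketch of the first strategy, assuming raw config values arrive as strings; the lookup shape and the 1000 ms fallback are illustrative, not the framework’s actual loader:

```ts
// Honor both key names while old and new releases coexist; warn on the old one.
function gossipIntervalMs(raw: Record<string, string>): number {
  const renamed = raw['actor-ts.cluster.gossip-interval-ms'];
  if (renamed !== undefined) return Number(renamed);

  const legacy = raw['actor-ts.cluster.gossip-interval']; // e.g. "1s"
  if (legacy !== undefined) {
    console.warn('gossip-interval is deprecated; use gossip-interval-ms');
    return legacy.endsWith('s') ? Number(legacy.slice(0, -1)) * 1000 : Number(legacy);
  }

  return 1000; // assumed default
}
```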
Easier: avoid renaming config keys. When you must, deprecate the old name + warn at startup for one release before removing.
## What if you can’t avoid downtime?
Sometimes the schema-break is bad enough that an online migration is genuinely impossible — different storage backend, fundamental restructuring. Then plan downtime:
1. Announce maintenance window.
2. Coordinated shutdown of the entire cluster.
3. Run offline migration scripts (sometimes hours).
4. Bring up the new version.
5. Smoke-test before user traffic.

Frequency: ideally never. But occasionally inevitable.
## Rollback strategy
Always have a rollback plan:
- Code-only: the previous binary still works against the same schemas. Roll back with a standard Kubernetes rollback (`kubectl rollout undo`).
- Schema-breaking: the previous code reads the new shapes (because step 1 was additive). Rollback is safe.
- Non-additive change: harder — the rollback step needs to also know the new shape. Avoid; if necessary, use feature flags to gate the new code path while keeping the old one reachable.
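One way to keep the old path reachable is to gate the write side behind an explicit flag; the flag name and event shapes below are illustrative:

```ts
// Gate the non-additive (new-shape) write path so a rollback only has to flip
// the flag off instead of teaching the old binary about the new events.
const WRITE_NEW_SHAPE = process.env.WRITE_NEW_SHAPE === 'true';

type DepositedOld = { kind: 'deposited'; amount: number };
type DepositedNew = { kind: 'deposited'; money: { amount: number; currency: string } };

function depositedEvent(amount: number, currency: string): DepositedOld | DepositedNew {
  return WRITE_NEW_SHAPE
    ? { kind: 'deposited', money: { amount, currency } }
    : { kind: 'deposited', amount };
}
```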
## Where to next
- Operations overview — the production checklist.
- Rolling migration — the practical step-by-step recipe.
- Migration overview — schema evolution for events + state.
- Migration recipes — the cookbook.
- Coordinated shutdown — the graceful-stop machinery rolling deploys rely on.