Rolling deployment migration
Это содержимое пока не доступно на вашем языке.
This guide describes how to deploy a schema-breaking or master-key-rotating change across a running cluster without downtime and without coordinated stops. Every node continues serving traffic while the new version rolls out instance-by- instance.
The framework’s rolling-deploy story is built on four shipped features:
writeVersiononmigratingAdapter(#7) — readers handle every registered version; writers emit a chosen one.MasterKeyRing(#8) —active+retired[]keys; every encrypted body’s manifest names the key version it used.wrapLegacyhelpers (#9) — one-shot rewrite of pre-envelope events into the current schema envelope.SchemaRegistry(#6) — multi-version codec + compatibility checks.
The pattern below composes all four into a four-phase deploy. It applies identically to SQLite, Cassandra, in-memory, and object-storage journals.
TL;DR — the four-phase deploy
Section titled “TL;DR — the four-phase deploy” PHASE 1 PHASE 2 PHASE 3 PHASE 4 code-first observation writer flip cleanup (optional) ─────────── ─────────── ─────────── ───────────────v2 readers ──┐ all instances v2 writers old upcasterv1 writers │ running v2 code enabled dropped; coexist │ optional one-shot └─────────── observe ──────────────► backfill via chain.register(v1, v2) verify v1 events still read migrateInMemoryJournal writeVersion: 1 writers still emit v1 / migrateSnapshotStore — keeps history current-shape cleanEach instance in the cluster moves through these phases in sequence; the cluster as a whole moves forward only when every instance is at the same phase or later.
Phase 1 — Code-first
Section titled “Phase 1 — Code-first”Ship the new code with the new schema registered but still write the old shape.
import { MigrationChain, migratingAdapter, PersistentActor,} from 'actor-ts';
// 1. Define both versions in the chain.const chain = MigrationChain .start<OrderEventV1>('OrderEvent', 1) .next<OrderEventV2>(2, (v1) => ({ ...v1, currency: v1.currency ?? 'USD', // upcast: default the new field }));
// 2. Pin the writeVersion at the OLD version for this phase.const adapter = migratingAdapter(chain, { writeVersion: 1 });
class Order extends PersistentActor<OrderCmd, OrderEvent, OrderState> { override eventAdapter() { return adapter; } // ...}Outcome after Phase 1:
- Newly-deployed instances read both v1 and v2 events correctly (the chain upcasts v1 → v2 on read).
- All instances still write v1, because
writeVersion: 1. - Existing journal contents stay v1 — no historical rewrite.
Why pin writeVersion? If a v1-only instance is still
running and reads a v2 event a v2 instance just wrote, the v1
reader has no upcaster to make sense of it — MigrationChain.read
throws MigrationError. By holding writes at v1 until all
readers can handle v2, you avoid this split.
Rollback at this stage is trivial — the v2 code is just doing v1’s job; there’s nothing new on disk to undo.
Phase 2 — Observation
Section titled “Phase 2 — Observation”Roll the new code out to every instance. No code or config change between Phase 1 and Phase 2 — it’s a deployment-completion checkpoint, not a deployment step.
What to verify:
- All instances report the same build version (
/health, cluster-management endpoints, or your own deploy-tracker). - v1 events are being read by the new code path
(
migration_chain_upcast_total{from="1",to="2"}metric, or a log-line spot-check onchain.read). - Writers are still emitting v1 (peek the journal — the most-
recent record’s manifest version should be
1).
Why wait? The instant any writer flips to v2, every reader
needs to be v2-capable. If even one v1-only instance is still up,
its MigrationChain will lack the v2 step and chain.read will
throw UnknownVersionError on the first v2 event it sees.
Don’t skip this phase even if your deploy pipeline is “fast” — a 15-second pipeline can still leave a single instance behind during a network blip, and Phase 3 makes that single instance non-functional.
Phase 3 — Writer flip
Section titled “Phase 3 — Writer flip”Redeploy (or hot-config-change) with writeVersion: 2.
// Drop the explicit writeVersion — it defaults to chain.currentVersion (2).const adapter = migratingAdapter(chain);Outcome after Phase 3:
- New writes use the v2 shape (
currencyset explicitly, etc.). - v1 events on disk are still readable via the chain’s upcaster.
- Mixed-version journal: oldest events are v1-shaped, newer events
are v2-shaped — both round-trip through the same
applyEventthanks to the upcast on read.
Rollback at this stage is non-trivial. Any v2 event written since the flip cannot be read by a v1-only reader. If you must roll back, you have two paths:
- Forward-fix — ship a v3 chain entry that handles v2. This is just another rolling deploy on the same pattern.
- Restore from snapshot — stop writes, restore the journal from the last pre-Phase-3 snapshot, redeploy v1 code. Data written during the v2 window is lost.
The forward-fix is always preferable to the restore.
Phase 4 — Cleanup (optional)
Section titled “Phase 4 — Cleanup (optional)”After Phase 3 has run long enough that no v1 events are likely to be read on the hot path, you can:
-
Backfill historical data into v2 shape, so future reads skip the upcast. Use the one-shot helpers:
import { migrateInMemoryJournal, migrateSnapshotStore } from 'actor-ts';// Rewrite every event for every persistenceId into v2 shape.await migrateInMemoryJournal(journal, (event) => chain.manifestFor(event, 2));// Same for snapshots.await migrateSnapshotStore(snapshots, pids, (state) => chain.manifestFor(state, 2));For SQL / Cassandra journals where there’s no in-place rewrite yet (tracked under #71 for bulk wrap-legacy), use
migrateBetween Journals(source, target, { eventTransform })(v0.8.0) to read from the old backend and write a fresh, enveloped copy to the new one — same pattern as a backend swap, with the transform hook doing the wrap inline:await migrateBetweenJournals(oldJournal, newJournal, {eventTransform: (pe) => ({...pe,event: wrapEventAsEnvelope(pe.event, chain.manifestFor),}),}); -
Drop the v1 step from the
MigrationChainonce the backfill is complete:const chain = MigrationChain.start<OrderEventV2>('OrderEvent', 2);All future events are now born v2 with no upcast overhead.
Phase 4 is optional. Many production systems live indefinitely with mixed-version journals and never run the backfill — the upcast cost is typically nanoseconds per event. Run Phase 4 only when the historical-version step turns into operational drag (extra code paths to maintain, mixed-version reads complicating debugging, etc.).
Master-key rotation — parallel story
Section titled “Master-key rotation — parallel story”Encryption keys rotate on the same four-phase shape, with one extra acknowledgement: the active key is always one specific entry; the retired array is what readers need to decrypt historical bodies.
Phase 1 — Code-first (key)
Section titled “Phase 1 — Code-first (key)”Ship the new key as active, keep the old as retired.
import { ObjectStoragePluginOptions, registerObjectStoragePlugins } from 'actor-ts';
const opts: ObjectStoragePluginOptions = { // ... backend, compression, etc. encryption: { keys: { active: { version: 2, key: NEW_32_BYTES }, retired: [{ version: 1, key: OLD_32_BYTES }], }, },};After this phase, new bodies’ manifests stamp keyVersion = 2;
existing bodies’ manifests still say keyVersion = 1 and
decrypt via the retired[] entry.
Phase 2 — Observation (key)
Section titled “Phase 2 — Observation (key)”Verify every instance has the new key ring. A reader missing the
retired entry for v1 cannot decrypt the historical blobs.
Phase 3 — Re-encryption sweep (optional)
Section titled “Phase 3 — Re-encryption sweep (optional)”Once you want to physically remove the old key from the system,
re-encrypt every historical body under the new key with the
reEncryptObjectStorage helper (v0.8.0):
import { reEncryptObjectStorage } from 'actor-ts';
const result = await reEncryptObjectStorage(backend, { keyPrefix: 'snapshots/', keyring: { active: { version: 2, key: NEW }, retired: [{ version: 1, key: OLD }], }, onProgress: (e) => process.stderr.write( `${e.idx}/${e.total} ${e.action} ${e.key}\n`),});console.log(`rewrote ${result.rewrote} of ${result.scanned} objects`);The sweep walks every object under keyPrefix, decrypts using
whichever retired/active version the body manifest references,
and re-encrypts under the active key. Bodies already at the
active version are skipped on the fast path — the sweep is
idempotent and safe to re-run after a partial failure.
If-Match is used internally so a concurrent writer can’t be
overwritten silently.
Durable resume tokens + completeness check
Section titled “Durable resume tokens + completeness check”For million-object buckets the naive sweep above has two weaknesses that bite hard in practice:
- A crash partway through forces the resumed run to re-list and re-GET every key from the start — every body is still re-checked for “already at active version?”, which costs one HEAD/GET per object even though no PUT happens. For a 24-hour sweep, that’s a 24-hour wasted re-walk.
- If the operator dropped a retired key from the config too soon (e.g. only kept the most recent retired entry but the corpus also references an older one), the sweep gets half-way through and then aborts on the first body that uses the missing version. Half the corpus is now under the new key, half is still under the old.
Both are addressed by two opt-in options on
reEncryptObjectStorage (v0.10.0, #109):
import { reEncryptObjectStorage, InMemoryReEncryptProgressStore, type ReEncryptProgressStore,} from 'actor-ts';
// File-backed progress store — survives a process crash.// (The shipped `InMemoryReEncryptProgressStore` is the// in-process default for tests; for long-running production// sweeps you want a durable backing store. Same shape, plug// any of: file on disk, single Redis key, or a sentinel// object in the same bucket under a different prefix.)const progress: ReEncryptProgressStore = makeFileBackedStore('/var/lib/actor-ts/sweep.json');
const result = await reEncryptObjectStorage(backend, { keyPrefix: 'snapshots/', keyring: { active: { version: 3, key: NEW }, retired: [ { version: 2, key: OLDER }, { version: 1, key: OLDEST }, ], }, // — completeness check — // Default true. Samples the first 100 encrypted bodies and // refuses to start if any body references a key version // that's absent from active/retired. Catches the dropped- // retired-key footgun BEFORE the sweep writes a single PUT. verifyKeyringCompleteness: true, // sampleSize: 200, // optional override; default = min(100, total)
// — durable resume — // Save state every 500 rewrites. After a crash, the next // call to reEncryptObjectStorage(backend, opts) picks up // exactly past the last saved key — no re-checking of // already-rewritten objects. progress, saveProgressEveryN: 500,
onProgress: (e) => process.stderr.write( `${e.idx}/${e.total} ${e.action} ${e.key}\n`),});
// On successful end-to-end completion, progress.clear() is// called automatically — re-running the sweep starts fresh.console.log(`rewrote ${result.rewrote} of ${result.scanned}`);What the progress store sees:
- On first call:
load()returns{ lastKey: null, processedCount: 0 }, the sweep runs from the start. - Every
saveProgressEveryN-th rewrite:save({ lastKey, processedCount })is called. Pick the interval to trade IO overhead vs. crash- rewind window —500means a crash redoes at most ~500 objects, which on a 10M-object bucket is a 0.005% rewind. - After a clean end-of-sweep:
clear()is called. A freshreEncryptObjectStorageinvocation starts from the top.
When to disable verifyKeyringCompleteness:
The default is true and should stay that way for almost all
operators. Set it to false only when:
- You’ve independently verified the keyring is complete (e.g. a separate audit job has already enumerated every body version and confirmed every one is in the ring).
- You’re operating on a corpus where the sample-based check would itself be expensive (e.g. an extremely cold-storage backend where the first 100 reads cost real money), AND you’ve taken the independent-audit step above.
Disabling without independent verification trades a deterministic pre-flight failure for a mid-corpus abort — strictly worse from an operational-recovery standpoint.
Phase 4 — Drop the old key
Section titled “Phase 4 — Drop the old key”After the sweep, drop the retired[1] entry. Manifests pointing
at keyVersion = 1 would now fail to decrypt — but the sweep
guarantees none remain.
Recommended pacing: keep retired[] entries around for at
least one full backup cycle longer than the sweep takes. A
corrupted sweep run that drops retired[] immediately is
unrecoverable.
Reference — the symbols this guide uses
Section titled “Reference — the symbols this guide uses”| Symbol | What it does |
|---|---|
MigrationChain.start(name, v).next(...) | Define a multi-version event/state chain |
migratingAdapter(chain, { writeVersion }) | Adapter exposing the chain to the journal |
chain.manifestFor(value, version) | Lower-level envelope builder |
wrapEventAsEnvelope(event, manifestFor) | One-shot rewrite for pre-envelope data |
migrateInMemoryJournal(journal, fn) | Bulk-rewrite every event under a journal |
migrateSnapshotStore(store, pids, fn) | Same for snapshots |
MasterKeyRing { active, retired? } | Multi-version encryption key ring |
reEncryptObjectStorage(backend, opts) | Sweep: re-encrypt every body under a prefix to the active key |
ReEncryptProgressStore / InMemoryReEncryptProgressStore | Durable resume tokens for the sweep (#109) — plug a file/Redis/object-storage backed implementation for million-object buckets |
All of them are exported from the top-level actor-ts barrel.
Related
Section titled “Related”docs/persistence/migration-recipes.md— decision tree for which adapter to pick.CHANGELOG.md[0.6.0]→ “schema migration & encryption polish” for the underlying feature set.- Open issues: #71 bulk wrap-legacy migration for SQL/Cassandra.