Operations overview
The framework runs the same code in dev and prod, but production deployment has its own set of considerations that don’t matter on a laptop:
| Topic | Concerns |
|---|---|
| Deployment | How does a node start, register with the cluster, accept traffic, and shut down cleanly? |
| Tuning | Gossip cadence, mailbox sizes, failure-detector thresholds — defaults are sensible, but workloads vary. |
| Security | TLS on the cluster transport, secret management, key rotation. |
| Upgrades | Schema migrations, rolling deployments without downtime. |
| Troubleshooting | What logs, metrics, and traces tell you about a misbehaving system. |
This section maps each to a deeper page.
A production-ready setup checklist
For a non-trivial actor system going to production:
- □ Persistence backend chosen + production-grade (e.g. SQLite for single-node, Cassandra for multi-node)
- □ Cluster wired up (if multi-node) with concrete seed strategy
- □ Downing strategy configured (split-brain protection)
- □ Coordinated shutdown configured with SIGTERM hooks
- □ Health checks exposed via HttpManagement
- □ Metrics exposed via Prometheus or OTel
- □ Log aggregation hooked up (structured logging via withFields/MDC)
- □ TLS enabled on the cluster transport
- □ Secrets via env vars, NOT in application.conf
- □ Kubernetes manifests with PreStop hook + grace period
- □ Tracing optional but configured if multi-actor flows exist
- □ Runbook for "actor X keeps crashing" / "cluster won't form"

None of these are framework-specific tricks; they’re general production hygiene applied to an actor system. Each is covered in its own page.
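As an illustration of the "secrets via env vars" item, here is a minimal sketch of a fail-fast startup check. It is not part of the framework's API: `requireSecrets`, the `Env` type, and the variable names `CLUSTER_SHARED_SECRET`, `DB_PASSWORD`, and `TLS_KEY` are all hypothetical. The idea is to refuse to start when a required secret is absent, and to report only the names of the missing variables.

```typescript
// Hypothetical helper, not a framework API: fail fast when required
// secrets are missing from the environment.
type Env = Record<string, string | undefined>;

function requireSecrets<K extends string>(
  env: Env,
  names: readonly K[],
): Record<K, string> {
  // A variable counts as missing if it is unset or blank.
  const missing = names.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
  if (missing.length > 0) {
    // Report names only; never echo secret values into logs.
    throw new Error(`Missing required secrets: ${missing.join(", ")}`);
  }
  const out = {} as Record<K, string>;
  for (const name of names) {
    out[name] = env[name] as string;
  }
  return out;
}

// Example environment (assumed variable names). In a Node.js process
// you would pass process.env instead of a literal object.
const exampleEnv: Env = { CLUSTER_SHARED_SECRET: "s3cr3t", DB_PASSWORD: "" };
```

Calling `requireSecrets(exampleEnv, ["CLUSTER_SHARED_SECRET", "DB_PASSWORD", "TLS_KEY"])` throws, because `DB_PASSWORD` is blank and `TLS_KEY` is unset. Running the check once at startup means a misconfigured node fails before it joins the cluster rather than at first use.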
Deployment
| How | When |
|---|---|
| Kubernetes | Cloud + container orchestration. StatefulSets, headless services, RBAC for the K8s API seed provider. |
| Docker Compose | Local multi-node clusters for testing and small deployments. |
| Process manager | systemd / PM2 — for bare-metal or VM deployments. |
K8s is the most common production path; the page covers Deployment vs StatefulSet, PreStop hooks, and the seed-provider configuration.
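One detail that is easy to get wrong with PreStop hooks: in Kubernetes, the termination grace period covers both the PreStop hook and the time the process takes to exit after SIGTERM. The following sketch checks that budget. The `ShutdownBudget` shape, the function names, and the 5-second default margin are assumptions for illustration, not framework settings.

```typescript
// Hypothetical budget check, not a framework API.
interface ShutdownBudget {
  graceSeconds: number;    // the pod's terminationGracePeriodSeconds
  preStopSeconds: number;  // how long the PreStop hook runs
  shutdownSeconds: number; // worst-case coordinated shutdown after SIGTERM
}

// Seconds left before SIGKILL once the hook and shutdown have both run.
function remainingGraceSeconds(b: ShutdownBudget): number {
  return b.graceSeconds - b.preStopSeconds - b.shutdownSeconds;
}

// True when the budget leaves at least `safetyMargin` seconds spare.
function fitsGracePeriod(b: ShutdownBudget, safetyMargin = 5): boolean {
  return remainingGraceSeconds(b) >= safetyMargin;
}
```

For example, a 30-second grace period with a 10-second PreStop hook and a 25-second worst-case shutdown leaves a negative remainder, so the node would be killed before coordinated shutdown finishes.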
Tuning
Defaults work for most workloads; reach for these pages when you see specific symptoms:
| Knob | When to tune |
|---|---|
| Gossip cadence | Large clusters (>20 nodes) where the default 1-second gossip is wasteful, or small clusters where you want faster convergence. |
| Failure detector | Tight networks (LAN) where 2-second unreachable detection is too slow, or noisy networks where it triggers false alarms. |
| Mailbox sizing | When you see memory growth from unbounded mailboxes, or when bounded mailboxes are dropping more than expected. |
| Dispatcher tuning | When you observe HTTP latency degradation under actor-heavy load, or low CPU utilization with high actor throughput. |
If you can’t articulate the symptom, don’t tune. Defaults aren’t optimal but they’re rarely bad.
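To make the mailbox-sizing symptom concrete, here is a toy bounded mailbox that counts what it drops. It is a sketch for reasoning about the trade-off, not the framework's mailbox implementation: the `BoundedMailbox` class and its drop-newest overflow policy are assumptions.

```typescript
// Illustrative only: a fixed-capacity FIFO that rejects new messages
// when full (drop-newest) and counts the rejections.
class BoundedMailbox<T> {
  private readonly queue: T[] = [];
  private readonly capacity: number;
  private droppedCount = 0;

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  // Returns false (and counts a drop) when the mailbox is full.
  offer(message: T): boolean {
    if (this.queue.length >= this.capacity) {
      this.droppedCount += 1;
      return false;
    }
    this.queue.push(message);
    return true;
  }

  // Removes and returns the oldest message, or undefined when empty.
  poll(): T | undefined {
    return this.queue.shift();
  }

  get size(): number {
    return this.queue.length;
  }

  get dropped(): number {
    return this.droppedCount;
  }
}
```

A drop counter like `dropped` is the kind of signal that turns "the system feels slow" into an articulable symptom: memory stays flat because the cap holds, and the counter shows how much work the cap is costing.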
Security
Three layers:
| Concern | Page |
|---|---|
| Cluster transport TLS + auth | Cluster security |
| Encryption at rest for persisted data | Master key rotation |
| TLS everywhere (HTTP, brokers, journals) | TLS everywhere |
The cluster transport defaults to unauthenticated TCP — fine inside a private network, dangerous on the public internet. Always enable TLS + shared-secret auth for any cluster that crosses untrusted boundaries.
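The rule in the paragraph above can be enforced as a startup guard. This is a minimal sketch under assumed names: the `TransportSecurity` shape and `transportProblems` are hypothetical and do not correspond to the framework's configuration keys.

```typescript
// Hypothetical config shape for illustration; not the framework's schema.
interface TransportSecurity {
  tlsEnabled: boolean;
  sharedSecret?: string;
  crossesUntrustedNetwork: boolean;
}

// Returns a list of problems; an empty list means the setup passes.
function transportProblems(cfg: TransportSecurity): string[] {
  const problems: string[] = [];
  if (!cfg.crossesUntrustedNetwork) {
    // Inside a private network the unauthenticated default is tolerated.
    return problems;
  }
  if (!cfg.tlsEnabled) {
    problems.push("TLS is disabled on the cluster transport");
  }
  if (!cfg.sharedSecret) {
    problems.push("no shared secret is configured");
  }
  return problems;
}
```

A node could log the returned problems and exit before binding its transport port, so an insecure configuration never accepts a connection.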
Upgrades
The framework supports rolling deployments without downtime when configured for it. Two distinct concerns:
- Code rollouts — replacing node binaries while the cluster stays up. Handled by K8s rolling updates + coordinated shutdown. See Rolling migration.
- Schema migrations — evolving event/state shapes over time. Handled by event adapters + envelope versioning. See Upgrade strategies and Persistence migration overview.
The rolling-migration page is the most practical — a step-by-step recipe for “I need to add a field to my event without taking the cluster down.”
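The "add a field to my event" case can be sketched as a versioned envelope plus a chain of upcasters. This shows the general technique only; the `Envelope` shape, the `Upcaster` type, and the `OrderPlaced` example with its `currency` default are assumptions, not the framework's event-adapter API.

```typescript
// Hypothetical envelope: the stored event carries its schema version.
interface Envelope {
  type: string;
  version: number;
  payload: Record<string, unknown>;
}

type Upcaster = (payload: Record<string, unknown>) => Record<string, unknown>;

// Keyed by the version being upgraded FROM.
// v1 -> v2 adds `currency`, defaulted because old events never had it.
const orderPlacedUpcasters: Record<number, Upcaster> = {
  1: (p) => ({ ...p, currency: "USD" }),
};

// Applies upcasters one version at a time until `target` is reached.
function upcast(
  envelope: Envelope,
  upcasters: Record<number, Upcaster>,
  target: number,
): Envelope {
  let { version, payload } = envelope;
  while (version < target) {
    const step = upcasters[version];
    if (!step) {
      throw new Error(`No upcaster from v${version} for ${envelope.type}`);
    }
    payload = step(payload);
    version += 1;
  }
  return { ...envelope, version, payload };
}
```

Because the new field has a default, events written at version 1 by nodes that have not been replaced yet remain readable by upgraded nodes throughout the rollout.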
Troubleshooting
| Symptom | Likely cause |
|---|---|
| Actor isn't receiving messages | Stopped ref, dead-letter, mailbox full |
| Cluster won't form | Seed nodes unreachable, port conflict |
| Sharded entities won't spawn | Coordinator not on a leader, role mismatch |
| PersistentActor takes 30s to start | No snapshot, deep journal; set snapshotPolicy |
| Tests hang at the end | Forgot await system.terminate() |
| Memory grows unboundedly | Unbounded mailboxes; pick a mailbox cap |

See Troubleshooting for the diagnostic-by-symptom catalog.
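The slow-start symptom for a persistent actor comes down to how many journal events must be replayed on recovery. The following back-of-the-envelope sketch assumes a snapshot is written at every multiple of `snapshotEvery` events and that the latest snapshot is usable. `eventsToReplay` is a hypothetical helper, and the real snapshotPolicy options may differ.

```typescript
// Hypothetical estimate, not a framework API: how many events recovery
// replays, given the journal length and an every-N-events snapshot rule.
function eventsToReplay(journalLength: number, snapshotEvery?: number): number {
  if (!Number.isInteger(journalLength) || journalLength < 0) {
    throw new RangeError("journalLength must be a non-negative integer");
  }
  if (snapshotEvery === undefined) {
    // No snapshots: recovery replays the whole journal.
    return journalLength;
  }
  if (!Number.isInteger(snapshotEvery) || snapshotEvery <= 0) {
    throw new RangeError("snapshotEvery must be a positive integer");
  }
  // Only events written after the latest snapshot are replayed.
  return journalLength % snapshotEvery;
}
```

With 100,050 events and no snapshots, recovery replays all 100,050. With a snapshot every 1,000 events, it replays the 50 events written since the last snapshot.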