Health checks
The management server exposes two health endpoints:
- `GET /health`: liveness. Returns 200 if the process is operational.
- `GET /ready`: readiness. Returns 200 if the pod is ready to receive traffic (cluster up + custom checks pass).
The framework registers a few default checks (for example, cluster-up for readiness), and you can add custom checks for app-specific health.
```typescript
import { HttpManagement } from 'actor-ts';

const { health } = await HttpManagement.start(system, { port: 8558 });

health.addCheck('database', async () => {
  const ok = await db.ping();
  return ok ? { ok: true } : { ok: false, reason: 'db unreachable' };
});

health.addCheck('cache', async () => {
  try {
    await redis.ping();
    return { ok: true };
  } catch (e) {
    return { ok: false, reason: (e as Error).message };
  }
});
```

When any check returns `{ ok: false }`, the endpoint returns 503 with a JSON body listing the failed checks.
The check signature
```typescript
type HealthCheck = () => Promise<HealthCheckResult>;

interface HealthCheckResult {
  ok: boolean;
  reason?: string;   // human-readable failure description
  details?: unknown; // structured info for diagnostics
}
```

Checks are async and must return a Promise. A long-running check blocks the response, so keep checks fast (sub-second, ideally < 100 ms).
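As a rough illustration of this signature, the sketch below wraps an arbitrary async ping function into a `HealthCheck` and records latency in `details`. The helper name `pingCheck` and the `latencyMs` field are hypothetical and not part of actor-ts; the two types are copied from the signature so the snippet stands alone. It converts thrown errors into `{ ok: false }` results because this page does not say how the registry treats a check that rejects.

```typescript
// Types copied from the check signature so the sketch is self-contained.
interface HealthCheckResult {
  ok: boolean;
  reason?: string;
  details?: unknown;
}
type HealthCheck = () => Promise<HealthCheckResult>;

// Hypothetical helper (not an actor-ts API): turns any async "ping"
// function into a HealthCheck that resolves instead of throwing.
function pingCheck(name: string, ping: () => Promise<unknown>): HealthCheck {
  return async () => {
    const startedAt = Date.now();
    try {
      await ping();
      return { ok: true, details: { latencyMs: Date.now() - startedAt } };
    } catch (e) {
      const message = e instanceof Error ? e.message : String(e);
      return {
        ok: false,
        reason: `${name} unreachable: ${message}`,
        details: { latencyMs: Date.now() - startedAt },
      };
    }
  };
}
```

Assuming `db.ping()` rejects on failure, a call such as `health.addCheck('database', pingCheck('database', () => db.ping()))` would register the wrapped check.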
Liveness vs readiness
| Probe | What it answers | What K8s does on failure |
|---|---|---|
| Liveness (`/health`) | “Is this process fundamentally broken?” | Restart the pod. |
| Readiness (`/ready`) | “Should this pod receive traffic right now?” | Stop routing to this pod (keep it running). |
Different semantics drive different checks:
- Liveness should only fail for unrecoverable issues — actor system crashed, deadlock detected, fundamental invariants broken. Restart is the only fix.
- Readiness can fail for transient issues — DB is briefly unreachable, cache is warming up, cluster is rejoining. No restart needed; just don’t route here yet.
Don’t put all checks into both — restarting a pod because the external DB blipped is wrong; the DB blip will pass. Put DB checks in readiness only.
Built-in readiness check
When the management server is configured with a cluster, the default readiness check fails if the local node isn’t Up:

```
GET /ready
{ ok: false, reason: 'cluster not joined yet' }
```

The endpoint returns 200 once `SelfUp` fires. This is the canonical “wait for the cluster” check.
Multiple checks
```typescript
health.addCheck('database', dbCheck);
health.addCheck('cache', cacheCheck);
health.addCheck('downstream-api', apiCheck);
```

All checks run in parallel when the endpoint is hit. The response includes per-check status:

```json
{
  "ok": false,
  "checks": {
    "cluster": { "ok": true },
    "database": { "ok": false, "reason": "connection refused" },
    "cache": { "ok": true },
    "downstream-api": { "ok": true }
  }
}
```

The aggregate `ok` is true if and only if every check's `ok` is true.
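The parallel-run-then-aggregate rule can be modeled in a few lines. This is a minimal sketch, not the actor-ts implementation: `runAll`, `statusFor`, and `AggregateResult` are hypothetical names, and mapping a thrown error to a failed result is an assumption of the sketch. The 200/503 mapping follows the behavior described earlier on this page.

```typescript
interface HealthCheckResult {
  ok: boolean;
  reason?: string;
  details?: unknown;
}
type HealthCheck = () => Promise<HealthCheckResult>;

interface AggregateResult {
  ok: boolean;
  checks: Record<string, HealthCheckResult>;
}

// Illustrative only: start every check at once, then fold the results.
async function runAll(checks: Record<string, HealthCheck>): Promise<AggregateResult> {
  const settled = await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      let result: HealthCheckResult;
      try {
        result = await check();
      } catch (e) {
        // Assumption: a check that throws counts as failed.
        result = { ok: false, reason: e instanceof Error ? e.message : String(e) };
      }
      return { name, result };
    }),
  );

  const byName: Record<string, HealthCheckResult> = {};
  for (const { name, result } of settled) {
    byName[name] = result;
  }
  // Aggregate is ok only when every individual check is ok.
  return { ok: settled.every((s) => s.result.ok), checks: byName };
}

// HTTP status the endpoint would answer with for a given aggregate.
const statusFor = (result: AggregateResult): number => (result.ok ? 200 : 503);
```

Because the checks start together, total latency in this model is bounded by the slowest check rather than the sum of all of them.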
Liveness-only checks
```typescript
health.addCheck('actor-system-alive', async () => {
  return {
    ok: !system.isTerminated,
    reason: system.isTerminated ? 'system terminated' : undefined,
  };
}, { liveness: true, readiness: false });
```

The optional third argument routes a check to liveness only, as in this example, or to readiness only; `readiness` defaults to `true`.
The “system not terminated” check is automatically registered as liveness-only by the framework — it’s an unrecoverable state.
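To make the routing idea concrete, here is a toy registry that decides which checks each probe would run. It is a sketch under stated assumptions, not how actor-ts stores checks: `ToyRegistry`, `namesFor`, and `CheckRouting` are hypothetical, and the `liveness: false` default is an assumption (this page only states that `readiness` defaults to `true`).

```typescript
interface HealthCheckResult {
  ok: boolean;
  reason?: string;
  details?: unknown;
}
type HealthCheck = () => Promise<HealthCheckResult>;

interface CheckRouting {
  liveness?: boolean;
  readiness?: boolean;
}
type Probe = 'liveness' | 'readiness';

// Toy model for illustration only.
class ToyRegistry {
  private readonly entries: Array<{
    name: string;
    check: HealthCheck;
    routing: Required<CheckRouting>;
  }> = [];

  addCheck(name: string, check: HealthCheck, routing: CheckRouting = {}): void {
    this.entries.push({
      name,
      check,
      routing: {
        liveness: routing.liveness ?? false, // assumed default
        readiness: routing.readiness ?? true, // default stated on this page
      },
    });
  }

  // Names of the checks a given probe would run.
  namesFor(probe: Probe): string[] {
    return this.entries.filter((e) => e.routing[probe]).map((e) => e.name);
  }
}

const registry = new ToyRegistry();
// A transient dependency: readiness only, so a DB blip never restarts the pod.
registry.addCheck('database', async () => ({ ok: true }));
// An unrecoverable condition: liveness only.
registry.addCheck('actor-system-alive', async () => ({ ok: true }), {
  liveness: true,
  readiness: false,
});
```

With this split, a failing `database` check only takes the pod out of rotation, while a failing `actor-system-alive` check is the one that leads to a restart.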
Tests for checks
```typescript
import { HttpManagement } from 'actor-ts';
import { TestKit } from 'actor-ts/testkit';

// `it` and `expect` come from your test runner.
it('health check fails when DB is down', async () => {
  const tk = TestKit.create();
  const { health } = await HttpManagement.start(tk.system, { port: 0 });

  // Register a check that always fails.
  health.addCheck('db', async () => ({ ok: false, reason: 'mock' }));

  const result = await health.run();
  expect(result.ok).toBe(false);
  expect(result.checks!.db).toEqual({ ok: false, reason: 'mock' });

  await tk.shutdown();
});
```

`HealthCheckRegistry.run()` exposes the same logic the endpoint uses, which makes it useful for unit-testing your checks.
Timeouts
```typescript
health.addCheck('slow-thing', slowCheck, { timeoutMs: 2_000 });
```

`timeoutMs` sets a per-check timeout. A check that exceeds it is treated as `{ ok: false, reason: 'timeout' }`.

Without a timeout, a hung check blocks the whole `/health` response, which eventually trips K8s’s own probe timeout (`timeoutSeconds`, 1 s by default) and, after repeated failures (`failureThreshold`, 3 by default), a restart. Set check timeouts conservatively.
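The timeout behavior described here can be approximated by racing the check against a timer. The wrapper below is a sketch of that general technique, not the library's internals; `withTimeout` is a hypothetical name, and treating a rejected check as a failure is an assumption.

```typescript
interface HealthCheckResult {
  ok: boolean;
  reason?: string;
  details?: unknown;
}
type HealthCheck = () => Promise<HealthCheckResult>;

// Hypothetical wrapper: resolves with a timeout failure if `check`
// has not settled within `timeoutMs` milliseconds.
function withTimeout(check: HealthCheck, timeoutMs: number): HealthCheck {
  return () =>
    new Promise<HealthCheckResult>((resolve) => {
      const timer = setTimeout(
        () => resolve({ ok: false, reason: 'timeout' }),
        timeoutMs,
      );
      check().then(
        (result) => {
          clearTimeout(timer);
          resolve(result); // no-op if the timer already resolved
        },
        (e: unknown) => {
          clearTimeout(timer);
          // Assumption: a rejected check counts as failed.
          resolve({ ok: false, reason: e instanceof Error ? e.message : String(e) });
        },
      );
    });
}
```

Note that the underlying check is not cancelled in this sketch; the wrapper only stops waiting for it, so a hung check may keep holding its resources.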
Where to next
- Management overview: the bigger picture.
- HTTP endpoints: the full endpoint reference.
- Kubernetes deployment: the probe configuration this pairs with.