
Observability overview

A production actor system needs three things to be observable from the outside:

| Pillar | What it answers | Module |
| --- | --- | --- |
| Metrics | “What’s the rate / count / latency right now?” | MetricsExtension |
| Tracing | “What did this single request do?” | TracingExtension |
| Management | “Is the system alive and healthy?” | HttpManagement |

All three are extensions — they don’t run unless you reach for them. An app that ignores observability has no overhead from unused metrics buffers or unstarted trace exporters.

Custom metrics are created through the MetricsExtension, looked up from the system:

```ts
import { ActorSystem, MetricsExtensionId } from 'actor-ts';

const system = ActorSystem.create('my-app');
const metrics = system.extension(MetricsExtensionId);

// Counter: monotonically increasing total.
const requests = metrics.counter('http.requests.total', { route: '/orders' });
requests.inc();

// Histogram: sampled distribution (here, latency in milliseconds).
const latency = metrics.histogram('http.requests.duration_ms', { route: '/orders' });
latency.observe(42);

// Gauge: point-in-time value that can move up or down.
const active = metrics.gauge('sessions.active');
active.set(123);
```

Four metric types:

  • Counter — monotonically increasing. Total requests, total errors.
  • Gauge — point-in-time value. Active sessions, current memory usage.
  • Histogram — sampled distribution. Request latency, payload size. Lets you compute p50/p95/p99 at scrape time.
  • Timer — timer.start() returns a stop function; built on top of histogram for timing-specific ergonomics (see the sketch below).

Each metric has a name + labels (key-value pairs). Labels let you slice the same metric by dimension — http.requests.total by route or status.
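
A quick sketch of the timer ergonomics, continuing the snippet above — the metrics.timer() factory name is an assumption, modeled on the counter/gauge/histogram calls:

```ts
// Assumed factory: metrics.timer(), following the pattern of the calls above.
const queryTime = metrics.timer('db.query.duration_ms', { table: 'orders' });

const stop = queryTime.start();              // start() hands back a stop function
await new Promise((r) => setTimeout(r, 25)); // stand-in for the work being timed
stop();                                      // records elapsed time into the underlying histogram
```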

The metrics themselves are framework-internal; getting them out to a metrics backend uses an exporter:

| Exporter | Backend |
| --- | --- |
| PrometheusExporter | Exposes a /metrics endpoint Prometheus scrapes. |
| PromClientAdapter | Pushes into the prom-client library if you’re already using it. |
| OtelMetricsAdapter | Reports via OpenTelemetry. |

See Prometheus exporter for the deep dive on each.
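
As a rough wiring sketch only — the registerExporter() call and its options are assumptions, not the documented API; the linked page has the real configuration:

```ts
import { ActorSystem, MetricsExtensionId, PrometheusExporter } from 'actor-ts';

// Hypothetical wiring: registerExporter() and the options object are
// illustrative guesses. See the Prometheus exporter page for the actual shape.
const system = ActorSystem.create('my-app');
system.extension(MetricsExtensionId).registerExporter(
  new PrometheusExporter({ path: '/metrics' }),
);
```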

The framework auto-records a baseline of metrics when the extension is started:

  • Actor metrics — message counts per actor type, processing duration histograms, mailbox depth gauges.
  • Mailbox metrics — enqueue rate, dequeue rate, dropped count for bounded mailboxes.
  • Cluster metrics — member count by state, gossip lag, reachability flips.

See Stock metrics for the full list. These give you “are my actors processing messages?” out of the box without writing any metric code.

Tracing is switched on by handing the extension a tracer adapter:

```ts
import { ActorSystem, TracingExtensionId, OtelTracerAdapter } from 'actor-ts';

const system = ActorSystem.create('my-app');

// Route spans to OpenTelemetry.
system.extension(TracingExtensionId).configure({
  tracer: new OtelTracerAdapter({ /* OTel SDK setup */ }),
});
```

With tracing enabled, every actor message gets its own span. The span carries:

  • The actor’s path.
  • The message’s class / kind.
  • Parent span context (from the sender’s active span).
  • Duration of the onReceive.

Spans chain across tells — an actor that processes a request and tells another actor passes the current span context via the envelope; the second actor’s span links back to the first.

```
HTTP request
└── actor /user/api receives request
    └── actor /user/db processes query (linked back)
        └── (Postgres span via OTel auto-instrumentation)
```

The end result: one trace per logical request, even when it hops through 4-5 actors.
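
In code, that chaining is invisible — a sketch, with the caveat that the actor-definition API here (spawn, ctx.tell) is illustrative and not taken from this page:

```ts
import { ActorSystem } from 'actor-ts';

// Illustrative actor API: spawn() and ctx.tell() are assumptions. The point
// is that neither actor contains any tracing code — with TracingExtension
// configured, each onReceive gets a span parented to the sender's active one.
const system = ActorSystem.create('traced-app');

const db = system.spawn('db', (query: string) => {
  // Span: /user/db, linked back to /user/api via the envelope's span context.
  console.log('running', query);
});

const api = system.spawn('api', (orderId: string, ctx) => {
  // tell() stamps the current span context onto the envelope.
  ctx.tell(db, `SELECT * FROM orders WHERE id = '${orderId}'`);
});
```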

The exporter is OpenTelemetry-style. Use OtelTracerAdapter in production; a RecordingTracer exists for tests.
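
A test sketch using it — RecordingTracer itself is named above, but the spans() accessor for inspecting what it captured is an assumption:

```ts
import { ActorSystem, TracingExtensionId, RecordingTracer } from 'actor-ts';

// RecordingTracer is the documented test tracer; spans() is an assumed
// accessor for the spans it recorded.
const tracer = new RecordingTracer();
const system = ActorSystem.create('test');
system.extension(TracingExtensionId).configure({ tracer });

// ...drive messages through the system, then assert on tracer.spans(),
// e.g. checking that a span exists for each actor the request touched.
```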

HttpManagement is started explicitly against a running system:

```ts
import { HttpManagement, ActorSystem } from 'actor-ts';

const system = ActorSystem.create('my-app');

// Starts a dedicated management server on its own port.
const management = await HttpManagement.start(system, {
  port: 8558,
  cluster, // optional, for the /cluster endpoints
});
```

This spins up a small HTTP server (separate from your app’s HTTP server) that exposes endpoints for operations:

| Endpoint | What |
| --- | --- |
| GET /health/ready | Readiness — is the system ready for traffic? |
| GET /health/alive | Liveness — is the system up? |
| GET /cluster/members | List of cluster members (when cluster is configured). |
| GET /sharding/regions | Sharding regions, hosted shards per node. |
| GET /metrics | Prometheus exposition if PrometheusExporter is configured. |

Useful for K8s probes (liveness + readiness) and ad-hoc operational debugging. See HTTP endpoints for the full surface.

Custom checks are registered by name:

```ts
// Runs on every readiness probe; a failing result fails /health/ready.
management.addHealthCheck('db', async () => {
  if (!(await db.ping())) return { ok: false, reason: 'db unreachable' };
  return { ok: true };
});
```

Custom checks plug into /health/ready — a failing check makes the endpoint return 503, which K8s reads as “don’t route to this pod.”
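
A quick way to see that behavior, using Node’s global fetch against the port from the snippet above:

```ts
// While the 'db' check is failing, readiness reports 503.
const res = await fetch('http://localhost:8558/health/ready');
console.log(res.status); // 200 when every check passes, 503 otherwise
```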

See Health checks for the configuration.

For a new production deployment:

  1. Metrics — at least the stock ones, with a Prometheus exporter. Counter and gauge dashboards give you “what’s the system doing right now.”
  2. Health checks — liveness + readiness for K8s. Even if your workload doesn’t need fancy probes, K8s wants these endpoints.
  3. Tracing — last. Tracing is more involved (exporter configuration, sampling, cost) and gives diminishing returns for simple apps. Add it when you have multi-actor requests and need to see end-to-end latency.

For a dev / staging environment, none of these are required — console logs cover the basics.