Fundamentals
The data model, the instruments, and the rules that govern aggregation. Language-independent.
1.0 The metrics model
A metric in OpenTelemetry is a stream of numeric measurements identified by a name, a unit, a kind of instrument, and a set of attributes. Unlike traces, where each span is a discrete record carrying its own identity, metric measurements are aggregated on the way out — the SDK collects many recorded values and emits a single condensed data point per export cycle. This is the most important conceptual difference from tracing: the SDK is doing math.
The shape on the wire (OTLP) is a four-level hierarchy:
```protobuf
// Conceptual shape, not the literal .proto definitions
ResourceMetrics {
  Resource resource               // service.name, host, k8s.*, ...
  ScopeMetrics[] scope_metrics
}
ScopeMetrics {
  InstrumentationScope scope      // e.g., "io.opentelemetry.fastapi" v0.46b0
  Metric[] metrics
}
Metric {
  string name                     // "http.server.request.duration"
  string description
  string unit                     // "s", "By", "{request}"
  oneof data {                    // the kind
    Gauge gauge
    Sum sum                       // Counter, UpDownCounter
    Histogram histogram
    ExponentialHistogram exp_histogram
    Summary summary               // legacy, Prometheus interop only
  }
}
```
Each data kind contains a list of DataPoint entries, one per unique attribute set in the export interval. So a Counter named http.server.request.count with attributes {method, route, status} produces one data point per combination of those values in each export. The number of unique combinations is the metric's cardinality, and it is the single most important number to watch.
Two design properties to internalize:
- Pre-aggregation happens in the SDK. The Collector receives already-aggregated data points, not raw measurements. This makes metrics cheap to transmit but means high-cardinality attributes are paid for inside your application process.
- The attribute set is part of the metric identity. Adding an attribute doesn't enrich a metric — it splits the stream into one new time series per unique value. This is why "just adding user_id as a label" wrecks systems.
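To make the second property concrete, here is a toy model of a sum aggregator. It is a sketch, not the SDK's implementation: ToyCounter is a hypothetical class that keeps one running sum per attribute set, which is enough to show how a single extra attribute splits one stream into many.

```python
from collections import defaultdict


class ToyCounter:
    """Toy stand-in for a Counter aggregator: one running sum per attribute set."""

    def __init__(self):
        self._sums = defaultdict(int)

    def add(self, value, attributes):
        # The attribute set is part of the stream identity, so it is the key.
        key = frozenset(attributes.items())
        self._sums[key] += value

    def series_count(self):
        return len(self._sums)


without_user = ToyCounter()
with_user = ToyCounter()
for user_id in range(1000):
    without_user.add(1, {"method": "GET", "route": "/orders"})
    with_user.add(1, {"method": "GET", "route": "/orders", "user_id": user_id})

print(without_user.series_count())  # 1
print(with_user.series_count())     # 1000
```

The same 1000 measurements produce one data point per export in the first case and a thousand in the second.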
2.0 Instruments
OpenTelemetry defines seven instrument types: four synchronous and three asynchronous. Picking the right one is mostly about answering two questions: can the value go down? and do I record each event, or do I observe the current state on demand?
| Instrument | Sync/Async | Use for |
|---|---|---|
| Counter | sync | Monotonic counts you record at the event site. Requests served, bytes received, retries attempted. |
| UpDownCounter | sync | Quantities that can go up or down, recorded as deltas. Items added to a queue (+1) and removed (-1), connections opened/closed. |
| Histogram | sync | Distributions where you care about percentiles and shape. Latencies, sizes, durations. |
| Gauge | sync | Non-additive snapshot values. A newer addition to the spec; previously only ObservableGauge existed. CPU temperature, current room occupancy. |
| ObservableCounter | async | Cumulative totals you read from somewhere else. Total bytes from /proc/net/dev, total GC collections. |
| ObservableUpDownCounter | async | Current additive values you observe. Process heap size, active goroutines, current queue depth. |
| ObservableGauge | async | Current non-additive values you observe. CPU utilization percentage, current temperature. |
2.1 The "Counter vs UpDownCounter" mistake
Both are called with .add(), and both increment a running total in the SDK. The mechanical difference: a Counter rejects negative values. The semantic difference: a Counter promises monotonicity. That promise propagates downstream: Prometheus' rate() function is built for monotonic counters and handles resets (a decreasing value is interpreted as a counter reset, not as the value going down). Pass an UpDownCounter through rate() and you get garbage.
Concrete heuristic: if the question being answered is "how many things happened" → Counter. If the question is "how many things are currently in this state" → UpDownCounter (when you record changes) or ObservableUpDownCounter (when you observe the current total).
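The reset handling is easier to see in code. The following is a toy approximation of an increase()-style calculation, not Prometheus' actual algorithm (which also extrapolates to the window edges); the function name and sample values are illustrative only.

```python
def increase(samples):
    """Toy version of Prometheus-style increase() over cumulative samples.

    Any drop between consecutive samples is treated as a counter reset,
    so the post-reset value is counted as new growth.
    """
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


counter_samples = [10, 20, 5, 15]   # process restarted between 20 and 5
queue_depth_samples = [5, 3, 4]     # an UpDownCounter legitimately going down

print(increase(counter_samples))      # 25
print(increase(queue_depth_samples))  # 4
```

For the Counter the answer is right: 25 events happened despite the restart. For the queue depth the function reports an increase of 4 when the real net change was -1, which is the "garbage" described above.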
2.2 Counter vs ObservableCounter
A Counter you call inline: counter.add(1) at the moment of the event. An ObservableCounter is a callback that the SDK invokes on each collection cycle, and the callback returns the cumulative total:
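```python
from opentelemetry.metrics import CallbackOptions, Observation

# meter and get_total_gc_collections() are assumed to be defined elsewhere.


def cb(options: CallbackOptions):
    # Returns the absolute current value, not a delta.
    return [Observation(get_total_gc_collections(), {})]


meter.create_observable_counter(
    "process.runtime.cpython.gc.collections",
    callbacks=[cb],
    unit="{collection}",
)
```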
Use the observable form when the underlying counter is owned by the runtime or OS — you can't intercept the event, you can only sample the total. Use the synchronous form when you control the event site.
3.0 Sync vs async, in detail
The synchronous instruments record measurements inline with your code. They are cheap (just an attribute lookup and an addition to an aggregator) but they execute on whatever thread your code is running on. They can be associated with the active OTel context, which is how exemplars get linked to traces.
The asynchronous instruments are callback-based. You register a callback at instrument creation time; the SDK calls it once per collection cycle (typically every 60 seconds). The callback is expected to be fast and side-effect-free. Inside it you yield Observation objects — value plus attributes:
```python
from typing import Iterable

from opentelemetry.metrics import CallbackOptions, Observation


def heap_callback(options: CallbackOptions) -> Iterable[Observation]:
    yield Observation(get_heap_used_bytes(), {"generation": "young"})
    yield Observation(get_heap_used_bytes_old(), {"generation": "old"})


meter.create_observable_up_down_counter(
    "jvm.memory.used",
    callbacks=[heap_callback],
    unit="By",
)
```
Three operational properties of async instruments to remember:
- Callbacks run in the SDK's collection thread, not your application threads. They must be thread-safe with respect to whatever they read.
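- Callback exceptions are caught and logged, not raised. A broken callback fails silently from the application's point of view: the metric simply stops updating. Watch the SDK's own log output (it is written by your application process, not by the Collector) or alert on the series going stale.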
- The collection cycle is fixed. If you want to track something that changes at sub-second granularity, the async form will undersample. Use a synchronous instrument with explicit recording on every change.
4.0 Aggregation and temporality
This is the conceptual chapter that pays off for everything else. The same measurement can be exported two ways:
- Cumulative temporality: each data point carries the total since the start time of the series. Series start time is constant for the life of the process. This is what Prometheus expects.
- Delta temporality: each data point carries the change since the last export. Series start time advances with each export.
Visualizing the same request counter:
| Time | Cumulative | Delta |
|---|---|---|
| T0 → T15s | 42 | 42 |
| T15s → T30s | 89 | 47 |
| T30s → T45s | 134 | 45 |
Both carry the same information; they differ in where the work happens. Cumulative pushes the rate computation to the consumer. Delta pushes it to the producer.
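Since both forms carry the same information, converting between them is mechanical. A minimal sketch using the values from the table, assuming a single series with no resets and no gaps (the helper names are illustrative, not an SDK or Collector API):

```python
from itertools import accumulate


def cumulative_to_delta(points):
    """Differences between consecutive cumulative points (first point is kept)."""
    return [cur - prev for prev, cur in zip([0] + points[:-1], points)]


def delta_to_cumulative(points):
    """Running total of delta points."""
    return list(accumulate(points))


cumulative = [42, 89, 134]
delta = cumulative_to_delta(cumulative)
print(delta)                       # [42, 47, 45]
print(delta_to_cumulative(delta))  # [42, 89, 134]
```

The conversion from delta back to cumulative needs the running total as state, which is why it matters which side of the pipeline holds that state.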
4.1 Which to choose
The choice is forced by your backend:
- Prometheus / Mimir expect cumulative. rate() and increase() assume monotonically increasing values with counter resets detectable from a drop.
- Datadog, Azure Monitor, New Relic prefer delta; their backends do the integration server-side.
For an LGTM stack on Mimir, you want cumulative. This is the SDK default and you should leave it alone for sync instruments. The OTLP exporter has a temporality_preference setting and the env var OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE with three values: cumulative, delta, lowmemory.
4.2 Why "lowmemory" exists
Cumulative temporality, for synchronous instruments, requires the SDK to retain state for every attribute set ever observed during the process's lifetime. If a counter is recorded with {tenant=A} and then never again, the SDK must still emit that data point on every export cycle indefinitely — that's what cumulative means. The memory grows with cumulative cardinality over the process lifetime.
Delta, by contrast, only needs the attribute sets seen in the current interval. Cheaper memory at the cost of more state at the backend.
lowmemory is the compromise: delta for synchronous Counters and Histograms (where it helps), cumulative for UpDownCounters and for asynchronous instruments (where it doesn't matter, since async callbacks return cumulative values anyway). If your service has very high cumulative cardinality and you're seeing memory pressure, this is the lever.
For Mimir, you'd then need a deltatocumulative processor in the Collector to convert back. Or, given Mimir's good Prometheus compatibility, just stay on cumulative end-to-end and manage cardinality through Views (next section).
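The memory trade-off in this section can be modeled in a few lines. This is a toy aggregator, not SDK code; ToySum and its temporality argument are hypothetical names used only to show which state each mode has to keep between export cycles.

```python
class ToySum:
    """Toy sum aggregator showing what state each temporality must retain."""

    def __init__(self, temporality):
        self.temporality = temporality  # "cumulative" or "delta"
        self._state = {}

    def add(self, value, attributes):
        key = frozenset(attributes.items())
        self._state[key] = self._state.get(key, 0) + value

    def collect(self):
        points = dict(self._state)
        if self.temporality == "delta":
            # Delta forgets everything once the interval has been exported.
            self._state.clear()
        return points


cumulative, delta = ToySum("cumulative"), ToySum("delta")
for agg in (cumulative, delta):
    agg.add(1, {"tenant": "A"})  # recorded once, never again
    agg.collect()                # first export cycle

print(len(cumulative.collect()))  # 1: tenant A is still held and re-emitted
print(len(delta.collect()))       # 0: nothing retained for the idle tenant
```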
4.3 Aggregation types
Each instrument has a default aggregation mapping into one of these:
| Aggregation | Default for | Output |
|---|---|---|
| Sum | Counter, UpDownCounter, ObservableCounter, ObservableUpDownCounter | Single number per data point. |
| LastValue | Gauge, ObservableGauge | Single number — most recent measurement. |
| ExplicitBucketHistogram | Histogram (default) | Count, sum, min, max, plus bucket counts. |
| Base2ExponentialBucketHistogram | Histogram (when opted in) | Count, sum, min, max, plus exponentially-scaled buckets. |
| Drop | (never default) | Discards measurements. Used via Views to disable a metric entirely. |
You override defaults using Views, covered shortly.
5.0 Histograms in depth
Histograms are the most expensive and most consequential metric type. Get them right and you have percentiles and shape data with reasonable cost. Get them wrong and you have meaningless percentiles or a cardinality bomb.
5.1 Explicit-bucket histograms
Each bucket counts how many measurements fell within its boundaries. The OTel default bucket boundaries (influenced by Prometheus' client defaults) are tuned for HTTP latencies in milliseconds:
```
(-∞, 0], (0, 5], (5, 10], (10, 25], (25, 50], (50, 75],
(75, 100], (100, 250], (250, 500], (500, 750], (750, 1000],
(1000, 2500], (2500, 5000], (5000, 7500], (7500, 10000], (10000, +∞)
```
Sixteen buckets total. Note these are in milliseconds for the old default; the new HTTP semconv (http.server.request.duration) is in seconds and uses:
```
0, 0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1,
2.5, 5, 7.5, 10
```
Buckets can be configured per instrument via advisory parameters at instrument creation time, or via Views at the SDK level (the View wins if both are set).
5.2 How to pick buckets
A bucket boundary becomes a knee in your percentile graph. If your real p95 is at 87ms and the nearest bucket boundaries are 75ms and 100ms, any percentile that lands in that bucket is linearly interpolated between 75 and 100. The estimate depends only on how many samples fell into the bucket, not on where they sit inside it, so the graph can report "around 87ms" while the true value drifts anywhere between 75 and 100. The reported percentile will only ever be at a bucket boundary or interpolated between two of them.
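The interpolation can be sketched directly. This is a simplified model of how a histogram_quantile-style estimate works, assuming non-cumulative counts, a lowest bucket starting at 0, and no +Inf bucket; the function and the sample data are illustrative, not a Prometheus API.

```python
def histogram_quantile(q, boundaries, counts):
    """Estimate a quantile from explicit-bucket counts by linear interpolation.

    boundaries: upper bounds of the finite buckets, ascending.
    counts: per-bucket (non-cumulative) counts, one per boundary.
    """
    rank = q * sum(counts)
    seen = 0
    lower = 0.0
    for upper, count in zip(boundaries, counts):
        if count and seen + count >= rank:
            # Assume samples are spread evenly across the bucket.
            return lower + (upper - lower) * (rank - seen) / count
        seen += count
        lower = upper
    return boundaries[-1]


boundaries = [50, 75, 100, 250]  # milliseconds
# 100 requests; suppose the 20 in the (75, 100] bucket all took about 99 ms.
counts = [50, 30, 20, 0]
print(histogram_quantile(0.95, boundaries, counts))  # approximately 93.75
```

The true p95 in this made-up data is about 99 ms, but the estimate is about 93.75 ms, and it would be the same if all 20 requests had taken 76 ms. Only a boundary inside the range of interest can tell those cases apart.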
Rules of thumb:
- Have at least one bucket boundary at each SLO threshold. If your SLO is p95 < 200ms, put a boundary at 200ms.
- Span 3-4 orders of magnitude. If you have a service that's normally 5ms but occasionally 5s, your buckets must cover both ends.
- Roughly exponential spacing. Tight buckets where you spend most of your time, looser ones in the tails.
- Avoid > 20 buckets per histogram. Each bucket is one Prometheus time series, multiplied by every attribute combination.
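The rules of thumb above can be turned into a small generator. This is one possible sketch, assuming a 1-2.5-5 progression per power of ten; make_boundaries and its parameters are hypothetical helpers, not part of any OTel package.

```python
def make_boundaries(decades, slo_thresholds=()):
    """Build 1-2.5-5 style boundaries over the given powers of ten.

    decades: iterable of integer exponents, e.g. range(0, 4) for 1..5000.
    slo_thresholds: extra boundaries that must be present exactly.
    """
    steps = (1, 2.5, 5)
    bounds = {step * 10 ** exp for exp in decades for step in steps}
    bounds.update(slo_thresholds)
    return sorted(bounds)


# Milliseconds, 1 ms to 5 s, with an SLO threshold at 200 ms.
boundaries = make_boundaries(range(0, 4), slo_thresholds=[200])
print(boundaries)
print(len(boundaries))  # 13
```

The result spans between three and four orders of magnitude, is roughly exponential, includes the SLO threshold, and stays under 20 boundaries. The list can be passed as the boundaries of an explicit-bucket aggregation once converted to the instrument's unit.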
5.3 Exponential (Native) histograms
Exponential histograms automatically choose their bucket boundaries via a base-2 exponential scheme parameterized by a scale. The boundaries form a geometric progression with factor 2^(1/2^scale). At scale=3, the factor is ~1.09; at scale=0, the factor is 2.
Why this matters: the same shape works for nanoseconds and minutes without configuration. Resolution adjusts dynamically — the SDK starts at a high scale and reduces it if too many buckets are needed (capped, by default, at 160 buckets).
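The scale arithmetic is short enough to check by hand. A minimal sketch of the growth factor and of the bucket a value falls into, assuming upper-inclusive buckets; it uses plain logarithms, so results exactly on a boundary may be off by one due to floating-point rounding, which real implementations handle more carefully.

```python
import math


def growth_factor(scale):
    """Ratio between consecutive bucket boundaries: 2^(1 / 2^scale)."""
    return 2 ** (2 ** -scale)


def bucket_index(value, scale):
    """Index i such that base**i < value <= base**(i + 1), for value > 0."""
    return math.ceil(math.log2(value) * 2 ** scale) - 1


print(growth_factor(0))            # 2
print(round(growth_factor(3), 4))  # 1.0905
print(bucket_index(3, scale=0))    # 1, i.e. the bucket (2, 4]
```

Each increment of the scale squares the number of buckets per doubling of the value, which is why the SDK lowers the scale when it would otherwise exceed its bucket cap.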
Mimir supports exponential histograms (via conversion to Prometheus native histograms) since the introduction of native histogram support. To use them, you need three things:
- Enable native histogram ingestion on Mimir distributors and ingesters.
- Configure your application's SDK or a View to use Base2ExponentialBucketHistogramAggregation.
- Update dashboards: histogram_quantile(0.95, rate(http_server_request_duration_seconds_bucket[5m])) becomes histogram_quantile(0.95, sum(rate(http_server_request_duration_seconds[5m]))); note the dropped _bucket suffix and le grouping. Mimir's native histogram path is different.
You can run dual: keep explicit buckets in production, send exponential to a new metric name in parallel, validate dashboards, then cut over. The OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION env var lets you flip the SDK default without code changes.
5.4 What ends up in Prometheus
For an explicit-bucket histogram named http.server.request.duration with attributes {method=GET, route=/orders}, the OTLP-to-Prometheus translation produces:
```
http_server_request_duration_seconds_bucket{method="GET", route="/orders", le="0.005"} 12
http_server_request_duration_seconds_bucket{method="GET", route="/orders", le="0.01"} 18
... (one per bucket boundary)
http_server_request_duration_seconds_bucket{method="GET", route="/orders", le="+Inf"} 4287
http_server_request_duration_seconds_count{method="GET", route="/orders"} 4287
http_server_request_duration_seconds_sum{method="GET", route="/orders"} 312.4
```
Note the unit suffix (_seconds) and the dots converted to underscores. The number of series per histogram = (buckets + 2) × (attribute combinations). This is why histogram cardinality compounds.
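As a quick worked example of that formula, assuming the bucket count includes the +Inf bucket and using made-up numbers for the attribute combinations:

```python
def histogram_series(num_boundaries, attribute_combinations):
    """Series for one explicit-bucket histogram after Prometheus translation."""
    buckets = num_boundaries + 1  # finite boundaries plus the +Inf bucket
    return (buckets + 2) * attribute_combinations  # plus _count and _sum


# Hypothetical service: 15 finite boundaries, 5 methods x 20 routes.
print(histogram_series(15, 5 * 20))  # 1800
```

One histogram with two modest attributes already costs 1800 series; adding a third attribute with ten values makes it 18,000.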
6.0 The SDK pipeline
Like tracing, every OTel metrics SDK has the same internal shape:
6.1 MeterProvider
Singleton per process. Owns the resource, the list of metric readers, and the list of views. All meters come from it. Configure once, very early in your bootstrap.
6.2 Meter
Equivalent to Tracer in tracing. You ask the MeterProvider for one by name (typically the module name plus version). Each instrument you create belongs to a Meter; the Meter's name+version is recorded as InstrumentationScope on every metric.
6.3 MetricReader
Two variants and they are not interchangeable:
- PeriodicExportingMetricReader wraps an OTLP exporter. It runs collection on a timer (default 60s, configurable via OTEL_METRIC_EXPORT_INTERVAL), then hands the collected batch to the exporter. This is what you use to push to a Collector.
- PrometheusReader exposes an HTTP endpoint that, on each scrape, triggers a collection. It does not export; it makes data available. Used when you want Prometheus (or Alloy with a scrape job) to pull from your app.
You can register multiple readers. A common pattern: one PeriodicExportingMetricReader to an OTel Collector (push) plus a PrometheusReader on /metrics (pull) for backward compatibility with existing Prometheus scraping.
6.4 Export interval and resolution
The export interval controls how often the SDK collects and exports. The default 60s is appropriate for Mimir at scale. Shorter intervals (15s, 10s) give better resolution at the cost of more network calls and more granular data points in Mimir. For most observability use cases, the difference between 30s and 60s isn't worth the cost; for SLO computation on tight error budgets, 15s can be worth it.
Important: collection is global across all instruments on the MeterProvider. You cannot have one metric export every 10s and another every 60s within the same Reader.
7.0 Views: the most underused feature
A View is a transformation rule applied to a metric before it leaves the SDK. Views match instruments by criteria (name, type, meter name) and modify what gets emitted — they can drop attributes, change aggregation, rename, set bucket boundaries, or drop the metric entirely.
Why this matters: instrumentation libraries you don't own decide their default attributes and bucket boundaries. They are conservative on bucket sizing and liberal on attributes. Views are how you take control without forking the library.
7.1 Common Views
Drop a high-cardinality attribute. The default HTTP server instrumentation includes http.target (the raw URL including query string). It's invaluable on traces, lethal on metrics. Drop it from the metric:
```python
from opentelemetry.sdk.metrics.view import View

view = View(
    instrument_name="http.server.request.duration",
    attribute_keys={"http.request.method", "http.response.status_code", "http.route"},
)
```
This says: for this metric, the only attributes that may appear on data points are these three. Everything else is dropped. Cardinality becomes bounded by methods × statuses × routes — a few hundred series at most.
Override histogram buckets:
```python
from opentelemetry.sdk.metrics.view import ExplicitBucketHistogramAggregation, View

view = View(
    instrument_name="http.server.request.duration",
    aggregation=ExplicitBucketHistogramAggregation(
        boundaries=[0.001, 0.005, 0.01, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5]
    ),
)
```
Switch to exponential:
```python
# The Python SDK exposes the base-2 exponential aggregation under this class name.
from opentelemetry.sdk.metrics.view import ExponentialBucketHistogramAggregation, View

view = View(
    instrument_name="http.server.request.duration",
    aggregation=ExponentialBucketHistogramAggregation(max_size=160, max_scale=20),
)
```
Disable a metric entirely:
```python
from opentelemetry.sdk.metrics.view import DropAggregation, View

view = View(
    instrument_name="http.server.active_requests",  # we don't use it
    aggregation=DropAggregation(),
)
```
7.2 View matching and ordering
Multiple Views can match the same instrument. Each matching View produces its own metric stream. If two Views produce conflicting metric identities (same name, same attributes, different aggregations), the SDK emits a warning and one wins arbitrarily. Always test that your Views produce what you expect.
8.0 Cardinality: the recurring nightmare
Cardinality is the number of unique attribute-set combinations for a metric. Every combination produces one time series in Mimir. Three attributes with 30 values each gives 27,000 series per metric. Mimir bills by active series. Queries scan time series. Memory in the SDK retains state per series.
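The arithmetic is worth keeping at hand when reviewing a new attribute. A small sketch with two helpers (both hypothetical names): one computes the worst-case product, the other counts the attribute sets actually seen in a sample of recordings. The attribute names are made up for illustration.

```python
from math import prod


def max_series(values_per_attribute):
    """Upper bound on series: the product of per-attribute value counts."""
    return prod(values_per_attribute.values())


def observed_series(recorded_attribute_sets):
    """Series actually created: distinct attribute sets seen so far."""
    return len({frozenset(attrs.items()) for attrs in recorded_attribute_sets})


print(max_series({"region": 30, "route": 30, "customer_tier": 30}))  # 27000
print(observed_series([
    {"method": "GET", "route": "/orders"},
    {"method": "GET", "route": "/orders"},
    {"method": "POST", "route": "/orders"},
]))  # 2
```

The observed number is usually lower than the product because not every combination occurs, but the product is what you should budget for.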
The standard offenders:
- User IDs, account IDs, tenant IDs: unbounded.
- Request IDs, trace IDs: guaranteed unbounded.
- IP addresses: high cardinality, slowly bounded.
- Raw URLs (/orders/A8742 instead of /orders/{order_id}): unbounded.
- Free-text fields (error messages with timestamps in them, user-agent strings): unbounded.
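For the raw-URL case, the preferred fix is to record the framework's route template. Where no template is available, a crude normalizer can serve as a stopgap. The sketch below uses a hypothetical heuristic (any path segment containing a digit is treated as an identifier), which will misfire on segments such as /v2, so treat it as an illustration rather than a recommended rule.

```python
import re

# Hypothetical heuristic: any path segment containing a digit is an identifier.
_ID_SEGMENT = re.compile(r"/[^/]*\d[^/]*")


def normalize_path(raw_url):
    """Strip the query string and replace ID-like segments with a placeholder."""
    path = raw_url.split("?", 1)[0]
    return _ID_SEGMENT.sub("/{id}", path)


print(normalize_path("/orders/A8742"))               # /orders/{id}
print(normalize_path("/orders/A8742/items?page=2"))  # /orders/{id}/items
```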
8.1 SDK cardinality limit
The OTel SDK has a per-instrument cardinality limit (default 2000). If you exceed it, new attribute sets go into an overflow bucket attributed to a single synthetic attribute set. This prevents memory blowup but loses fidelity. The limit is configurable per-instrument via Views or globally via env var.
Hitting the overflow is a signal that something is mis-instrumented. If you see otel.metric.points.dropped climbing or a metric with an otel.metric.overflow=true data point, find the offender.
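The overflow behavior can be modeled with a toy counter. This is a simplified sketch, not the SDK's code: LimitedCounter is a hypothetical class, and it assumes the overflow series itself occupies one slot of the limit.

```python
OVERFLOW_KEY = frozenset({("otel.metric.overflow", True)})


class LimitedCounter:
    """Toy counter that folds new attribute sets into one overflow series."""

    def __init__(self, cardinality_limit=2000):
        self._limit = cardinality_limit
        self._sums = {}

    def add(self, value, attributes):
        key = frozenset(attributes.items())
        # Reserve one slot for the overflow series itself.
        if key not in self._sums and len(self._sums) >= self._limit - 1:
            key = OVERFLOW_KEY
        self._sums[key] = self._sums.get(key, 0) + value

    def collect(self):
        return dict(self._sums)


counter = LimitedCounter(cardinality_limit=3)
for request_id in range(10):
    counter.add(1, {"request.id": request_id})

points = counter.collect()
print(len(points))           # 3
print(points[OVERFLOW_KEY])  # 8
```

The total is preserved (2 + 8 = 10 measurements), but eight of them can no longer be attributed to a specific request.id. That is the fidelity loss described above.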
8.2 Bucketing high-cardinality attributes
Sometimes you genuinely want the dimension, but not at its raw cardinality. The pattern is to bucket at the call site:
```python
def status_class(code: int) -> str:
    if code < 200: return "1xx"
    if code < 300: return "2xx"
    if code < 400: return "3xx"
    if code < 500: return "4xx"
    return "5xx"


http_requests.add(1, {
    "http.response.status_code": code,  # keep for traces
    "http.response.status_code_class": status_class(code),  # for metrics
})
```
Then a View on the metric drops http.response.status_code and keeps the class:
```python
View(
    instrument_name="http.server.request.count",
    attribute_keys={"http.request.method", "http.route", "http.response.status_code_class"},
)
```
Five status classes instead of dozens of distinct status codes.
8.3 Where to enforce
You have three places to control cardinality:
- At record time: don't pass the attribute. Cheapest. Use when you never want it.
- In the SDK via Views: drop or filter attributes. Use when libraries you don't own emit them and you want to be selective.
- In the Collector: metricstransform or transform processors. Use as a safety net or for org-wide policy. Don't rely on it alone: the SDK has already paid for the cardinality in memory.
9.0 Exemplars: the trace bridge
An exemplar is a single trace ID attached to a single sample within a metric data point, asserting "this measurement came from this trace". They are the most direct path from a chart to a specific trace.
Mechanically: when a synchronous instrument records a measurement while a trace span is active and sampled, the SDK can attach the active trace ID to that measurement. For histogram aggregation, the exemplar is attached to the bucket the value fell into. For sum aggregation, the exemplar is attached to the data point as a whole.
9.1 The reservoir
An instrument might record millions of measurements per export cycle but a histogram only has so many buckets. You can't carry an exemplar for every measurement. The SDK uses a reservoir — a small sample of measurements kept along with their trace IDs.
Two reservoir types in the spec:
- SimpleFixedSize for sum-typed instruments. A small fixed sample regardless of bucket structure.
- AlignedHistogramBucket for histograms. One exemplar per bucket, keeping the most-recently-recorded.
This is why exemplars don't always show up where you might expect: the slow trace you're looking for has to be the last one to populate its bucket, and the bucket has to have been hit in the current export cycle. Statistical exemplars are not a replacement for tail sampling on the trace side.
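A toy version of the per-bucket reservoir shows why a particular slow trace can go missing. It follows the "most recent wins" behavior described above; ToyBucketReservoir and the trace IDs are hypothetical, and real SDK reservoirs carry more fields (timestamp, span ID, filtered attributes).

```python
from bisect import bisect_left


class ToyBucketReservoir:
    """Toy reservoir: at most one exemplar per bucket, latest write wins."""

    def __init__(self, boundaries):
        self._boundaries = boundaries
        self._slots = [None] * (len(boundaries) + 1)  # last slot is +Inf

    def offer(self, value, trace_id):
        # bisect_left gives upper-inclusive buckets: (lower, upper].
        self._slots[bisect_left(self._boundaries, value)] = (value, trace_id)

    def collect(self):
        exemplars = [slot for slot in self._slots if slot is not None]
        self._slots = [None] * len(self._slots)  # start fresh each cycle
        return exemplars


reservoir = ToyBucketReservoir([0.1, 0.5, 1.0])
reservoir.offer(0.72, "trace-slow-a")  # lands in (0.5, 1.0]
reservoir.offer(0.03, "trace-fast")    # lands in (-inf, 0.1]
reservoir.offer(0.61, "trace-slow-b")  # same bucket, overwrites trace-slow-a
print(reservoir.collect())
# [(0.03, 'trace-fast'), (0.61, 'trace-slow-b')]
```

The slower request (0.72 s) was recorded, but its trace ID is gone by export time because a later measurement landed in the same bucket.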
9.2 Enabling exemplars
Three preconditions for an exemplar to appear in your Mimir-backed dashboard:
- SDK: exemplar filter set to allow recording. Default in modern SDKs is to record exemplars when there's a sampled trace; check the OTEL_METRICS_EXEMPLAR_FILTER env var if you don't see them. Values: trace_based (default), always_on, always_off.
- Prometheus exposition format: must be OpenMetrics, not classic Prometheus. The OTLP path handles this automatically. If you're scraping a /metrics endpoint, configure the scraper to send Accept: application/openmetrics-text.
- Mimir: --ingester.exemplars-max-samples-per-user set to a non-zero value. Without this, Mimir drops exemplars on ingest.
Once wired, in Grafana you enable "Exemplars" on the panel options, and small dots appear on histogram panels. Click → trace view in Tempo.
10.0 Semantic conventions
Naming matters intensely for metrics. Two services emitting the same conceptual measurement under different names produce two separate time series — dashboards break, alerts duplicate, queries lengthen.
OTel's semantic conventions for metrics are organized by domain. The most important ones for typical services:
| Domain | Key metrics |
|---|---|
| HTTP server | http.server.request.duration, http.server.request.body.size, http.server.response.body.size, http.server.active_requests |
| HTTP client | http.client.request.duration, http.client.active_requests, http.client.open_connections |
| Database client | db.client.operation.duration, db.client.connection.count, db.client.connection.idle.max |
| Messaging | messaging.publish.duration, messaging.receive.duration, messaging.process.duration |
| RPC | rpc.server.duration, rpc.client.duration |
| OS process | process.cpu.utilization, process.memory.usage, process.threads, process.open_file_descriptors |
| Runtime | process.runtime.cpython.gc.collections, process.runtime.cpython.memory, v8js.heap.space.used |
10.1 Units
Use UCUM (Unified Code for Units of Measure) unit strings: s for seconds, ms for milliseconds, By for bytes, 1 for a dimensionless ratio, {request} for counts of "requests" (curly braces denote annotation, not a real unit). The OTel-to-Prometheus translator appends the unit to metric names — duration in seconds becomes duration_seconds in Prometheus.
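The naming rule can be sketched as a small function. This is a deliberately partial model, assuming only the three unit mappings shown; real translators cover many more units and also apply type-specific suffixes that are ignored here. UNIT_SUFFIXES and to_prometheus_name are illustrative names, not a library API.

```python
# Simplified, partial mapping; real translators cover many more units and
# also apply type-specific suffixes that this sketch ignores.
UNIT_SUFFIXES = {"s": "seconds", "ms": "milliseconds", "By": "bytes"}


def to_prometheus_name(name, unit):
    """Dots become underscores; a known unit is appended as a suffix."""
    prom_name = name.replace(".", "_")
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not prom_name.endswith("_" + suffix):
        prom_name += "_" + suffix
    return prom_name


print(to_prometheus_name("http.server.request.duration", "s"))
# http_server_request_duration_seconds
print(to_prometheus_name("http.server.request.body.size", "By"))
# http_server_request_body_size_bytes
print(to_prometheus_name("http.server.active_requests", "{request}"))
# http_server_active_requests
```

Annotation units in curly braces contribute no suffix in this sketch, which matches their role as annotations rather than real units.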
10.2 The migration tension
The HTTP semconv recently went stable, renaming many attributes (http.method → http.request.method, etc.) and changing units (http.server.duration in ms → http.server.request.duration in seconds). The env var OTEL_SEMCONV_STABILITY_OPT_IN with values http or http/dup controls whether instrumentation emits new names only or both. For new deployments, opt in to the stable names. For existing dashboards, run with http/dup during migration and update queries before turning the duplicate off.
Python & FastAPI
Packages, Views, runtime metrics, and the multi-process trap.
P.1 Packages
| Package | Purpose |
|---|---|
| opentelemetry-api | The metrics API surface. |
| opentelemetry-sdk | SDK implementation including MeterProvider and aggregators. |
| opentelemetry-exporter-otlp-proto-grpc | OTLP/gRPC exporter for metrics. |
| opentelemetry-exporter-otlp-proto-http | OTLP/HTTP exporter. |
| opentelemetry-exporter-prometheus | Local /metrics endpoint exposing OTel metrics in Prometheus format. |
| opentelemetry-instrumentation-fastapi | HTTP server instrumentation. Emits HTTP semconv metrics automatically. |
| opentelemetry-instrumentation-httpx | HTTP client metrics. |
| opentelemetry-instrumentation-system-metrics | OS process and runtime metrics (CPU, memory, GC). |
| opentelemetry-instrumentation-sqlalchemy | DB connection pool and query duration metrics. |
P.2 Bootstrap (manual)
The complete shape:
```python
# telemetry.py
from opentelemetry import metrics
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.metrics.view import (
    View,
    ExplicitBucketHistogramAggregation,
    DropAggregation,
)
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter


def setup_metrics(service_name: str, service_version: str, environment: str) -> None:
    resource = Resource.create({
        "service.name": service_name,
        "service.version": service_version,
        "deployment.environment": environment,
    })

    exporter = OTLPMetricExporter(
        endpoint="http://alloy.observability:4317",
        insecure=True,
    )

    reader = PeriodicExportingMetricReader(
        exporter,
        export_interval_millis=30_000,  # 30s
        export_timeout_millis=10_000,
    )

    views = [
        # Custom buckets for HTTP latency: covers 1ms to 30s
        View(
            instrument_name="http.server.request.duration",
            aggregation=ExplicitBucketHistogramAggregation(
                boundaries=[0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30]
            ),
            attribute_keys={"http.request.method", "http.route", "http.response.status_code"},
        ),
        # Drop noisy active-requests metric
        View(
            instrument_name="http.server.active_requests",
            aggregation=DropAggregation(),
        ),
    ]

    provider = MeterProvider(
        resource=resource,
        metric_readers=[reader],
        views=views,
    )
    metrics.set_meter_provider(provider)
```
P.3 Auto-instrumentation
The same opentelemetry-instrument CLI that handles tracing also handles metrics. With FastAPI + SQLAlchemy + httpx installed, you get HTTP semconv metrics on every request, DB connection pool metrics, and HTTP client metrics automatically:
```shell
OTEL_SERVICE_NAME=payments-api \
OTEL_EXPORTER_OTLP_ENDPOINT=http://alloy.observability:4317 \
OTEL_METRICS_EXPORTER=otlp \
OTEL_METRIC_EXPORT_INTERVAL=30000 \
OTEL_SEMCONV_STABILITY_OPT_IN=http \
opentelemetry-instrument uvicorn main:app
```
Environment variables worth knowing:
| Variable | Effect |
|---|---|
| OTEL_METRICS_EXPORTER | otlp · prometheus · none · console |
| OTEL_METRIC_EXPORT_INTERVAL | Milliseconds. Default 60000. |
| OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE | cumulative · delta · lowmemory |
| OTEL_EXPORTER_OTLP_METRICS_DEFAULT_HISTOGRAM_AGGREGATION | explicit_bucket_histogram · base2_exponential_bucket_histogram |
| OTEL_METRICS_EXEMPLAR_FILTER | trace_based (default) · always_on · always_off |
| OTEL_PYTHON_DISABLED_INSTRUMENTATIONS | Comma list to skip noisy instrumentations. |
P.4 Custom instruments
The pattern for business metrics, the signals that matter for incident response but don't appear in HTTP-level data:
```python
# metrics.py: module-level instruments
from opentelemetry import metrics
from opentelemetry.metrics import Observation

meter = metrics.get_meter("payments.domain", "2.14.3")

# Counter for discrete business events
payments_settled = meter.create_counter(
    "payments.settled.count",
    unit="{payment}",
    description="Successfully settled payments",
)

# Histogram for amount distributions
payment_amount = meter.create_histogram(
    "payments.amount",
    unit="EUR",
    description="Payment amount distribution in minor units",
)

# UpDownCounter for in-flight work
payments_inflight = meter.create_up_down_counter(
    "payments.inflight",
    unit="{payment}",
    description="Payments currently being processed",
)


# ObservableGauge for periodic external state
# (redis_client is assumed to be a configured Redis client defined elsewhere)
def queue_depth_cb(opts):
    yield Observation(redis_client.llen("payments:pending"), {"queue": "pending"})
    yield Observation(redis_client.llen("payments:retry"), {"queue": "retry"})


meter.create_observable_gauge(
    "payments.queue.depth",
    callbacks=[queue_depth_cb],
    unit="{item}",
)
```
Use them where the events happen:
```python
async def settle_payment(order: Order):
    payments_inflight.add(1, {"payment.gateway": order.gateway})
    try:
        result = await gateway.charge(order)
        payments_settled.add(1, {
            "payment.gateway": order.gateway,
            "payment.currency": order.currency,
            "payment.outcome": "success" if result.ok else "declined",
        })
        payment_amount.record(order.amount_minor, {
            "payment.gateway": order.gateway,
            "payment.currency": order.currency,
        })
    finally:
        payments_inflight.add(-1, {"payment.gateway": order.gateway})
```
Three idioms in that snippet worth highlighting:
- Instruments at module scope, not in classes. They're cheap to create but you shouldn't recreate them per request. One per metric, lifetime of the process.
- Attribute keys shared by the Counter and the Histogram must match. If they diverge, the dashboards diverge.
- Try/finally for UpDownCounter. A miss on the decrement leaks the gauge forever. Treat it like a lock.
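The "treat it like a lock" idiom can be packaged as a context manager so the decrement cannot be forgotten. A minimal sketch: track_inflight is a hypothetical helper, and FakeUpDownCounter is a test double standing in for a real instrument, which only needs an add(value, attributes) method. The gateway name is made up.

```python
from contextlib import contextmanager


@contextmanager
def track_inflight(up_down_counter, attributes):
    """Increment on entry and always decrement on exit, even on exceptions."""
    up_down_counter.add(1, attributes)
    try:
        yield
    finally:
        up_down_counter.add(-1, attributes)


class FakeUpDownCounter:
    """Test double exposing the same add(value, attributes) shape."""

    def __init__(self):
        self.value = 0

    def add(self, amount, attributes=None):
        self.value += amount


inflight = FakeUpDownCounter()
try:
    with track_inflight(inflight, {"payment.gateway": "acme"}):
        raise RuntimeError("gateway timeout")
except RuntimeError:
    pass
print(inflight.value)  # 0
```

A plain with block works inside an async function as well, since the increment and decrement themselves are synchronous calls.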
P.5 FastAPI-specific patterns
The FastAPI instrumentation emits HTTP semconv metrics out of the box. To extend per-request with business attributes, use a middleware:
```python
from fastapi import FastAPI, Request
from opentelemetry import metrics

app = FastAPI()

meter = metrics.get_meter("payments.http")
tenant_requests = meter.create_counter("payments.http.tenant.requests")


@app.middleware("http")
async def tenant_metrics(request: Request, call_next):
    tenant = request.headers.get("x-tenant-id", "unknown")
    response = await call_next(request)
    route = request.scope.get("route")
    tenant_requests.add(1, {
        "tenant.id": tenant,  # careful: cardinality
        "http.route": route.path if route else "unknown",
        "http.response.status_code": response.status_code,
    })
    return response
```
For tenant.id in particular, you almost certainly want a View to constrain that metric's attribute set. Or better: keep tenant.id on traces (where unbounded cardinality is fine) and use a tenant bucket on metrics.
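One way to read "tenant bucket" is a stable hash of the tenant ID into a fixed number of buckets, which bounds cardinality while still letting you see whether load is skewed. This interpretation, the tenant_bucket helper, and the bucket count are assumptions for illustration; grouping by plan or tier would be an equally valid bucketing.

```python
import hashlib


def tenant_bucket(tenant_id, buckets=16):
    """Map a tenant ID onto one of a fixed number of stable buckets."""
    # hashlib is used instead of hash() so the mapping is identical across
    # processes and restarts (hash() is salted per process for strings).
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return f"bucket-{int.from_bytes(digest[:4], 'big') % buckets:02d}"


label = tenant_bucket("tenant-8f3a")
print(label)
```

The metric attribute then takes at most 16 values no matter how many tenants exist, and the raw tenant.id stays on the span.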
P.5.1 Excluding health checks
Set OTEL_PYTHON_FASTAPI_EXCLUDED_URLS or, for explicit instrumentation:
```python
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

FastAPIInstrumentor.instrument_app(app, excluded_urls="/health,/metrics,/ready")
```
Otherwise every Kubernetes liveness probe contributes to your request rate and percentiles. With probes at 1Hz across many pods, this drowns out real traffic in low-throughput services.
P.6 System and runtime metrics
opentelemetry-instrumentation-system-metrics uses psutil under the hood and emits OS process metrics plus Python-specific runtime metrics. Configure it with a dict of metric → list of dimensions:
```python
from opentelemetry.instrumentation.system_metrics import SystemMetricsInstrumentor

config = {
    "process.runtime.memory": ["rss", "vms"],
    "process.runtime.cpu.time": ["user", "system"],
    "process.runtime.gc_count": None,
    "process.runtime.thread_count": None,
    "process.runtime.cpu.utilization": None,
    "process.runtime.context_switches": ["voluntary", "involuntary"],
    "system.network.dropped.packets": ["transmit", "receive"],
}

SystemMetricsInstrumentor(config=config).instrument()
```
The process.runtime.* prefix is technically deprecated in favor of the new process.* semconv, but the Python instrumentation still emits both at time of writing. For Mimir dashboards, the practical question is which name you query. Use the new process.* names; the deprecated path will be removed in a future version.
P.6.1What you actually want from runtime metrics
The defaults are noisy. A reasonable subset for a Python service:
- process.cpu.utilization: saturation signal.
- process.memory.usage (RSS): track for leak detection.
- process.runtime.cpython.gc.collections by generation: GC pressure correlates with latency P99 spikes.
- process.thread.count: diagnostic for thread leaks.
- process.open_file_descriptors: diagnostic for FD leaks under load.
P.7Views in Python
Views are passed to the MeterProvider constructor, in order. Each matching View produces a stream. A View matches an instrument when all specified criteria match.
```python
from opentelemetry.sdk.metrics import Histogram
from opentelemetry.sdk.metrics.view import View

# Each selector is shown alone; add aggregation, attribute_keys, etc. as needed.

# Match by name (most common)
View(instrument_name="http.server.request.duration")

# Match by name with wildcard
View(instrument_name="http.server.*")

# Match by meter
View(meter_name="opentelemetry.instrumentation.fastapi")

# Match by instrument type
View(instrument_type=Histogram)
```
The applied transformation is configured on the same View:
```python
from opentelemetry.sdk.metrics.view import ExplicitBucketHistogramAggregation, View

View(
    instrument_name="http.server.request.duration",
    # Filter to a fixed attribute set
    attribute_keys={"http.request.method", "http.route", "http.response.status_code"},
    # Override aggregation (boundaries left elided, as in the original)
    aggregation=ExplicitBucketHistogramAggregation(boundaries=[...]),
    # Rename the metric in output
    name="http_server_duration_v2",
    description="HTTP server request duration in seconds",
)
```
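The matching rule stated earlier (a View matches when all of its specified criteria match, and each matching View produces a stream) can be modeled in a few lines. This is an illustrative stand-in that uses `fnmatch` for wildcards, not the SDK's actual classes; `ViewSpec`, `Instrument`, and `matching_views` are hypothetical names:

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from typing import List, Optional


@dataclass(frozen=True)
class Instrument:
    name: str
    meter_name: str
    kind: str  # e.g. "histogram", "counter"


@dataclass(frozen=True)
class ViewSpec:
    label: str
    instrument_name: Optional[str] = None  # may contain * and ? wildcards
    meter_name: Optional[str] = None
    instrument_kind: Optional[str] = None

    def matches(self, inst: Instrument) -> bool:
        # Unspecified criteria are ignored; every specified one must match.
        if self.instrument_name is not None and not fnmatch(inst.name, self.instrument_name):
            return False
        if self.meter_name is not None and inst.meter_name != self.meter_name:
            return False
        if self.instrument_kind is not None and inst.kind != self.instrument_kind:
            return False
        return True


def matching_views(views: List[ViewSpec], inst: Instrument) -> List[str]:
    """Labels of all Views that match, in registration order."""
    return [v.label for v in views if v.matches(inst)]


views = [
    ViewSpec("specific", instrument_name="http.server.request.duration"),
    ViewSpec("wildcard", instrument_name="http.server.*"),
    ViewSpec("counters-only", instrument_kind="counter"),
]
duration = Instrument("http.server.request.duration", "payments.http", "histogram")
print(matching_views(views, duration))  # ['specific', 'wildcard']
```

The specific and the wildcard selector both match the same instrument here, which is the overlap to audit for when combining them.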
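Order does not resolve overlaps when multiple Views could match. In OTel Python, every View that matches an instrument produces its own stream, so a wildcard View and a specific View that both match the same instrument will each emit one. Be careful about wildcard Views interacting with specific Views: keep their selectors disjoint so that each instrument is matched by exactly one View.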
P.8Prometheus exposition
If you want to expose a /metrics endpoint locally (for an Alloy scrape job or for compatibility with existing Prometheus pull infrastructure), register a PrometheusMetricReader alongside the OTLP one:
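```python
from opentelemetry import metrics
# Assumes the gRPC exporter; swap in the HTTP one if that is what you use.
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from prometheus_client import start_http_server

prometheus_reader = PrometheusMetricReader()
otlp_reader = PeriodicExportingMetricReader(OTLPMetricExporter(endpoint="..."))

provider = MeterProvider(
    resource=resource,  # the Resource built during SDK setup
    metric_readers=[prometheus_reader, otlp_reader],
)
metrics.set_meter_provider(provider)

# Expose /metrics on port 9464
start_http_server(port=9464, addr="0.0.0.0")
```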
Now you have both. The dual setup is useful during migration: scrape via Prometheus initially, then transition dashboards to the OTLP-pushed copy in Mimir and decommission the scrape.
P.9Gunicorn and multi-process
Gunicorn's pre-fork worker model creates one Python process per worker. Each worker has its own MeterProvider, its own aggregators, its own export. From Mimir's perspective, you have N independent producers emitting metrics for the same logical service, and Mimir can only tell them apart by the instance label.
Two things to ensure:
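- Initialize OTel inside the worker, not in the master. Use Gunicorn's post_fork hook:

```python
# gunicorn.conf.py
def post_fork(server, worker):
    # Import inside the hook so nothing is initialized in the master process.
    from myapp.telemetry import setup_metrics
    setup_metrics(service_name="payments-api")  # plus any other arguments your helper takes
```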
- Set service.instance.id uniquely per worker. The default resource detector typically uses the host name, which is the pod name in Kubernetes, but all workers share that. Append the PID:

```python
import os

from opentelemetry.sdk.resources import Resource

resource = Resource.create({
    "service.name": "payments-api",
    "service.instance.id": f"{os.environ.get('HOSTNAME', 'unknown')}-{os.getpid()}",
})
```
Without unique instance IDs, Mimir sees multiple producers reporting on the same series and gets confused about counter resets.
Same story for uvicorn --workers N. The workers fork after import, so initializing OTel at import time in the app module would happen pre-fork. Use FastAPI's lifespan to defer initialization until the worker is running.
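Why shared instance IDs look like counter resets: each worker exports its own cumulative counter, and if both land on the same series the interleaved samples go up and down. A toy simulation with invented sample values (the helper `count_apparent_resets` is hypothetical, and real ingestion may additionally reject or deduplicate such samples):

```python
from typing import List


def count_apparent_resets(samples: List[float]) -> int:
    """Count drops between consecutive samples, which a rate()-style
    calculation would interpret as counter resets."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if cur < prev)


# Cumulative request counts exported by two workers over four intervals.
worker_a = [100, 220, 330, 460]
worker_b = [40, 90, 150, 200]

# Same service.instance.id: samples from both workers interleave on one series.
merged = [value for pair in zip(worker_a, worker_b) for value in pair]
print(merged)                         # [100, 40, 220, 90, 330, 150, 460, 200]
print(count_apparent_resets(merged))  # 4

# Unique IDs: each worker is its own series and stays monotonic.
print(count_apparent_resets(worker_a), count_apparent_resets(worker_b))  # 0 0
```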
P.10Production patterns (Python)
- Cumulative temporality end-to-end for Mimir. Don't get clever with delta unless you have a specific memory reason.
- 30-second export interval as a default. Tune down to 15s for SLO-critical services, up to 60s for low-traffic background workers.
- Explicit bucket boundaries for every histogram. The defaults are wrong for most workloads. Audit them.
- Views to constrain HTTP server attributes to {method, route, status_code} plus your tenant bucket. Nothing else.
- SystemMetricsInstrumentor with a curated metric list. Don't enable everything; pick the half-dozen that move SRE alerts.
- Post-fork bootstrap with unique instance IDs. Mandatory for any forking server.
- Exemplars on. The dashboards become drastically more useful and the overhead is negligible.
- Health checks excluded. Both at instrumentation and as a Collector-side safety net.
- Custom business metrics at module scope. One Counter per business event, with a clean attribute schema.
- Manual instrument lifecycle review, at least once. Run your service for a day, query Mimir for topk(20, count by (__name__)({__name__=~".+"})), and see which metrics generated the most series. The top of that list is where to focus Views.
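When reviewing that list, a worst-case estimate helps explain why a metric ended up near the top: the series count of one metric is bounded by the product of the distinct values of its attributes. A small sketch with invented attribute counts (`series_count` is a hypothetical helper):

```python
import math
from typing import Dict


def series_count(distinct_values: Dict[str, int]) -> int:
    """Upper bound on series for one metric: the product of the number
    of distinct values per attribute."""
    return math.prod(distinct_values.values())


base = {"method": 5, "route": 40, "status_code": 8}
print(series_count(base))                           # 1600
print(series_count({**base, "tenant.bucket": 16}))  # 25600
print(series_count({**base, "user_id": 100_000}))   # 160000000
```

For explicit-bucket histograms the real number is higher still, since every attribute combination also fans out into one series per bucket plus the count and sum series.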
TypeScript & Node.js
Synchronous Gauge support, NodeSDK metric wiring, and where Node runtime metrics differ from Python's.
T.1Packages
| Package | Purpose |
|---|---|
| @opentelemetry/api | API only. Type definitions for instruments. |
| @opentelemetry/sdk-metrics | Core Metrics SDK: MeterProvider, Views, aggregators, readers. |
| @opentelemetry/sdk-node | All-in-one bootstrap; wires metrics, traces, logs. |
| @opentelemetry/exporter-metrics-otlp-grpc | OTLP/gRPC metrics exporter. |
| @opentelemetry/exporter-metrics-otlp-http | OTLP/HTTP metrics exporter. |
| @opentelemetry/exporter-prometheus | Prometheus scrape endpoint. |
| @opentelemetry/auto-instrumentations-node | The "everything" pack for instrumentations. |
| @opentelemetry/host-metrics | Process and OS metrics (Node equivalent of system-metrics). |
| @opentelemetry/semantic-conventions | Typed constants for semconv names. |
T.2NodeSDK bootstrap (with metrics)
```typescript
// instrumentation.ts: must load before any other module
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-grpc';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import {
  PeriodicExportingMetricReader,
  View,
  ExplicitBucketHistogramAggregation,
} from '@opentelemetry/sdk-metrics';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { resourceFromAttributes } from '@opentelemetry/resources';
import {
  ATTR_SERVICE_NAME,
  ATTR_SERVICE_VERSION,
} from '@opentelemetry/semantic-conventions';
import { HostMetrics } from '@opentelemetry/host-metrics';
import { metrics } from '@opentelemetry/api';

const metricReader = new PeriodicExportingMetricReader({
  exporter: new OTLPMetricExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://alloy.observability:4317',
  }),
  exportIntervalMillis: 30_000,
});

const sdk = new NodeSDK({
  resource: resourceFromAttributes({
    [ATTR_SERVICE_NAME]: process.env.OTEL_SERVICE_NAME ?? 'unknown',
    [ATTR_SERVICE_VERSION]: process.env.SERVICE_VERSION ?? '0.0.0',
  }),
  traceExporter: new OTLPTraceExporter(),
  metricReader,
  // Note: `new View(...)` with aggregation classes is the sdk-metrics 1.x form;
  // 2.x takes plain view option objects instead. Match this to your installed version.
  views: [
    new View({
      instrumentName: 'http.server.request.duration',
      aggregation: new ExplicitBucketHistogramAggregation(
        [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
      ),
    }),
  ],
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-fs': { enabled: false },
    }),
  ],
});

sdk.start();

// Host metrics (CPU, memory, event loop), wired through a separate API
const hostMetrics = new HostMetrics({
  meterProvider: metrics.getMeterProvider(),
  name: 'host-metrics',
});
hostMetrics.start();

process.on('SIGTERM', () => {
  sdk.shutdown().finally(() => process.exit(0));
});
```
Two things specific to Node:
- HostMetrics is a separate package and must be wired explicitly. Unlike Python's SystemMetricsInstrumentor, it doesn't auto-attach when imported.
- Views are passed at SDK construction, not added later. If you need to add or change Views at runtime, you currently have to restart.
T.3Auto-instrumentation for metrics
The auto-instrumentations package emits HTTP server, HTTP client, and a handful of other library metrics by default once a metricReader is configured on the SDK. There is no separate flag — the presence of the reader activates metric emission across all instrumentations.
Per-instrumentation enables and disables for metrics specifically are typically not exposed; the lever is at the instrumentation level (disabling Express disables both its spans and its metrics). For finer control, configure Views with DropAggregation to silence specific metrics without disabling the instrumentation.
T.4Custom instruments
```typescript
import { metrics } from '@opentelemetry/api';

const meter = metrics.getMeter('payments.domain', '2.14.3');

const paymentsSettled = meter.createCounter('payments.settled.count', {
  unit: '{payment}',
  description: 'Successfully settled payments',
});

const paymentAmount = meter.createHistogram('payments.amount', {
  unit: 'EUR',
  description: 'Payment amount distribution in minor units',
  advice: { explicitBucketBoundaries: [100, 500, 1000, 5000, 10000, 50000, 100000] },
});

const paymentsInflight = meter.createUpDownCounter('payments.inflight', {
  unit: '{payment}',
});

// Observable, callback-based
meter.createObservableGauge('payments.queue.depth', {
  unit: '{item}',
}).addCallback((result) => {
  result.observe(getQueueDepth('pending'), { queue: 'pending' });
  result.observe(getQueueDepth('retry'), { queue: 'retry' });
});
```
Note the advice field on the histogram — this is the OTel instrument advisory parameter for bucket boundaries, set at instrument creation time. The View takes precedence if both are set; the advisory acts as a default for libraries that don't know the right buckets a priori.
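The precedence described above can be written as a small resolution function. This is a sketch of the rule rather than SDK code; `resolve_boundaries` is a hypothetical name, and the fallback tuple is the default explicit-bucket boundary list from the OpenTelemetry metrics SDK specification:

```python
from typing import Optional, Sequence

# Default explicit bucket boundaries from the OpenTelemetry metrics SDK spec.
SPEC_DEFAULT_BOUNDARIES = (
    0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000,
)


def resolve_boundaries(
    view_boundaries: Optional[Sequence[float]],
    advisory_boundaries: Optional[Sequence[float]],
) -> Sequence[float]:
    """View wins, then the instrument's advisory value, then the default."""
    if view_boundaries is not None:
        return view_boundaries
    if advisory_boundaries is not None:
        return advisory_boundaries
    return SPEC_DEFAULT_BOUNDARIES


advice = [100, 500, 1000, 5000, 10000, 50000, 100000]
print(resolve_boundaries(None, advice) == advice)  # True: advisory used as the default
print(resolve_boundaries([250, 2500], advice))     # [250, 2500]: the View overrides it
```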
T.4.1Usage
```typescript
async function settlePayment(order: Order): Promise<PaymentResult> {
  paymentsInflight.add(1, { 'payment.gateway': order.gateway });
  try {
    const result = await gateway.charge(order);
    paymentsSettled.add(1, {
      'payment.gateway': order.gateway,
      'payment.currency': order.currency,
      'payment.outcome': result.ok ? 'success' : 'declined',
    });
    paymentAmount.record(order.amountMinor, {
      'payment.gateway': order.gateway,
      'payment.currency': order.currency,
    });
    return result;
  } finally {
    paymentsInflight.add(-1, { 'payment.gateway': order.gateway });
  }
}
```
T.5Express, Fastify, NestJS
HTTP server metrics come from @opentelemetry/instrumentation-http at the Node http module level — Express and Fastify both run on top of it, so the metrics work without per-framework configuration. The framework-specific instrumentations layer span attribution but the metric emission point is the HTTP layer.
T.5.1NestJS
For NestJS, the nestjs-otel community package provides decorators that auto-instrument controllers and services with both counters and histograms — useful when you want per-method metrics without writing them by hand:
```typescript
import { Injectable } from '@nestjs/common';
import { OtelMethodCounter, OtelInstanceCounter } from 'nestjs-otel';

@Injectable()
@OtelInstanceCounter()
export class PaymentsService {
  @OtelMethodCounter()
  async settle(order: Order): Promise<PaymentResult> {
    // ...
  }
}
```
This generates metrics like app_PaymentsService_instances_total and app_PaymentsService_settle_calls_total. It is useful for quick service-method visibility, but watch the metric proliferation if you use it broadly.
T.6Runtime metrics
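The @opentelemetry/host-metrics package is the Node counterpart to Python's runtime instrumentation. Depending on package versions, the event loop and V8 series may be emitted by @opentelemetry/instrumentation-runtime-node (part of the auto-instrumentations pack) rather than by host-metrics itself, so check which one your setup uses. The metrics that matter: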
| Metric | Notes |
|---|---|
| process.cpu.utilization | Per-state CPU utilization. |
| process.memory.usage | RSS, heap used, heap total. |
| nodejs.eventloop.lag | Event loop lag in milliseconds; leading indicator of saturation. |
| nodejs.eventloop.utilization | How busy the event loop is. |
| v8js.heap.space.used | V8 heap usage per space. |
| v8js.gc.duration | GC duration histogram. |
The two metrics specifically worth watching in Node services: event loop lag (anything above 50ms means the event loop is starved, which means tail latencies are spiking) and V8 heap usage (track for slow leaks). Most Node performance problems show up first in one of these two before they're visible at the HTTP latency level.
T.7Views in JS
```typescript
import { View, ExplicitBucketHistogramAggregation } from '@opentelemetry/sdk-metrics';

new View({
  instrumentName: 'http.server.request.duration',
  aggregation: new ExplicitBucketHistogramAggregation(
    [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
  ),
  attributeKeys: ['http.request.method', 'http.route', 'http.response.status_code'],
});
```
For an exponential histogram via View:
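```typescript
// In sdk-metrics 1.x the exported class is ExponentialHistogramAggregation.
import { ExponentialHistogramAggregation, View } from '@opentelemetry/sdk-metrics';

new View({
  instrumentName: 'http.server.request.duration',
  // 160 is the maximum number of buckets; the scale adapts automatically.
  aggregation: new ExponentialHistogramAggregation(160),
});
```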
To rename a metric:
```typescript
new View({
  instrumentName: 'my.counter',
  name: 'my.renamed.counter',
});
```
The JS SDK takes a small object literal rather than keyword arguments — same model as Python, different ergonomics.
T.8Production patterns (TypeScript)
- NodeSDK with explicit metricReader. Don't try to manage MeterProvider lifecycle separately.
- HostMetrics on. Event loop lag is the most valuable single Node metric.
- Views in the SDK constructor for the HTTP histogram buckets and attribute keys.
- Pino with the OTel transport for log correlation (covered in the logs guide). Trace exemplars in dashboards require trace IDs in logs to navigate from a metric spike to a log line.
- SIGTERM shutdown that calls sdk.shutdown(). Same flushing semantics as tracing: the last interval of metrics gets lost otherwise.
- Don't bundle the server. The same instrumentation-loading problem as tracing.
- Cluster-mode applications have the same multi-process concerns as Gunicorn: each cluster worker is a separate Node process and needs a unique service.instance.id.
The Path to Mimir
From your SDK to a query in Grafana, by way of Alloy.
IV.1Wire path
Two transport options into Mimir:
- OTLP/HTTP direct: Mimir exposes /otlp/v1/metrics. Your Collector or Alloy uses the otlphttp exporter pointed at it. This is the recommended path for OTel-native deployments.
- Prometheus Remote Write: the older path. The Collector uses the prometheusremotewrite exporter, and Mimir's /api/v1/push endpoint receives it. Slightly different label translation, especially for resource attributes.
For OTLP/HTTP into Mimir, the resource attributes service.namespace, service.name, and service.instance.id become Prometheus labels job (as {namespace}/{name}) and instance. Other resource attributes go into a synthetic metric target_info which you can join against:
```promql
# Join service.version onto a metric query
http_server_request_duration_seconds_count
  * on (job, instance) group_left (service_version)
  target_info
```
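The label mapping described above can be sketched as two small functions. This is a simplified model for intuition, not Mimir's translator: the function names are hypothetical, and label sanitization is reduced to replacing dots with underscores.

```python
from typing import Dict

PROMOTED = {"service.name", "service.namespace", "service.instance.id"}


def resource_to_labels(resource: Dict[str, str]) -> Dict[str, str]:
    """Derive the job and instance labels from OTel resource attributes."""
    labels: Dict[str, str] = {}
    name = resource.get("service.name")
    namespace = resource.get("service.namespace")
    if name:
        labels["job"] = f"{namespace}/{name}" if namespace else name
    instance = resource.get("service.instance.id")
    if instance:
        labels["instance"] = instance
    return labels


def target_info_labels(resource: Dict[str, str]) -> Dict[str, str]:
    """Remaining resource attributes as they would appear on target_info."""
    return {k.replace(".", "_"): v for k, v in resource.items() if k not in PROMOTED}


resource = {
    "service.namespace": "payments",
    "service.name": "payments-api",
    "service.instance.id": "pod-1-4242",
    "service.version": "2.14.3",
}
print(resource_to_labels(resource))  # {'job': 'payments/payments-api', 'instance': 'pod-1-4242'}
print(target_info_labels(resource))  # {'service_version': '2.14.3'}
```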
IV.1.1Alloy configuration
An Alloy DaemonSet receiving OTLP from local pods and pushing to a central Alloy tier, which then forwards to Mimir:
```alloy
otelcol.receiver.otlp "default" {
  grpc { endpoint = "0.0.0.0:4317" }
  http { endpoint = "0.0.0.0:4318" }

  output {
    metrics = [otelcol.processor.batch.default.input]
    traces  = [otelcol.processor.batch.default.input]
  }
}

otelcol.processor.batch "default" {
  send_batch_size = 8192
  timeout         = "5s"

  output {
    metrics = [otelcol.exporter.otlphttp.mimir.input]
    // Assumes an otelcol.exporter.otlphttp "tempo" component is defined elsewhere.
    traces  = [otelcol.exporter.otlphttp.tempo.input]
  }
}

otelcol.exporter.otlphttp "mimir" {
  client {
    endpoint = "http://mimir.observability.svc.cluster.local:8080/otlp"
  }
}
```
IV.2OTLP-to-Prometheus mapping rules
When OTLP metrics land in Mimir, they undergo conversion to Prometheus' data model. The rules matter for queries:
| OTel concept | Prometheus result |
|---|---|
| Counter (cumulative) | Prometheus counter; _total suffix added. |
| UpDownCounter (cumulative) | Prometheus gauge (despite the SDK calling it a sum). |
| Gauge / ObservableGauge | Prometheus gauge. |
| Histogram (explicit) | Set of _bucket, _count, _sum series. |
| ExponentialHistogram | Prometheus native histogram (one series instead of N buckets). |
| Metric name http.server.request.duration | http_server_request_duration_seconds (dots → underscores, unit appended). |
| Attribute http.route | Label http_route. |
| Resource attribute service.name | Label job (or namespace/name). |
| Resource attribute (other) | Goes into target_info for join. |
_total suffix
This trips up many migration projects. An OTel counter named requests.count becomes Prometheus requests_count_total: the count in the name is kept, any unit suffix is inserted after it, and _total goes last. Best practice: name OTel counters without the count suffix (just requests or requests.served), and the translator will produce the expected Prometheus convention.
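The naming rules in the table can be approximated in a few lines. This is a deliberately partial model (three units, no character sanitization beyond dots), with a hypothetical function name, meant only to make the suffix ordering concrete:

```python
# Partial, illustrative unit map; the real translator knows many more units.
UNIT_SUFFIXES = {"s": "seconds", "ms": "milliseconds", "By": "bytes"}


def prometheus_name(otel_name: str, unit: str = "", is_monotonic_sum: bool = False) -> str:
    """Approximate the OTLP-to-Prometheus metric name translation."""
    name = otel_name.replace(".", "_")
    # Annotation units such as "{request}" contribute no suffix.
    suffix = UNIT_SUFFIXES.get(unit)
    if suffix and not name.endswith("_" + suffix):
        name = f"{name}_{suffix}"
    # Monotonic sums (Counters) get _total last, after any unit suffix.
    if is_monotonic_sum and not name.endswith("_total"):
        name = f"{name}_total"
    return name


print(prometheus_name("http.server.request.duration", "s"))  # http_server_request_duration_seconds
print(prometheus_name("requests.count", "{request}", True))  # requests_count_total
print(prometheus_name("requests", "{request}", True))        # requests_total
```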
IV.3Dashboards: RED and USE
Two complementary methodologies dictate what to put on a service dashboard. They overlap; together they cover almost everything.
IV.3.1RED — request-driven
For a request-handling service, three metrics:
- Rate — requests per second, by route.
- Errors — error rate or percentage, by route and status class.
- Duration — latency percentiles, by route.
```promql
# Rate per route, per second
sum by (http_route) (
  rate(http_server_request_duration_seconds_count{job="payments-api"}[5m])
)

# Error rate
sum by (http_route) (
  rate(http_server_request_duration_seconds_count{
    job="payments-api", http_response_status_code=~"5.."}[5m])
)
/
sum by (http_route) (
  rate(http_server_request_duration_seconds_count{job="payments-api"}[5m])
)

# p95 latency
histogram_quantile(0.95,
  sum by (http_route, le) (
    rate(http_server_request_duration_seconds_bucket{job="payments-api"}[5m])
  )
)
```
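For intuition about what histogram_quantile does with those _bucket series, here is a simplified model of its linear interpolation over cumulative buckets. It skips edge cases the real function handles (NaN inputs, non-monotonic buckets, native histograms), and the bucket counts are invented:

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Quantile falls in the +Inf bucket (or an empty one):
                # report the highest bound seen so far.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Cumulative counts per `le` boundary, in seconds.
buckets = [(0.1, 50), (0.25, 80), (0.5, 95), (1.0, 99), (math.inf, 100)]
print(round(histogram_quantile(0.9, buckets), 3))  # 0.417
```

The estimate can only be as precise as the bucket that contains it, which is why boundaries near your SLO thresholds matter.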
IV.3.2USE — resource-driven
For each resource (CPU, memory, disk, network), three metrics:
- Utilization — how much is being used.
- Saturation — work queued waiting for the resource.
- Errors — error events on the resource.
For a Node service, the most important USE-style metric is nodejs.eventloop.lag — that is the saturation indicator. For Python, process.cpu.utilization plus GC duration captures most of the picture.
IV.3.3Layout
A service dashboard that's useful in an incident:
- Top row, RED: rate, error rate, p50/p95/p99 latency. Exemplars enabled on the latency chart.
- Second row, USE: CPU, memory, event loop lag (Node) or GC pressure (Python).
- Third row, business metrics: the things your service actually does. Payments settled, orders placed, messages processed.
- Fourth row, dependencies: downstream HTTP client latency, DB connection pool utilization, queue depth.
The exemplar dot on the p99 latency chart is the bridge. Click it, jump to a trace, see the slow span, see the offending log line — the whole point of the LGTM stack is that this path is one click long.
V.0Closing notes
Metrics reward discipline in three places. The first is naming and units: get them right via semconv and your dashboards work across services without translation. The second is cardinality: every attribute is a tax, paid in memory at the source and in storage at Mimir, and the tax is multiplicative, not additive, because each new attribute multiplies the series count by its number of distinct values. The third is histogram buckets: defaults are a starting point, never an ending point.
The combination that pays off in practice is: opt into stable HTTP semconv, run cumulative end-to-end into Mimir, use Views aggressively to constrain attributes, set bucket boundaries deliberately for every histogram with knees at your SLO thresholds, and enable exemplars from day one. The result is a metrics surface that is bounded, queryable, and traversable to traces.
What remains is the human discipline: review your top-N highest-cardinality metrics quarterly. The first time you do this you'll find a surprise. The second time you'll find one too. By the fifth time, you'll have a metrics layer that scales with your service rather than against it.