External OTLP traces into Grafana Alloy over gRPC & HTTP
Expose Alloy's otelcol.receiver.otlp through an Istio ingress gateway so that workloads outside the EKS cluster can ship spans on both OTLP/gRPC (:4317) and OTLP/HTTP (:4318) — with the protocol-detection, routing, and TLS subtleties that each transport demands.
The two OTLP transports
OTLP — the OpenTelemetry wire protocol — ships in two flavors, and the entire complexity of this setup comes from the fact that they are different protocols at L7 and therefore need different ingress treatment.
Port 4317
HTTP/2 with protobuf payloads over a long-lived, multiplexed connection. Same beast as any gRPC service — Istio must see it as grpc/http2 or it falls back to opaque TCP. Default endpoint = "0.0.0.0:4317".
Port 4318
HTTP POST requests, typically over HTTP/1.1 (the OTLP specification also allows HTTP/2). Each signal POSTs to a fixed path: traces to /v1/traces, metrics to /v1/metrics, logs to /v1/logs. Payload is protobuf or JSON. Default endpoint = "0.0.0.0:4318".
Per the Alloy reference, a single otelcol.receiver.otlp component runs both servers when you declare both the grpc and http blocks. The gRPC server defaults to 0.0.0.0:4317 and the HTTP server to 0.0.0.0:4318. If a block is omitted, that server isn't started at all.
This guide focuses on traces, but everything here applies identically to metrics and logs — the same receiver, the same two ports. Only the downstream output { traces = [...] } wiring and the HTTP URL paths differ per signal.
End-to-end architecture
An external client (a VM, an app in another cluster, a CI runner) resolves a public DNS name to the ingress gateway's load balancer, sends OTLP over TLS, and the gateway routes to Alloy's receiver Service inside the mesh. From Alloy, spans flow through a batch/tail-sampling pipeline to Tempo.
The key design decision: terminate TLS at the gateway, then re-encrypt to Alloy with mesh mTLS (§07). Alloy itself listens in plaintext inside the pod; Envoy handles the public-facing crypto. This keeps Alloy's config simple and centralizes certificate management at the edge.
Alloy receiver configuration
Declare both servers in one component. With no endpoint override they bind the documented defaults, which already listen on 0.0.0.0; the config below states the endpoints explicitly so it is obvious that the listeners are reachable from outside the pod (not just loopback).
otelcol.receiver.otlp "ingest" {
  // gRPC server: OTLP/gRPC on 4317 (HTTP/2)
  grpc {
    endpoint          = "0.0.0.0:4317"
    max_recv_msg_size = "16MiB" // raise from 4MiB default for big batches
    include_metadata  = true    // keep headers for auth / multitenancy
  }
  // HTTP server: OTLP/HTTP on 4318
  http {
    endpoint              = "0.0.0.0:4318"
    max_request_body_size = "20MiB"
    include_metadata      = true
    // traces_url_path defaults to "/v1/traces"; leave as-is
  }
  output {
    traces = [otelcol.processor.batch.default.input]
  }
}
otelcol.processor.batch "default" {
  output {
    traces = [otelcol.exporter.otlp.tempo.input]
  }
}
otelcol.exporter.otlp "tempo" {
  client {
    endpoint = "tempo-distributor.tempo.svc.cluster.local:4317"
    tls { insecure = true } // in-cluster; or use mesh mTLS
  }
}
Set include_metadata = true on both servers if you plan to authenticate (§08) or do header-based multitenancy downstream — it propagates incoming connection metadata (e.g. Authorization, X-Scope-OrgID) to consumers. Without it, those headers are dropped at the receiver.
If you want Alloy to do TLS itself instead of terminating at the edge, each block takes a tls sub-block (cert_file, key_file, and client_ca_file for mTLS). We'll generally avoid that and let Istio own TLS — see §07.
Deployment & Service — protocol naming
This is where the gRPC/HTTP distinction first bites. The Service must declare both ports with the correct appProtocol, or Istio will mishandle one of them.
apiVersion: v1
kind: Service
metadata:
  name: alloy-otlp
  namespace: observability
spec:
  selector:
    app: alloy
  ports:
    - name: otlp-grpc      # Istio only infers protocol from a name prefix (e.g. grpc-otlp); here appProtocol decides
      port: 4317
      targetPort: 4317
      appProtocol: grpc    # → Istio treats as HTTP/2 gRPC
    - name: otlp-http
      port: 4318
      targetPort: 4318
      appProtocol: http    # → plain HTTP/1.1, routed by path
Common mistake: naming the 4317 port otlp or tcp, or leaving appProtocol off. Istio then treats gRPC as opaque TCP: you lose per-request load balancing across Alloy replicas, lose retries, and the gateway can't route by HTTP/2 authority. Always set appProtocol: grpc on 4317 and appProtocol: http on 4318.
If you deploy Alloy with the official Helm chart, set the equivalent in values.yaml — expose both container ports and ensure the rendered Service carries these port names / appProtocol. Confirm with kubectl get svc alloy-otlp -o yaml after install; charts don't always set appProtocol by default.
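As a rough sketch of what that could look like in values.yaml: the fragment below assumes the grafana/alloy chart's alloy.extraPorts list and a chart version that renders appProtocol on the Service. Both are assumptions to check against the chart version you actually install.

```yaml
# values.yaml fragment for the grafana/alloy Helm chart (sketch, not verified
# against a specific chart version; key names are assumptions)
alloy:
  extraPorts:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317
      protocol: TCP
      appProtocol: grpc   # assumption: only rendered by chart versions that support it
    - name: otlp-http
      port: 4318
      targetPort: 4318
      protocol: TCP
      appProtocol: http
```

The chart names the rendered Service after the Helm release, so it may not be called alloy-otlp; adjust the destination host used later in this guide to whatever name kubectl get svc shows.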
The pod needs sidecar injection so it joins the mesh and can receive mTLS from the gateway:
template:
  metadata:
    labels:
      app: alloy
      sidecar.istio.io/inject: "true"
  spec:
    containers:
      - name: alloy
        image: grafana/alloy:latest
        args: ["run", "/etc/alloy/config.alloy"]
        ports:
          - { name: otlp-grpc, containerPort: 4317 }
          - { name: otlp-http, containerPort: 4318 }
The Gateway — two listeners on the edge
The ingress Gateway needs two server entries: one HTTP/2 listener for gRPC on 4317 and one HTTP listener for OTLP/HTTP on 4318. You can split them by port (cleanest) or, with a shared port, by hostname.
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: otlp-gateway
  namespace: observability
spec:
  selector:
    istio: ingressgateway
  servers:
    # ---- OTLP/gRPC listener ----
    - port:
        number: 4317
        name: otlp-grpc
        protocol: GRPC     # implies HTTP/2; enables gRPC edge handling
      hosts:
        - "otlp-grpc.example.com"
      tls:
        mode: SIMPLE
        credentialName: otlp-grpc-cert
    # ---- OTLP/HTTP listener ----
    - port:
        number: 4318
        name: otlp-http
        protocol: HTTPS    # HTTP/1.1 over TLS, terminated here
      hosts:
        - "otlp-http.example.com"
      tls:
        mode: SIMPLE
        credentialName: otlp-http-cert
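The shared-port alternative (splitting by hostname instead of by port) could look roughly like the sketch below. It assumes both hostnames are served on 443 and reuses the two certificate secrets; the Gateway name and the server port names are hypothetical.

```yaml
apiVersion: networking.istio.io/v1
kind: Gateway
metadata:
  name: otlp-gateway-shared        # hypothetical name for this variant
  namespace: observability
spec:
  selector:
    istio: ingressgateway
  servers:
    # Two HTTPS servers on one port, told apart by SNI hostname.
    - port:
        number: 443
        name: https-otlp-grpc      # port names must be unique per server
        protocol: HTTPS
      hosts:
        - "otlp-grpc.example.com"
      tls:
        mode: SIMPLE
        credentialName: otlp-grpc-cert
    - port:
        number: 443
        name: https-otlp-http
        protocol: HTTPS
      hosts:
        - "otlp-http.example.com"
      tls:
        mode: SIMPLE
        credentialName: otlp-http-cert
```

With this layout clients would target port 443 on each hostname instead of 4317/4318, and no extra ports need to be opened on the gateway Service. The rest of this guide keeps the split-port design.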
Exposing the gateway ports — the four-hop chain
"Open the port on the Service" sounds like one action, but a packet arriving at otlp-grpc.example.com:4317 actually traverses four independent port mappings, and every one of them has to agree or the connection dies silently somewhere in the middle. The Gateway CR you wrote in the previous step only controls the last of those four. Understanding the full chain is what turns "it doesn't connect" from a guessing game into a three-command diagnosis.
| # | Mapping | Owned by | If wrong… |
|---|---|---|---|
| ① | LB listener → target: the cloud LB accepts :4317 and forwards to a target group of pod IPs (or NodePorts). | AWS LB Controller, driven by the Service's type: LoadBalancer + annotations. | Connection refused / timeout from outside; nothing in Envoy logs. |
| ② | Service port: the front-door port the LB listener is generated from. | Service spec.ports[].port. | No LB listener created for 4317; LB silently drops it. |
| ③ | Service targetPort: the pod port the Service sends to. | Service spec.ports[].targetPort. | Traffic reaches the pod on the wrong port → connection reset. |
| ④ | Envoy listener bind: the gateway Envoy actually listen()s on this port. | The Gateway CR's server.port.number, pushed to Envoy over LDS. | NR / no listener; istioctl pc listeners shows nothing on 4317. |
Gateway.server.port.number (hop ④) must equal the Service targetPort (hop ③). That's the join nobody sees in a single YAML file because they live in different resources. The Service port (hop ②) and the LB listener (hop ①) can technically differ from the targetPort, but keep all four numerically identical (4317/4318) unless you have a deliberate reason — every mismatch is a future incident.
Why the Gateway CR alone isn't enough
This trips people up: you wrote a Gateway with port.number: 4317, so why isn't it reachable? Because the Gateway CR only programs hop ④ — it tells the gateway pod's Envoy to bind a listener via the Listener Discovery Service. It does nothing to the Kubernetes Service in front of that pod. The Service is a separate object (often created once at install time by Helm or the operator) and it has no idea you added a new Gateway resource. So Envoy is dutifully listening on 4317 inside the pod, but the Service never forwards anything there, and the cloud LB never even creates a listener for it. Envoy logs stay empty, which is exactly why this feels like a phantom failure.
On EKS: how the Service becomes an NLB
With the AWS Load Balancer Controller, a type: LoadBalancer Service annotated for NLB provisions a Network Load Balancer where each Service port becomes one NLB listener + one target group. Two properties matter enormously for OTLP:
- NLB is pure L4. Its listeners are TCP and pass bytes through untouched, which is precisely what you want, because TLS is terminated at Envoy (Gateway mode: SIMPLE). Do not configure TLS termination on the NLB as well, or you'll either double-encrypt or strip the TLS that Envoy expects to terminate. Leave the listeners as TCP.
- target-type: ip (the right choice on EKS with the VPC CNI) points the target group directly at pod IPs, skipping the NodePort hop and a layer of kube-proxy SNAT. The client source IP then survives end-to-end, provided client IP preservation is enabled on the target group (the preserve_client_ip.enabled attribute, which is off by default for TCP listeners with IP targets). The alternative, instance mode, targets a NodePort (30000–32767) on every node and then relies on kube-proxy to reach the pod: an extra hop and a lost source IP unless you also set externalTrafficPolicy: Local.
apiVersion: v1
kind: Service
metadata:
  name: istio-ingressgateway
  namespace: istio-system
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    # health check stays on Istio's status port (see warning below)
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "15021"
    # the path is only honored when the health-check protocol is HTTP/HTTPS
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: http
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: /healthz/ready
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # preserve client source IP (IP allowlisting)
  selector:
    istio: ingressgateway
  ports:
    - { name: status-port, port: 15021, targetPort: 15021, protocol: TCP }
    - { name: https,       port: 443,   targetPort: 8443,  protocol: TCP }
    - { name: otlp-grpc,   port: 4317,  targetPort: 4317,  protocol: TCP }
    - { name: otlp-http,   port: 4318,  targetPort: 4318,  protocol: TCP }
The default gateway Service carries status-port (15021), http2 (80), and https (443). If you redefine spec.ports and forget 15021, the NLB health check has nothing to probe, marks every target unhealthy, and pulls the entire gateway out of rotation — taking down 443 traffic too. Always add 4317/4318 alongside the existing ports, and keep status-port as the health-check target.
The reconcile trap — edit the source of truth, not the live Service
The most common way this goes wrong operationally: you kubectl patch svc istio-ingressgateway, it works, and then it silently reverts an hour later. That's because the Service is a managed object. If Istio was installed via the operator (IstioOperator CR) or a Helm release, the controller continuously reconciles the live Service back to its declared spec — wiping your manual ports. You must add the ports to whatever owns the Service:
The modern standalone gateway chart exposes a service.ports list:
service:
  type: LoadBalancer
  ports:
    - { name: status-port, port: 15021, targetPort: 15021 }
    - { name: https,       port: 443,   targetPort: 8443 }
    - { name: otlp-grpc,   port: 4317,  targetPort: 4317 }
    - { name: otlp-http,   port: 4318,  targetPort: 4318 }
Classic operator install nests ports under the gateway component:
spec:
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
        k8s:
          service:
            ports:
              - { name: status-port, port: 15021, targetPort: 15021 }
              - { name: https,       port: 443,   targetPort: 8443 }
              - { name: otlp-grpc,   port: 4317,  targetPort: 4317 }
              - { name: otlp-http,   port: 4318,  targetPort: 4318 }
If you deploy the gateway via the istio/gateway chart through pulumi_kubernetes.helm.v3.Release, put the full ports list in the chart values so it's declarative and survives reconcile. Drift-detect on the rendered Service (kubectl get svc -o yaml) in CI — a chart upgrade that resets service.ports to the default is a classic silent regression that drops your OTLP listeners without touching 443.
externalTrafficPolicy & source IP
If you intend to IP-allowlist external senders (a sensible control for an internet-facing trace endpoint), the client source IP must survive the trip. With NLB target-type: ip, it survives only when client IP preservation is enabled on the target group; for TCP listeners with IP targets that setting is off by default, so enable the preserve_client_ip.enabled target group attribute. With instance mode you additionally need externalTrafficPolicy: Local, which stops kube-proxy from SNAT-ing and load-balancing across nodes. The NLB health check must then succeed only on nodes actually running a gateway pod, which the LB Controller handles via the health-check port. The trade-off: Local can create imbalance if gateway pods aren't spread evenly, so pair it with a topology spread or a DaemonSet-style gateway in large clusters.
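A topology spread for the gateway pods might be sketched as follows. The fragment assumes the gateway pods carry the istio: ingressgateway label used elsewhere in this guide; the maxSkew and whenUnsatisfiable values are illustrative choices, not recommendations from the source.

```yaml
# Pod template fragment for the ingress gateway Deployment (sketch)
spec:
  template:
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                          # illustrative: at most one pod of difference per node
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway   # prefer spreading, but never block scheduling
          labelSelector:
            matchLabels:
              istio: ingressgateway
```

As with the Service ports, this belongs in whatever owns the Deployment (chart values or the operator spec), not in a live patch.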
One cost footnote for EKS: with an NLB and target-type: ip, enabling cross-zone load balancing lets the LB send traffic to gateway pods in any AZ — convenient, but inter-AZ bytes are billed. For a high-volume trace firehose this is non-trivial. If your senders and gateway pods can be AZ-aligned, leaving cross-zone off (or using zone-aware routing) keeps the span traffic intra-AZ. Weigh availability against the transfer bill.
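If you decide to keep cross-zone load balancing off, one way to make that explicit is the load balancer attributes annotation of the AWS Load Balancer Controller, sketched here as a fragment of the gateway Service metadata. Whether you want it off at all depends on the availability trade-off described above.

```yaml
# Fragment of the istio-ingressgateway Service metadata (sketch)
metadata:
  annotations:
    # Keep each NLB node sending only to gateway pods in its own AZ.
    service.beta.kubernetes.io/aws-load-balancer-attributes: load_balancing.cross_zone.enabled=false
```

With cross-zone off, every AZ that has an NLB node also needs at least one healthy gateway pod, otherwise clients resolving to that AZ see failures.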
protocol: GRPC is the strict, gRPC-aware alias and is correct for 4317. If you ever multiplex other HTTP/2 traffic on the same listener, use HTTP2 instead. For 4318, HTTPS terminates TLS for an HTTP/1.1 backend; use plain HTTP only if you terminate TLS upstream of Istio (e.g. at an AWS NLB/ALB — which, per the L4 note above, you generally should not do here).
Confirm the whole chain in three commands
# hop ②③ — Service actually carries 4317/4318 with right targetPort
kubectl get svc istio-ingressgateway -n istio-system \
-o jsonpath='{range .spec.ports[*]}{.name}={.port}->{.targetPort}{"\n"}{end}'
# hop ④ — Envoy bound a listener on each port
istioctl proxy-config listeners deploy/istio-ingressgateway -n istio-system \
| grep -E '4317|4318'
# hop ① — the NLB provisioned listeners (AWS side)
aws elbv2 describe-listeners --load-balancer-arn "$NLB_ARN" \
--query 'Listeners[].Port'
If hop ④ shows the listener but hop ②③ is missing the port, your Gateway is fine and the Service is the gap — the single most common version of this failure.
VirtualService routing for each transport
One VirtualService can bind both listeners. The gRPC side routes by host into the :4317 service port; the HTTP side matches the /v1/traces (and optionally /v1/metrics, /v1/logs) URI prefixes into :4318.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: alloy-otlp
  namespace: observability
spec:
  hosts:
    - "otlp-grpc.example.com"
    - "otlp-http.example.com"
  gateways:
    - otlp-gateway
  http:
    # ---- gRPC: match by authority, route to 4317 ----
    - match:
        - authority:
            exact: "otlp-grpc.example.com"
      route:
        - destination:
            host: alloy-otlp.observability.svc.cluster.local
            port:
              number: 4317
      timeout: 30s
    # ---- HTTP: match the OTLP signal paths, route to 4318 ----
    - match:
        - uri: { prefix: "/v1/traces" }
          authority: { exact: "otlp-http.example.com" }
        - uri: { prefix: "/v1/metrics" }
          authority: { exact: "otlp-http.example.com" }
        - uri: { prefix: "/v1/logs" }
          authority: { exact: "otlp-http.example.com" }
      route:
        - destination:
            host: alloy-otlp.observability.svc.cluster.local
            port:
              number: 4318
      timeout: 30s
Why both routes sit in the http: block: it looks odd, but gRPC is HTTP/2, so Istio routes it through the http: stanza, same as the gRPC guide. The tcp: block is only for opaque L4. Using http: is what unlocks per-request balancing across Alloy replicas and retry/timeout policy for the trace stream.
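To make the retry/timeout point concrete, a retry policy on the gRPC route could be sketched like this. The attempt count, per-try timeout, and retryOn conditions are illustrative values chosen for this example, not settings taken from the source.

```yaml
# Fragment of the VirtualService http: list, gRPC route only (sketch)
http:
  - match:
      - authority:
          exact: "otlp-grpc.example.com"
    route:
      - destination:
          host: alloy-otlp.observability.svc.cluster.local
          port:
            number: 4317
    timeout: 30s
    retries:
      attempts: 3                                  # illustrative
      perTryTimeout: 10s                           # illustrative; keep attempts x perTryTimeout within timeout
      retryOn: unavailable,reset,connect-failure   # gRPC UNAVAILABLE plus connection-level failures
```

None of this is available under tcp:, which is the practical reason to make sure the gRPC port is recognized as HTTP/2.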
Add a DestinationRule so traffic to Alloy uses a sane LB policy and mesh mTLS — identical pattern to the gRPC guide:
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: alloy-otlp
  namespace: observability
spec:
  host: alloy-otlp.observability.svc.cluster.local
  trafficPolicy:
    loadBalancer:
      simple: LEAST_REQUEST   # spread gRPC streams across Alloy replicas
    tls:
      mode: ISTIO_MUTUAL      # mTLS edge → Alloy
TLS strategy
Three layers of TLS could exist here. Decide which you actually want:
| Segment | Recommended |
|---|---|
| Client → Gateway | TLS terminated at gateway (mode: SIMPLE) with a public cert in credentialName. The simplest, most observable choice. |
| Gateway → Alloy | mTLS via mesh (DestinationRule tls.mode: ISTIO_MUTUAL + PeerAuthentication STRICT). Automatic certs from istiod. |
| Alloy receiver TLS | Off — let the sidecar handle it. Only enable the receiver's own tls{} block if you bypass the mesh. |
Create the edge cert secrets in the gateway's namespace (typically istio-system for the shared ingressgateway, or use a namespace-scoped gateway):
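# cert/key from your CA or cert-manager; one secret per credentialName in the Gateway
kubectl create secret tls otlp-grpc-cert \
  --cert=otlp-grpc.crt --key=otlp-grpc.key \
  -n istio-system
kubectl create secret tls otlp-http-cert \
  --cert=otlp-http.crt --key=otlp-http.key \
  -n istio-system
# with cert-manager, prefer a Certificate resource that writes each secret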
When using credentialName, the TLS secret must live in the same namespace as the ingress gateway deployment (usually istio-system), not the namespace of your Gateway resource, unless you've enabled credential discovery across namespaces. A mismatched namespace is one of the most common causes of TLS handshake failures at the edge.
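With cert-manager, a Certificate that writes the secret into the gateway's namespace might look like the sketch below. The issuer name and kind are hypothetical; substitute whatever issuer exists in your cluster. A second Certificate of the same shape would cover otlp-http-cert.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: otlp-grpc-cert
  namespace: istio-system          # same namespace as the ingress gateway deployment
spec:
  secretName: otlp-grpc-cert       # must match credentialName in the Gateway
  dnsNames:
    - otlp-grpc.example.com
  issuerRef:
    name: letsencrypt-prod         # hypothetical issuer name
    kind: ClusterIssuer            # assumption: a cluster-wide issuer is available
```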
Lock down inbound mTLS to Alloy:
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: alloy-mtls
  namespace: observability
spec:
  selector:
    matchLabels: { app: alloy }
  mtls:
    mode: STRICT
Authentication for external senders
Anything internet-reachable that accepts spans needs auth, or you'll ingest junk (and pay for it). Two places to enforce it:
Option A — at Alloy (application layer)
Alloy's receiver supports an auth handler from an otelcol.auth.* component on each server block. Basic auth example, straight from the reference:
otelcol.receiver.otlp "ingest" {
  grpc { auth = otelcol.auth.basic.creds.handler }
  http { auth = otelcol.auth.basic.creds.handler }
  output { traces = [otelcol.processor.batch.default.input] }
}
otelcol.auth.basic "creds" {
  username = sys.env("OTLP_USERNAME")
  password = sys.env("OTLP_PASSWORD")
}
Other handlers exist: otelcol.auth.bearer (static token), otelcol.auth.oauth2, otelcol.auth.headers. Note the receiver requires include_metadata = true for header-based auth to see the credentials.
Option B — at the gateway (recommended for zero-trust)
Push auth to the edge so bad traffic never reaches Alloy. Two common approaches:
- Mutual TLS at the edge: set the Gateway tls.mode: MUTUAL and require client certs. Strong, but you must distribute client certs to senders.
- JWT via RequestAuthentication + AuthorizationPolicy: validate a bearer token issued by your IdP (Keycloak / Entra ID) before forwarding. Works cleanly for OTLP/HTTP; for OTLP/gRPC the token rides in metadata.
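The JWT approach could be sketched as the pair of resources below. The issuer and JWKS URLs are hypothetical placeholders for your IdP, and the policy assumes the gateway pods carry the istio: ingressgateway label and listen on 4317/4318 as configured in this guide.

```yaml
apiVersion: security.istio.io/v1
kind: RequestAuthentication
metadata:
  name: otlp-jwt
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  jwtRules:
    - issuer: "https://idp.example.com/realms/telemetry"    # hypothetical issuer
      jwksUri: "https://idp.example.com/realms/telemetry/protocol/openid-connect/certs"  # hypothetical
---
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: otlp-require-jwt
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  action: DENY
  rules:
    # Reject any request on the OTLP ports that carries no valid JWT principal.
    - from:
        - source:
            notRequestPrincipals: ["*"]
      to:
        - operation:
            ports: ["4317", "4318"]
```

RequestAuthentication on its own only rejects invalid tokens; requests with no token at all still pass, which is why the DENY policy is paired with it. Scoping the rule to the OTLP ports leaves other traffic through the same gateway (such as 443) unaffected.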
Edge mTLS (Gateway MUTUAL) is usually the least-friction strong option for machine-to-machine OTLP, since collectors and SDKs already speak client-cert TLS. Layer Alloy bearer-token auth on top if you need per-tenant identity in the pipeline. Avoid relying on Alloy basic auth alone over the public internet.
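Switching a listener to edge mTLS is a change to its tls block only, roughly as sketched here. The secret name is hypothetical, and the fragment assumes the secret also carries the CA certificate used to verify client certs.

```yaml
# tls block of one Gateway server entry (sketch)
tls:
  mode: MUTUAL
  # hypothetical secret; alongside tls.crt and tls.key it needs a ca.crt
  # entry containing the CA that signed the senders' client certificates
  credentialName: otlp-grpc-mtls
```

Senders then present a client certificate signed by that CA; connections without one fail during the TLS handshake, before any request reaches Alloy.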
Client-side configuration
How an external sender targets each endpoint. The endpoint shape differs by transport — a frequent source of confusion.
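OTLP/gRPC: no URL path. The endpoint is just host:port; the environment variables below add an https:// scheme only to signal TLS. SDKs default the gRPC port to 4317.
OTLP/HTTP: full URL including scheme. The SDK appends /v1/traces to the generic OTEL_EXPORTER_OTLP_ENDPOINT, while the signal-specific OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is used as given, so it must carry the full path.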
# --- OTLP/gRPC ---
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://otlp-grpc.example.com:4317
# (no path; TLS implied by https scheme)
# --- OTLP/HTTP ---
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://otlp-http.example.com:4318/v1/traces
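When the external sender is itself an OpenTelemetry Collector rather than an SDK, the same two endpoint shapes appear in its exporter config. The following is a minimal sketch under that assumption; the local otlp receiver and the choice of exporter in the pipeline are illustrative.

```yaml
# OpenTelemetry Collector config on the sending side (sketch)
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlp:                                   # OTLP/gRPC: host:port, no path; TLS is on by default
    endpoint: otlp-grpc.example.com:4317
  otlphttp:                               # OTLP/HTTP: base URL; the exporter appends /v1/traces
    endpoint: https://otlp-http.example.com:4318

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]                   # pick one exporter; listing both would send every span twice
```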
Quick smoke tests with CLI tooling before wiring real apps:
# OTLP/HTTP — POST a protobuf/JSON span payload
curl -v -X POST https://otlp-http.example.com:4318/v1/traces \
-H "Content-Type: application/json" \
-d @trace-payload.json
# OTLP/gRPC — use the otelcol 'telemetrygen' utility
telemetrygen traces --otlp-endpoint otlp-grpc.example.com:4317 \
--otlp-insecure=false --traces 5
Verification
Walk the path from the inside out so you isolate where it breaks.
# 1. Alloy is actually listening on both ports inside the pod
kubectl exec deploy/alloy -n observability -c alloy -- \
sh -c "ss -tlnp | grep -E '4317|4318'"
# 2. Service has correct appProtocol on both ports
kubectl get svc alloy-otlp -n observability -o yaml | grep -A2 appProtocol
# 3. Gateway listeners exist on the edge Envoy
istioctl proxy-config listeners deploy/istio-ingressgateway -n istio-system \
| grep -E '4317|4318'
# 4. Routes resolve to the right cluster
istioctl proxy-config routes deploy/istio-ingressgateway -n istio-system -o json \
| grep -E 'alloy-otlp|v1/traces'
# 5. Endpoints are healthy
istioctl proxy-config endpoints deploy/istio-ingressgateway -n istio-system \
| grep alloy
# 6. Watch Alloy receive spans (built-in debug metrics)
kubectl exec deploy/alloy -n observability -c alloy -- \
wget -qO- localhost:12345/metrics | grep otelcol_receiver_accepted_spans
The decisive signal is the receiver metric otelcol_receiver_accepted_spans_total climbing — that confirms spans made it all the way into the pipeline. If it stays flat while the client thinks it succeeded, the break is between the gateway and Alloy (often mTLS or appProtocol).
Troubleshooting
| Symptom | Likely cause & fix |
|---|---|
| gRPC client: UNAVAILABLE / connection reset | 4317 port not named/typed as gRPC. Set appProtocol: grpc on the Service and protocol: GRPC on the Gateway. |
| HTTP: 404 on /v1/traces | VirtualService URI prefix doesn't match, or client posting to wrong path. Confirm prefix /v1/traces and that the SDK isn't double-appending the path. |
| 503 UF / no healthy upstream | mTLS mismatch (PeerAuthentication STRICT but DR not ISTIO_MUTUAL), or Alloy not listening on 0.0.0.0. Check both. |
| TLS handshake fails at edge | credentialName secret missing or in wrong namespace (must be with the ingressgateway, usually istio-system). |
| Spans accepted then dropped | Downstream exporter to Tempo failing — check otelcol_exporter_send_failed_spans and the Tempo distributor endpoint. |
| Large batches rejected | Raise max_recv_msg_size (gRPC) / max_request_body_size (HTTP) on the receiver; defaults are 4MiB / 20MiB. |
| gRPC works, one Alloy replica hammered | Missing/forgotten DestinationRule LB policy. Set LEAST_REQUEST — round-robin counts connections, useless for one gRPC connection. |
Enable ingress access logging and watch the %RESPONSE_FLAGS% field: NR = no route (VS binding / host mismatch), UF = upstream failure (mTLS / port), UH = no healthy upstream (Alloy readiness / endpoint). These pinpoint the failing hop faster than client-side errors.
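One way to turn on access logs for just the ingress gateway is Istio's Telemetry API, sketched below. It assumes the built-in envoy access log provider is available in your mesh config and that the gateway pods carry the istio: ingressgateway label; the resource name is hypothetical.

```yaml
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: ingress-access-logs        # hypothetical name
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  accessLogging:
    - providers:
        - name: envoy              # built-in provider; logs go to the gateway container's stdout
```

The logs then appear in kubectl logs for the gateway pods, with the response flags in each entry.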
One-page cheat sheet
| Layer | OTLP/gRPC (4317) | OTLP/HTTP (4318) |
|---|---|---|
| Alloy block | grpc { endpoint = "0.0.0.0:4317" } | http { endpoint = "0.0.0.0:4318" } |
| Service port | appProtocol: grpc | appProtocol: http |
| Gateway protocol | GRPC (HTTP/2) | HTTPS (HTTP/1.1) |
| VS match | by authority host | by uri prefix /v1/traces |
| VS block | http: (gRPC is HTTP/2) | http: |
| Client endpoint | host:4317 (no path) | https://host:4318/v1/traces |
| Default size cap | 4MiB (max_recv_msg_size) | 20MiB (max_request_body_size) |
1. Declare appProtocol on both Service ports: gRPC as grpc, HTTP as http.
2. The default ingressgateway doesn't expose 4317/4318; add them to its Service or nothing reaches Envoy.
3. Terminate TLS at the edge, use mTLS to Alloy, and leave the receiver's own tls{} off: one place to manage certs, full mesh observability.