Envoy Proxy, from listeners to TLS 1.3
A structured walk through Hussein Nasser's Envoy crash course: the architecture and mental model first (downstream/upstream, clusters, listeners, filters, connection pools, threading), then a hands-on build of Envoy as an L7 and L4 proxy with HTTPS, HTTP/2, and a hardened TLS configuration.
Source video: Envoy Proxy Crash Course by Hussein Nasser · youtu.be/40gKzHQWgP0
Demo configs referenced in the video live in his javascript_playground repo.
Envoy is the data plane under Istio — every sidecar in your mesh is an Envoy process, and Pilot/istiod simply pushes it configuration over the xDS APIs. Listeners, clusters, routes, and filter chains are exactly the objects Istio generates from your VirtualService, DestinationRule, and AuthorizationPolicy resources. Knowing raw Envoy makes the output of istioctl proxy-config legible.
What is Envoy?
▸ 0:00
Envoy is a high-performance, open-source L7 proxy and communication bus for service-oriented architectures. It was originally written in C++ at Lyft to help move the company off a monolith, and is now a CNCF-graduated project that underpins most modern service meshes.
The defining idea is that the network should be transparent to applications. Rather than every microservice re-implementing retries, timeouts, load balancing, TLS, circuit breaking, and observability — once per language, with subtle differences — Envoy runs out of process next to the app and absorbs all of that responsibility.
- Out-of-process / sidecar: Envoy is a separate process. Your app talks to localhost and Envoy handles the network. This means it works with any language and can be upgraded independently.
- L7-aware: it understands HTTP/1.1, HTTP/2, gRPC, and more, so it can route, retry, and observe based on the actual content of requests, not just IP/port.
- Dynamically configurable: the xDS family of APIs (LDS, CDS, EDS, RDS, SDS) lets a control plane push config at runtime with no restarts.
- Observable by design: rich stats, access logs, and distributed tracing are first-class, not bolted on.
Current vs. desired architecture
▸ 0:48
The motivating story is the monolith-to-microservices migration. In a monolith, "calling another component" is a function call: there is no network, no partial failure, no per-service retry logic. As you split that monolith into services, every one of those in-process calls becomes a network call, and suddenly each service needs to handle service discovery, load balancing, timeouts, retries, TLS, and metrics.
Doing that inside each application is painful: the logic is duplicated across languages, evolves inconsistently, and ties networking concerns to your business code. The desired architecture pushes all of it down into a uniform proxy layer. Each service gets an Envoy alongside it; the services speak plain local HTTP, and the mesh of Envoys forms a consistent, observable, controllable communication fabric.
Envoy's value proposition is consistency and separation of concerns: networking behavior is defined once, in one place, in one language-agnostic process — instead of N times across your services.
Envoy architecture
▸ 3:00
At a high level, traffic flows through Envoy along a fixed pipeline. A client connection lands on a listener, passes through a filter chain that can inspect and act on it, and is finally routed to an endpoint inside a cluster via a connection pool. Holding this shape in your head makes every later config block obvious.
Each of these objects can be supplied statically in a YAML file or dynamically from a management server through the matching xDS API: listeners via LDS, routes via RDS, clusters via CDS, endpoints via EDS, and TLS secrets via SDS.
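To make the static/dynamic distinction concrete, here is a minimal bootstrap sketch in which listeners and clusters arrive over an aggregated xDS (ADS) gRPC stream, while the one cluster Envoy needs in order to reach the management server is declared statically. The node identity, the `xds_cluster` name, and the `xds-server:18000` address are assumptions for illustration; none of them come from the video.

```yaml
node:
  id: envoy-demo-1          # hypothetical identity reported to the control plane
  cluster: demo             # hypothetical service-cluster label

dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster   # must match the static cluster below
  lds_config:                         # listeners come from the control plane
    resource_api_version: V3
    ads: {}
  cds_config:                         # clusters come from the control plane
    resource_api_version: V3
    ads: {}

static_resources:
  clusters:
    - name: xds_cluster
      connect_timeout: 1s
      type: STRICT_DNS
      # xDS over gRPC needs HTTP/2 towards the management server.
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}
      load_assignment:
        cluster_name: xds_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: xds-server   # hypothetical hostname
                      port_value: 18000     # hypothetical port
```

The rest of these notes stay with `static_resources` only, which is what the hands-on part of the course uses.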
Downstream & upstream
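▸ 7:30
This terminology trips up newcomers, so anchor it firmly; it appears everywhere in the docs and config. Both terms are relative to Envoy itself:
- Downstream: the client side. A downstream host connects to Envoy, sends requests, and receives responses.
- Upstream: the backend side. An upstream host receives connections and requests from Envoy and returns responses.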
"Upstream" feels backwards because requests flow toward the upstream while responses flow down from it. Don't tie it to request direction — tie it to who initiates the connection. Envoy accepts downstream connections and originates upstream connections.
Clusters
▸ 9:19
A cluster is a named group of logically equivalent upstream hosts that Envoy load-balances across, e.g. "all replicas of the checkout service." A cluster owns several responsibilities:
- Service discovery: how the member endpoints are found. Options are STATIC (hard-coded IPs), STRICT_DNS (resolve a name, every A/AAAA record is an endpoint), LOGICAL_DNS (resolve lazily and use only the first returned address when a new connection is needed), or EDS (pushed by a control plane).
- Load balancing: round robin, least request, ring hash, Maglev, random.
- Active health checking: probe endpoints and eject unhealthy ones.
- Circuit breaking: cap concurrent connections/requests/retries to protect the upstream.
clusters:
  - name: app1_cluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: app1_cluster
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: app1   # resolvable hostname
                    port_value: 8080
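The example above covers discovery and load balancing only. As a hedged sketch of the other two responsibilities, the same cluster could carry an active health check and circuit-breaker thresholds roughly like this. The `/healthz` path and every numeric value are illustrative assumptions, not settings from the video.

```yaml
clusters:
  - name: app1_cluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: ROUND_ROBIN
    # Active health checking: probe each endpoint over HTTP.
    health_checks:
      - timeout: 1s
        interval: 5s
        unhealthy_threshold: 3    # consecutive failures before ejection
        healthy_threshold: 2      # consecutive successes before re-admission
        http_health_check:
          path: /healthz          # hypothetical path; the backend must serve it
    # Circuit breaking: cap the load Envoy places on this cluster.
    circuit_breakers:
      thresholds:
        - priority: DEFAULT
          max_connections: 100
          max_pending_requests: 100
          max_requests: 100
          max_retries: 3
    load_assignment:
      cluster_name: app1_cluster
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: app1
                    port_value: 8080
```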
In a mesh, Istio generates one Envoy cluster per Service + subset. istioctl proxy-config clusters <pod> dumps exactly these objects, and a DestinationRule is what sets the lb_policy, circuit breakers, and TLS mode on them.
Listeners
▸ 10:50
A listener is a named network location, typically an IP and port (or a Unix domain socket), where Envoy accepts downstream connections. An Envoy can expose many listeners at once (e.g. :80 for plaintext, :443 for TLS, :9901 for the admin interface).
Each listener carries one or more filter chains. Incoming connections are matched to a chain (by SNI, source IP, ALPN, etc.) and then handed through that chain's filters. The listener is the entry point; the filter chain decides what actually happens to the bytes.
listeners:
  - name: http_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 80
    filter_chains:
      - filters: []   # network filters go here
Network filters
▸ 11:50
Filters are where Envoy's behavior actually lives. They are arranged in chains and come in three kinds, applied at different depths:
- Listener filters: run on the raw socket before a filter chain is chosen.
- Network (L4) filters: operate on the connection's byte stream, e.g. tcp_proxy, and crucially the http_connection_manager.
- HTTP (L7) filters: run inside the HCM on each request, e.g. router, CORS, JWT auth, rate limiting, ext_authz.
The pivotal one is the HTTP Connection Manager (HCM). It's technically a network (L4) filter, but its job is to translate the raw TCP stream into HTTP messages and then run an L7 HTTP filter chain on them. The L7 chain always terminates in the router filter, which performs the actual route match and forwards to a cluster.
The filter programming model
Network (L4) filters see a connection as a bidirectional stream of bytes, and they come in three flavours depending on which direction they touch:
- Read filters: handle data arriving from downstream through onNewConnection() and onData(buffer, end_stream).
- Write filters: handle data going back to downstream through onWrite(buffer, end_stream).
- Read/write filters: implement both; filters such as tcp_proxy are duplex filters that own the whole connection.
Every filter callback returns a filter status that controls the chain. This is the mechanism behind async authorization, buffering, and rate limiting:
- Continue: pass control to the next filter in the chain.
- StopIteration: pause the chain; the filter later calls continueReading() / continueDecoding() to resume.
Every network filter chain must end in a terminal filter, one that actually forwards the connection somewhere (http_connection_manager or tcp_proxy). Filters before it (RBAC, rate limit, a protocol sniffer) inspect/act and then Continue; the terminal filter consumes the stream and never continues past itself.
Filter-chain matching & listener filters
A single listener can hold many filter chains, and FilterChainMatch selects exactly one per incoming connection. This is how one :443 listener serves multiple certificates and routes TLS by hostname at L4, without decrypting. Matching can key on:
- Destination: port and IP/CIDR the connection arrived on.
- SNI (server_names) and ALPN (application_protocols), but only if a listener filter extracted them first.
- Transport protocol: tls vs raw_buffer.
- Source: type (local/external), IP/CIDR, and port range.
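As an illustration of SNI-based chain selection, the sketch below routes TLS connections to different clusters by hostname without decrypting them. It assumes the `tls_inspector` listener filter is enabled so that `server_names` has something to match on; the hostnames, listener name, and stat prefixes are hypothetical.

```yaml
listeners:
  - name: tls_passthrough_listener
    address:
      socket_address: { address: 0.0.0.0, port_value: 443 }
    listener_filters:
      # Reads the ClientHello so SNI/ALPN are available for matching.
      - name: envoy.filters.listener.tls_inspector
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.listener.tls_inspector.v3.TlsInspector
    filter_chains:
      - filter_chain_match:
          server_names: ["app1.example.com"]   # hypothetical hostname
        filters:
          - name: envoy.filters.network.tcp_proxy
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
              stat_prefix: app1_passthrough
              cluster: app1_cluster
      - filter_chain_match:
          server_names: ["app2.example.com"]   # hypothetical hostname
        filters:
          - name: envoy.filters.network.tcp_proxy
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
              stat_prefix: app2_passthrough
              cluster: app2_cluster
```

Because neither chain has a `transport_socket`, Envoy never terminates TLS here; the encrypted bytes are forwarded as-is and the backend must hold the certificate.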
The criteria are evaluated by a fixed specificity order (destination port → destination IP → server name → transport → application protocol → source), and the most specific match wins. The SNI/ALPN data only exists because listener filters ran first on the raw socket:
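- tls_inspector: peeks at the TLS ClientHello to extract SNI and ALPN before any chain is chosen.
- original_dst: recovers the address the client originally dialed before the connection was redirected to Envoy (read from the SO_ORIGINAL_DST socket option), the backbone of transparent/sidecar interception.
Catalog of network (L4) filters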
Beyond the two stars (HCM and tcp_proxy), Envoy ships protocol-aware L4 filters that give you stats, routing, and fault injection for non-HTTP wire protocols:
HTTP Connection Manager, in depth
The HCM is the most important filter in Envoy — it is what turns a byte-pushing L4 proxy into a request-aware L7 proxy. When a connection matches a chain whose terminal filter is the HCM, the HCM takes ownership and performs a long list of jobs:
- Instantiates a codec. Based on codec_type (AUTO, HTTP1, HTTP2, HTTP3) it picks how to frame bytes. AUTO uses ALPN (or a magic-byte sniff) to detect the version. Whatever the wire version, the codec normalizes everything into a uniform internal model of headers → data → trailers, which is why an HTTP/1.1 request and an HTTP/2 stream flow through identical filter code.
- Demultiplexes streams. An HTTP/2 or HTTP/3 connection carries many concurrent streams; the HCM creates an independent per-stream HTTP filter chain instance for each one.
- Resolves routes. It picks a route configuration (inline, via RDS, or via scoped-RDS), matches a virtual host by the :authority/Host header, then matches a route within it.
- Manages request lifecycle headers. Generates x-request-id, makes the sampling/tracing decision, manipulates x-forwarded-for / x-forwarded-proto, adds via, and enforces header sanitization.
- Emits observability. Access logs at stream completion, tracing spans, and the rich http.<stat_prefix>.* stats family.
- Enforces connection & stream timeouts and handles graceful draining (HTTP/2 GOAWAY).
Decoder vs. encoder filters
Inside the HCM, HTTP (L7) filters are split by direction, mirroring the L4 read/write split:
- Decoder filters (request path): decodeHeaders / decodeData / decodeTrailers.
- Encoder filters (response path): encodeHeaders / encodeData / encodeTrailers.
Crucially, decode runs in declared order; encode runs in reverse order. The request descends through filters to the terminal router, which dispatches it upstream; the response then ascends back through the same filters bottom-to-top. Header callbacks return a header status (FilterHeadersStatus): Continue, StopIteration (pause for an async call), or StopAllIterationAndBuffer (pause and buffer the whole body).
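A small sketch of what "declared order" means in config, assuming two stock filters (CORS and fault injection) placed ahead of the router. Both are given empty configs here so they are present but inert; the choice of these two filters is purely illustrative.

```yaml
# Fragment of an HttpConnectionManager typed_config.
http_filters:
  # Request (decode) path visits filters top to bottom:
  #   cors -> fault -> router
  # Response (encode) path visits the same filters bottom to top:
  #   fault -> cors
  - name: envoy.filters.http.cors
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
  - name: envoy.filters.http.fault
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.fault.v3.HTTPFault
  # The router must be last: it forwards upstream and ends the decode path.
  - name: envoy.filters.http.router
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
```

The practical consequence is that an authentication filter should be listed early (so it can reject before later filters run), while a filter that rewrites response headers last should also be listed early, since it is the final one to see the response.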
Catalog of HTTP (L7) filters
Connection & stream timeouts
The HCM exposes a layered set of timeouts; mixing them up is a frequent source of mysterious resets:
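- Route timeout (timeout on the route): bounds the whole request/response exchange, 15s by default; set timeout: 0s to disable.
- stream_idle_timeout (on the HCM): resets a stream that sees no activity, 5m by default.
- idle_timeout (in common_http_protocol_options): closes an idle connection.
gRPC: there is no separate "gRPC connection manager"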
Envoy does not have a distinct gRPC network filter. gRPC is just HTTP/2 with a particular framing (length-prefixed protobuf messages) and a reliance on HTTP trailers (grpc-status, grpc-message). So gRPC is handled by the same HTTP Connection Manager running in HTTP/2 mode — plus a set of gRPC-aware HTTP filters and routing features layered on top.
The HCM is a natural gRPC proxy precisely because it already speaks full HTTP/2: long-lived streams, bidirectional flow, and trailers. A gRPC call is an HTTP/2 POST with content-type: application/grpc, a path of /package.Service/Method, length-prefixed message frames in the body, and a final grpc-status trailer. Envoy understands all of this.
gRPC-aware features in the HCM
- Routing: a route match can specify grpc: {} to match only gRPC traffic, and match on the method via the path prefix /pkg.Service/.
- Retries on gRPC status: retry_policy.retry_on understands gRPC semantics (cancelled, deadline-exceeded, resource-exhausted, unavailable, internal), read from the trailing grpc-status, not the HTTP code (which is almost always 200).
- Deadlines: the client's grpc-timeout header can be honoured/propagated as the route timeout via grpc_timeout_header_max.
- Stats: the grpc_stats filter emits per-service, per-method, per-status counters and can surface grpc-status in access logs.
- Health checking: upstream clusters can use the standard grpc.health.v1.Health check.
routes:
  - match:
      prefix: "/helloworld.Greeter/"
      grpc: {}              # match gRPC requests only
    route:
      cluster: greeter_cluster
      timeout: 0s           # disable 15s default for streaming!
      retry_policy:
        retry_on: "cancelled,deadline-exceeded,resource-exhausted,unavailable"
        num_retries: 3
The gRPC translation filters
These are all HTTP filters that sit in the HCM's L7 chain. They exist to bridge clients that can't speak native HTTP/2 gRPC:
- grpc_json_transcoder: maps REST/JSON requests onto gRPC methods using google.api.http annotations. Lets plain REST clients hit gRPC services.
Streaming gotchas
gRPC has four call modes — unary, server-streaming, client-streaming, and bidirectional. Envoy maps each to an HTTP/2 stream, but long-lived streams expose default behaviours that silently break them:
- Never put a buffer filter in front of a stream: buffering waits for end_stream, which for a streaming RPC may be minutes away (or never).
- Set timeout: 0s on the route and raise/zero stream_idle_timeout; otherwise the 15s route default and 5m idle default kill the stream.
- Enable HTTP/2 keepalive (connection_keepalive PINGs) on both the listener and upstream cluster so idle bidi streams survive NAT/idle reaping.
- Tune flow control: initial_stream_window_size / initial_connection_window_size and max_concurrent_streams govern throughput and fairness for many concurrent streams.
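Put together, the downstream side of a streaming-friendly setup might look like the fragment below. It is a sketch under assumptions: the keepalive interval, window sizes, and stream cap are illustrative numbers rather than recommendations from the video, and the upstream cluster would need its own keepalive settings as well.

```yaml
# Fragment of an HttpConnectionManager typed_config (downstream side).
stream_idle_timeout: 0s                      # 0s disables the 5m per-stream idle default
http2_protocol_options:
  connection_keepalive:
    interval: 30s                            # send an HTTP/2 PING every 30s
    timeout: 5s                              # drop the connection if the PING is not acked
  max_concurrent_streams: 100                # cap on streams per connection
  initial_stream_window_size: 1048576        # 1 MiB per stream
  initial_connection_window_size: 4194304    # 4 MiB per connection
route_config:
  virtual_hosts:
    - name: grpc_backend
      domains: ["*"]
      routes:
        - match: { prefix: "/", grpc: {} }
          route:
            cluster: greeter_cluster
            timeout: 0s                      # disable the 15s route default
```

Disabling both timeouts removes Envoy's safety net for stuck streams, so in practice a large finite `stream_idle_timeout` combined with keepalive is often the safer choice.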
Istio sidecars run the HCM in HTTP/2/gRPC mode automatically once a port is named grpc (or app protocol is detected). VirtualService exposes gRPC route matching and retries.retryOn: "unavailable,deadline-exceeded,...", while gRPC-Web, JSON transcoding, and ext_authz/jwt filters are wired in through EnvoyFilter patches that insert exactly these HTTP filters into the chain. When a long-lived gRPC stream dies at 15s in the mesh, it's this route-timeout default biting — fix it with a per-route timeout, not a cluster setting.
Connection pools
▸ 13:45
Opening a TCP (and TLS) connection per request is expensive: handshakes, slow start, certificate negotiation. So Envoy keeps a connection pool per upstream host, per worker thread, and reuses warm connections. The pool's shape depends on the protocol:
- HTTP/1.1: one in-flight request per connection. To get concurrency the pool holds many connections and hands requests out to idle ones.
- HTTP/2: requests are multiplexed as independent streams over a single connection, so a small number of connections (often one) can carry huge concurrency. The pool tracks max concurrent streams.
Pools are per worker thread (see next section). That has a subtle consequence: load-balancing decisions and connection reuse happen independently on each worker, which is why Envoy's LB is statistically even rather than globally coordinated.
Threading model
▸ 18:34
Envoy uses a single main thread plus N worker threads (typically one per CPU core). This design is what lets it stay fast and almost entirely lock-free.
The critical rule: a connection is pinned to one worker thread for its entire lifetime. Once a worker accepts a connection, every read, write, filter invocation, and upstream connection for it stays on that thread. No connection is ever handed between threads, so there's no locking on the hot path.
Config can change while traffic flows. Envoy handles this with Thread-Local Storage (TLS): the main thread builds a new immutable config snapshot and posts it to each worker, which swaps it in atomically at a safe point. Workers read their local copy with no locks; that's how you get hot config reloads without dropping connections.
Hands-on: Envoy as an L7 proxy
▸ 21:25 – 39:00 · setup, install, route to 4 backends
The demo runs four Node.js services and fronts them with one Envoy. On macOS it's installed with brew install envoy (or via the getenvoy.io packages), then run with envoy -c envoy.yaml.
Acting at L7 means using the HTTP Connection Manager with a route configuration: a set of virtual hosts, each holding routes that match on the request (usually a path prefix) and name a target cluster. Below, requests are routed to four backends by path prefix.
filters:
  - name: envoy.filters.network.http_connection_manager
    typed_config:
      "@type": type.googleapis.com/...HttpConnectionManager
      stat_prefix: ingress_http
      http_filters:
        - name: envoy.filters.http.router   # terminal filter
      route_config:
        virtual_hosts:
          - name: backend
            domains: ["*"]
            routes:
              - match: { prefix: "/app1" }
                route: { cluster: app1_cluster }
              - match: { prefix: "/app2" }
                route: { cluster: app2_cluster }
The match is evaluated top to bottom, first match wins, so order your routes from most specific to least specific.
Splitting load across backends
▸ 40:00
There are two distinct ways to spread traffic, and it's worth keeping them separate:
- Within a cluster: multiple endpoints behind one cluster name are balanced automatically by the cluster's lb_policy. This is plain load balancing across replicas.
- Across clusters (traffic splitting): a single route can fan out to weighted_clusters, sending a defined percentage to each. This is how you do canaries and A/B rollouts.
routes:
  - match: { prefix: "/" }
    route:
      weighted_clusters:
        clusters:
          - { name: app1_cluster, weight: 80 }
          - { name: app2_cluster, weight: 20 }
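For contrast, the first approach (balancing within one cluster) needs no route changes at all; the replicas are simply listed as endpoints of the same cluster. A minimal sketch, assuming two replicas reachable under the hypothetical hostnames `app1-a` and `app1-b`:

```yaml
clusters:
  - name: app1_cluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    lb_policy: LEAST_REQUEST        # any supported policy works; ROUND_ROBIN is the default
    load_assignment:
      cluster_name: app1_cluster
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address: { address: app1-a, port_value: 8080 }  # hypothetical host
            - endpoint:
                address:
                  socket_address: { address: app1-b, port_value: 8080 }  # hypothetical host
```

Routes keep pointing at `app1_cluster`; the cluster decides which replica serves each request.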
An Istio VirtualService with two destination + weight entries compiles to exactly this weighted_clusters block — the foundation of every Istio canary.
Blocking certain requests (/admin)
▸ 45:30
Because routing is L7, you can refuse requests by path before they ever touch a backend. Add a route that matches the sensitive prefix and returns a direct_response instead of forwarding:
routes:
  - match: { prefix: "/admin" }
    direct_response:
      status: 403
      body: { inline_string: "Forbidden" }
  - match: { prefix: "/" }   # everything else proceeds
    route: { cluster: app1_cluster }
Keep the /admin rule above the catch-all / rule — first match wins, so a broad rule placed first would shadow it.
Envoy as an L4 proxy (TCP router)
▸ 47:50
Sometimes you don't want or need HTTP awareness; you just want to forward a raw TCP stream (databases, custom protocols, TLS passthrough). For that, swap the HTTP Connection Manager for the tcp_proxy network filter. There's no route table and no path matching: the whole connection goes to one cluster.
filter_chains:
  - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/...TcpProxy
          stat_prefix: tcp
          cluster: app1_cluster
L4 is cheaper and protocol-agnostic, but blind: no path routing, no HTTP retries, no per-request metrics. L7 costs CPU to parse but unlocks content-based routing, retries, header manipulation, and rich observability. Choose L4 for opaque streams, L7 when you need to reason about requests.
Enabling HTTPS / TLS termination
▸ 54:00 DNS · 55:30 Let's Encrypt
To serve HTTPS, the video points a DNS record at the host, obtains a certificate from Let's Encrypt, and configures Envoy to terminate TLS, meaning downstream clients connect over TLS, Envoy decrypts, and forwards plaintext to the local backends.
TLS is configured via a transport_socket on the listener's filter chain. The downstream context points at the fullchain certificate and private key:
filter_chains:
  - transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/...DownstreamTlsContext
        common_tls_context:
          tls_certificates:
            - certificate_chain: { filename: /etc/letsencrypt/.../fullchain.pem }
              private_key: { filename: /etc/letsencrypt/.../privkey.pem }
    filters: []   # http_connection_manager as before
Terminating at Envoy is the common case (it also enables L7 features on encrypted traffic). If you instead want Envoy to forward encrypted bytes untouched, that's TLS passthrough — handled at L4 with tcp_proxy and the TLS inspector reading SNI without decrypting.
HTTP/2 and TLS hardening
▸ 1:03:00 HTTP/2 · 1:04:30 TLS 1.2/1.3 only · 1:06:40 SSL Labs
Enabling HTTP/2
HTTP/2 is negotiated per hop. On the downstream side, enable it on the HCM; on the upstream side, set it on the cluster. With TLS, the protocol is selected via ALPN, so advertise h2 there too.
# downstream: HCM
http2_protocol_options: {}
# ALPN advertises h2 then http/1.1 fallback
common_tls_context:
  alpn_protocols: ["h2", "http/1.1"]
  # restrict protocol versions: drop legacy TLS
  tls_params:
    tls_minimum_protocol_version: TLSv1_2
    tls_maximum_protocol_version: TLSv1_3
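The snippet above only covers the downstream hop. For the upstream hop, HTTP/2 is set on the cluster through its HTTP protocol options. The following is a sketch, assuming the backend actually accepts cleartext HTTP/2 with prior knowledge (a plain Node.js HTTP/1.1 server would not), and reusing the `app1_cluster` name from earlier:

```yaml
clusters:
  - name: app1_cluster
    connect_timeout: 0.25s
    type: STRICT_DNS
    # Speak HTTP/2 to every endpoint in this cluster.
    typed_extension_protocol_options:
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        explicit_http_config:
          http2_protocol_options: {}
    load_assignment:
      cluster_name: app1_cluster
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address: { address: app1, port_value: 8080 }
```

Leaving this block out keeps the upstream hop on HTTP/1.1 even when clients reach Envoy over HTTP/2, which is a perfectly valid combination.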
Disabling TLS 1.0/1.1, allowing only 1.2 & 1.3
Old TLS versions (1.0, 1.1) are deprecated and insecure. Pinning tls_minimum_protocol_version to TLSv1_2 and the maximum to TLSv1_3 refuses anything weaker. You can also restrict cipher suites for an even tighter posture.
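As a sketch of the cipher restriction mentioned above, `tls_params` also accepts a `cipher_suites` list. The selection below (ECDHE key exchange with AEAD ciphers only) is an illustrative assumption, not the list used in the video; note that this field governs TLS 1.2 and below, while TLS 1.3 suites are not configured through it.

```yaml
common_tls_context:
  alpn_protocols: ["h2", "http/1.1"]
  tls_params:
    tls_minimum_protocol_version: TLSv1_2
    tls_maximum_protocol_version: TLSv1_3
    # Applies to TLS 1.2 handshakes; has no effect on TLS 1.3.
    cipher_suites:
      - ECDHE-ECDSA-AES128-GCM-SHA256
      - ECDHE-RSA-AES128-GCM-SHA256
      - ECDHE-ECDSA-AES256-GCM-SHA384
      - ECDHE-RSA-AES256-GCM-SHA384
      - ECDHE-ECDSA-CHACHA20-POLY1305
      - ECDHE-RSA-CHACHA20-POLY1305
```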
Verifying with SSL Labs
The course finishes by running the endpoint through Qualys SSL Labs, which grades the TLS configuration (protocol versions, ciphers, certificate chain, forward secrecy). A clean modern config — TLS 1.2/1.3 only, valid Let's Encrypt chain — lands an A/A+.
HTTP/2 has its own ALPN/cipher requirements. If you enable h2 but forget to advertise it in alpn_protocols, clients silently fall back to HTTP/1.1 over the same port — the connection works, but you never get the multiplexing you configured.
Summary & cheat sheet
▸ 1:07:24
The whole course reduces to one mental model and a handful of objects. Internalize the pipeline and the config writes itself:
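As a recap, the pipeline (listener → filter chain → HCM → route → cluster → endpoint) can be assembled into one static bootstrap. This is a minimal sketch rather than the video's exact file: the port numbers, the `local_route` name, and the admin address are assumptions, and it reuses `app1_cluster` and the /admin block from the earlier sections.

```yaml
static_resources:
  listeners:
    - name: http_listener                      # listener: where downstream connects
      address:
        socket_address: { address: 0.0.0.0, port_value: 8080 }
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                http_filters:
                  - name: envoy.filters.http.router      # terminal HTTP filter
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: backend
                      domains: ["*"]
                      routes:
                        - match: { prefix: "/admin" }    # specific rule first
                          direct_response:
                            status: 403
                            body: { inline_string: "Forbidden" }
                        - match: { prefix: "/" }         # catch-all last
                          route: { cluster: app1_cluster }
  clusters:
    - name: app1_cluster                       # cluster: the upstream group
      connect_timeout: 0.25s
      type: STRICT_DNS
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: app1_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address: { address: app1, port_value: 8080 }

admin:
  address:
    socket_address: { address: 127.0.0.1, port_value: 9901 }
```

Saved as envoy.yaml, a file of this shape is what `envoy -c envoy.yaml` expects; `envoy --mode validate -c envoy.yaml` checks it without starting the proxy.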
Self-test
▸ check your recall
Q1 Relative to Envoy, what is "downstream" vs "upstream"?
Q2 What is a cluster responsible for?
Q3 Why is the HTTP Connection Manager described as an L4 filter that does L7 work?
A: It is registered as a network (L4) filter, but it decodes the byte stream into HTTP messages and runs an L7 HTTP filter chain that ends in the router filter.
Q4 What guarantees Envoy stays lock-free on the data path?
Q5 How does an HTTP/2 connection pool differ from an HTTP/1.1 one?
Q6 Two ways to spread traffic, and when to use each?
Q7 How do you block /admin at L7, and what ordering pitfall applies?
A: Add a route that matches the /admin prefix with a direct_response (e.g. 403). Routes match top-to-bottom, first match wins, so the specific /admin rule must sit above any catch-all / rule.
Q8 When would you pick the tcp_proxy (L4) filter over the HCM (L7)?
Q9 Where is TLS configured, and what two values pin you to modern protocols?
A: In a transport_socket on the filter chain. Set tls_minimum_protocol_version: TLSv1_2 and tls_maximum_protocol_version: TLSv1_3 to refuse TLS 1.0/1.1.
Q10 What silently breaks HTTP/2 even when you've enabled it?
A: Forgetting to advertise h2 in alpn_protocols on the TLS context. Clients then negotiate HTTP/1.1 over the same port and you lose multiplexing without any error.