platform / sre reference

The gRPC Study Guide

From protobuf wire format to the L4 load-balancing trap on Kubernetes and how an Istio sidecar quietly fixes it. Written for engineers who run this stuff in production.

01 What gRPC actually is

gRPC is a high-performance, contract-first Remote Procedure Call framework. The mental model is the key starting point: instead of thinking in terms of resources and verbs (REST), you think in terms of methods you call on a remote object. You define a service interface once, and the client calls stub.GetUser(req) as if it were a local function. The framework handles serialization, transport, and the network round-trip.

Three pillars hold it together, and you should be able to recite them:

Protocol Buffers — the Interface Definition Language (IDL) and the binary serialization format. The contract is the source of truth; client and server code are generated from it.
HTTP/2 — the transport. Multiplexing, streaming, header compression, and binary framing are not optional extras; gRPC is designed around them.
Generated code + a runtime — protoc plus a language plugin emits type-safe stubs; the runtime handles channels, flow control, deadlines, and retries.

◆ Why it matters for platform work

gRPC's reliance on a single long-lived HTTP/2 connection is the root cause of nearly every operational surprise you'll hit — load imbalance on Kubernetes, idle-timeout disconnects on streams, and the need for an L7 proxy to balance traffic. Hold that fact; the whole second half of this guide flows from it.

gRPC vs REST — the honest comparison

Dimension	gRPC	REST/JSON
Contract	Strict, compiled from `.proto`	Convention / OpenAPI (optional)
Payload	Binary protobuf (compact, fast)	Text JSON (human-readable, larger)
Transport	HTTP/2 mandatory	HTTP/1.1 or 2
Streaming	First-class, bidirectional	Awkward (SSE, chunked, websockets)
Browser support	Needs gRPC-Web + proxy	Native
Tooling/debug	grpcurl, needs reflection	curl, any browser
Best fit	Internal service-to-service, low latency, polyglot	Public APIs, browser clients

02 Protocol Buffers

Protobuf is two things at once: a schema language for describing messages and services, and a wire format for serializing them efficiently. Modern gRPC uses proto3.

syntax = "proto3";
package user.v1;

service UserService {
  // Unary: one request, one response
  rpc GetUser(GetUserRequest) returns (User);

  // Server streaming: one request, a stream of responses
  rpc ListUsers(ListUsersRequest) returns (stream User);

  // Bidirectional streaming
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);
}

message User {
  string id   = 1;   // field number, NOT a value
  string name = 2;
  repeated string roles = 3;
  int64  created_at_unix = 4;
}

⚠ Field numbers are forever

The integers (= 1, = 2) are what identify a field on the wire; field names are never transmitted in the binary encoding. This is why renaming a field is wire-compatible (though it still changes generated code and the JSON mapping), while changing or reusing a field number is a breaking change. Reserve the numbers of deleted fields (for example, reserved 4;) so nobody recycles them.

The wire format, briefly

Each field is encoded as a tag (field number + wire type) followed by the value. Integers use varint encoding (small numbers take fewer bytes). Strings/bytes/messages are length-delimited. There are no field names, no whitespace, no quotes — that's why protobuf is dramatically smaller and faster to parse than JSON.
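To make the tag-plus-value layout concrete, here is a minimal hand-rolled encoder for two fields of the User message above (id = 1, created_at_unix = 4). It is a sketch for illustration only: the helper names are made up, real code should use the generated protobuf types, and unlike a real proto3 encoder it does not skip zero values.

```go
package main

import "fmt"

// Wire types used below (from the protobuf encoding rules).
const (
	wireVarint = 0 // int32, int64, bool, enum ...
	wireBytes  = 2 // string, bytes, nested messages
)

// appendVarint encodes v in base-128, low 7 bits first; the high bit of each
// byte means "more bytes follow".
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80)
		v >>= 7
	}
	return append(b, byte(v))
}

// appendTag writes (fieldNumber << 3) | wireType as a varint.
func appendTag(b []byte, fieldNumber, wireType uint64) []byte {
	return appendVarint(b, fieldNumber<<3|wireType)
}

// appendString writes a length-delimited field: tag, byte length, raw bytes.
func appendString(b []byte, fieldNumber uint64, s string) []byte {
	b = appendTag(b, fieldNumber, wireBytes)
	b = appendVarint(b, uint64(len(s)))
	return append(b, s...)
}

// appendInt64 writes a varint field: tag, then the value.
func appendInt64(b []byte, fieldNumber uint64, v int64) []byte {
	b = appendTag(b, fieldNumber, wireVarint)
	return appendVarint(b, uint64(v))
}

// encodeUser hand-encodes id (field 1) and created_at_unix (field 4).
// Hypothetical helper; a real proto3 encoder would omit zero values.
func encodeUser(id string, createdAtUnix int64) []byte {
	var b []byte
	b = appendString(b, 1, id)
	b = appendInt64(b, 4, createdAtUnix)
	return b
}

func main() {
	// Prints: 0a 02 34 32 20 ac 02
	fmt.Printf("% x\n", encodeUser("42", 300))
}
```

Reading the output: 0a is the tag for field 1 with wire type 2, 02 is the string length, 34 32 is "42", 20 is the tag for field 4 with wire type 0, and ac 02 is 300 as a varint. No field name appears anywhere.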

proto3 semantics you must know

Defaults are invisible. A scalar set to its zero value (0, "", false) is not serialized. You cannot, by default, distinguish "unset" from "set to zero."
Use optional (re-introduced in proto3) when you genuinely need presence semantics — it adds a hidden has-bit.
Backwards compatibility: add fields freely; never change types or numbers; treat unknown fields as pass-through.

03 HTTP/2: the part everyone skips

You cannot reason about gRPC in production without understanding what's happening at the HTTP/2 layer. Every gRPC call is one HTTP/2 stream, and many streams ride a single TCP connection.

Many concurrent RPCs (streams) multiplexed over one connection — the source of both gRPC's efficiency and its load-balancing headaches.

The anatomy of a unary call on the wire

HEADERS frame — carries :method: POST, :path: /user.v1.UserService/GetUser, content-type: application/grpc, plus any custom metadata and the grpc-timeout.
DATA frame(s) — the message, length-prefixed: 1 byte compression flag + 4 bytes big-endian length + the protobuf bytes.
Trailing HEADERS (trailers) — grpc-status and grpc-message. This is critical: the actual success/failure of a gRPC call lives in HTTP/2 trailers, sent after the body.
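The 5-byte message prefix in the DATA frames is simple enough to sketch by hand. The functions below are illustrative helpers (not part of any gRPC library) that add and strip the prefix around an already-serialized payload; compression itself is out of scope.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// frameMessage wraps a serialized protobuf payload in the gRPC message
// prefix: 1 byte compressed flag + 4 bytes big-endian length.
func frameMessage(payload []byte, compressed bool) []byte {
	out := make([]byte, 5+len(payload))
	if compressed {
		out[0] = 1
	}
	binary.BigEndian.PutUint32(out[1:5], uint32(len(payload)))
	copy(out[5:], payload)
	return out
}

// unframeMessage reads one length-prefixed message from the front of buf and
// returns the payload, the compressed flag, and any bytes left over.
func unframeMessage(buf []byte) (payload []byte, compressed bool, rest []byte, err error) {
	if len(buf) < 5 {
		return nil, false, nil, errors.New("short header: need 5 bytes")
	}
	n := int(binary.BigEndian.Uint32(buf[1:5]))
	if n < 0 || n > len(buf)-5 {
		return nil, false, nil, errors.New("short payload")
	}
	return buf[5 : 5+n], buf[0] == 1, buf[5+n:], nil
}

func main() {
	framed := frameMessage([]byte{0x0a, 0x02, '4', '2'}, false)
	fmt.Printf("% x\n", framed) // prints: 00 00 00 00 04 0a 02 34 32

	payload, compressed, rest, err := unframeMessage(framed)
	fmt.Println(len(payload), compressed, len(rest), err)
}
```

Because each message carries its own length, a streaming RPC can place many messages back to back on the same HTTP/2 stream and the receiver can split them without any delimiter.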

◆ Why trailers break naive proxies

Because gRPC reports its status in trailers, any proxy or load balancer that doesn't fully understand HTTP/2 trailers can't tell a successful call from a failed one — and can't make smart retry decisions. This is exactly why Envoy (and therefore Istio) is so well-suited to gRPC and why an L4 TCP load balancer is blind to it.

HTTP/2 features gRPC depends on

Multiplexing — concurrent streams without head-of-line blocking at the HTTP layer.
Flow control — per-stream and per-connection windows; controls backpressure (relevant when tuning streaming throughput).
HPACK — header compression, so repeated metadata is cheap.
GOAWAY — graceful connection shutdown; the server tells clients to stop opening new streams (important for rolling deploys).

04 The four RPC types

Type	Shape	Use case
Unary	`req → resp`	The default. CRUD, queries, commands.
Server streaming	`req → stream resp`	Subscriptions, large result sets, progress feeds.
Client streaming	`stream req → resp`	Uploads, telemetry ingestion, batch aggregation.
Bidirectional	`stream req ↔ stream resp`	Chat, interactive sessions, real-time sync.

⚠ Streaming is operationally heavier than it looks

A streaming RPC is a long-lived stream that can sit open for minutes or hours. Every intermediary — Envoy sidecar, k8s Service, cloud LB — has idle/max-connection timeouts that can silently kill it. Plan for reconnection logic and tune timeouts (covered in §20).

05 Channels, stubs & the call lifecycle

Channel — the client's virtual connection to a logical server (a target string like dns:///user-svc:50051). A channel manages one or more real subchannels (TCP connections), name resolution, and load-balancing state. Channels are expensive; create one and reuse it.
Stub — the generated, type-safe client bound to a channel. Cheap to create.
Subchannel — a connection to a single backend address, with its own connectivity state (IDLE → CONNECTING → READY → TRANSIENT_FAILURE).

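// Imports needed: log, google.golang.org/grpc, google.golang.org/grpc/credentials/insecure,
// plus the generated userv1 package.

// Create ONE channel and reuse it across the process.
conn, err := grpc.NewClient(
    "dns:///user-svc.default.svc.cluster.local:50051",
    grpc.WithTransportCredentials(insecure.NewCredentials()), // plaintext transport
    grpc.WithDefaultServiceConfig(`{"loadBalancingConfig":[{"round_robin":{}}]}`),
)
if err != nil {
    log.Fatalf("create channel: %v", err)
}
defer conn.Close()

// Stubs are cheap: bind one to the shared channel.
client := userv1.NewUserServiceClient(conn)
resp, err := client.GetUser(ctx, &userv1.GetUserRequest{Id: "42"})
if err != nil {
    log.Fatalf("GetUser: %v", err)
}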

06 Metadata, deadlines & cancellation

Metadata

Key-value pairs sent as HTTP/2 headers (request) or trailers (response). Used for auth tokens, request IDs, tracing context. Binary values use a -bin suffix on the key.
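As a rough illustration of the -bin convention: values under such keys are base64-encoded on the wire, while other values travel as plain ASCII. gRPC libraries do this for you; the helper below is a hypothetical stand-in, and the choice of unpadded base64 reflects my reading of the gRPC-over-HTTP/2 protocol notes rather than anything stated in this guide.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// encodeMetadataValue returns the header value as it would appear on the
// wire: base64 (unpadded) for "-bin" keys, unchanged otherwise.
// Hypothetical helper, not a gRPC API.
func encodeMetadataValue(key string, value []byte) string {
	if strings.HasSuffix(key, "-bin") {
		return base64.RawStdEncoding.EncodeToString(value)
	}
	return string(value)
}

func main() {
	fmt.Println(encodeMetadataValue("x-request-id", []byte("abc-123")))
	fmt.Println(encodeMetadataValue("trace-bin", []byte{0xff, 0x00}))
}
```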

Deadlines — not timeouts

gRPC strongly favors deadlines (an absolute point in time) over timeouts (a duration). The client encodes the remaining time as a grpc-timeout header. Crucially, deadlines propagate, provided each service passes the incoming context (and with it the deadline) to its outbound calls: if Service A calls B with a 2s deadline and B calls C, C inherits the remaining budget. This prevents work being done on requests the caller has already given up on.

◆ Always set a deadline

A gRPC call with no deadline can hang forever, pinning resources. The single most common production reliability fix is "every outbound RPC gets a context deadline." Propagate the incoming deadline downstream; don't reset it.
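The A to B to C budget can be sketched with nothing but the standard context package, which is what grpc-go builds on. The function names below are stand-ins for real handlers and RPC calls; the point is that passing the same ctx downstream carries the remaining time, and asking for a longer timeout never extends it.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// callC stands in for the downstream RPC to service C. It reports how much
// budget it sees on the context it was given.
func callC(ctx context.Context) time.Duration {
	deadline, ok := ctx.Deadline()
	if !ok {
		return -1 // no deadline at all: the call could hang forever
	}
	return time.Until(deadline)
}

// handleB stands in for service B's handler. It spends some time on its own
// work, then calls C with a context derived from the incoming one.
func handleB(ctx context.Context, ownWork time.Duration) time.Duration {
	time.Sleep(ownWork)
	// Requesting 10s does not extend the budget: the context package keeps
	// the earlier of the parent deadline and the new one.
	child, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()
	return callC(child)
}

// remainingAtC runs A -> B -> C with A's budget and returns what C sees.
func remainingAtC(budget, workInB time.Duration) time.Duration {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	return handleB(ctx, workInB)
}

func main() {
	left := remainingAtC(2*time.Second, 20*time.Millisecond)
	fmt.Printf("C sees roughly %v of A's 2s budget\n", left.Round(10*time.Millisecond))
}
```

The anti-pattern is starting from context.Background() inside B: that silently resets the budget and lets C keep working after A has given up.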

Cancellation

If the client cancels (or the deadline expires), gRPC sends an RST_STREAM and the server's context is cancelled — well-behaved servers stop work immediately. This is cooperative; your handlers must check ctx.Done().
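A minimal sketch of what "cooperative" means in practice: the handler checks ctx.Done() between units of work and returns as soon as the context is cancelled. The functions are illustrative stand-ins; in a real server the cancellation would come from the gRPC runtime rather than from a local cancel() call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// processItems stands in for a handler doing chunked work. It checks
// ctx.Done() between chunks and stops once the caller has gone away.
func processItems(ctx context.Context, items int, onItem func(i int)) (done int, err error) {
	for i := 0; i < items; i++ {
		select {
		case <-ctx.Done():
			return done, ctx.Err() // context.Canceled or context.DeadlineExceeded
		default:
		}
		onItem(i)
		done++
	}
	return done, nil
}

// runWithCancelAfter cancels the context once n items have been handled and
// reports how many items were processed and whether cancellation was seen.
func runWithCancelAfter(items, n int) (processed int, cancelled bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	processed, err := processItems(ctx, items, func(i int) {
		if i+1 == n {
			cancel() // simulates the client cancelling mid-call
		}
	})
	return processed, errors.Is(err, context.Canceled)
}

func main() {
	processed, cancelled := runWithCancelAfter(100, 3)
	fmt.Println(processed, cancelled) // prints: 3 true
}
```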

07 Status codes & the error model

gRPC has its own status code space — 17 codes, distinct from HTTP status codes. Knowing the retry-relevant ones cold is essential for configuring meshes and clients.

Code	#	Meaning	Retry?
`OK`	0	Success	—
`CANCELLED`	1	Client cancelled	No
`INVALID_ARGUMENT`	3	Bad request (client's fault)	No
`DEADLINE_EXCEEDED`	4	Ran out of time	Sometimes
`NOT_FOUND`	5	Resource missing	No
`PERMISSION_DENIED`	7	Authz failed	No
`RESOURCE_EXHAUSTED`	8	Quota / rate limit	Yes (backoff)
`FAILED_PRECONDITION`	9	State invalid for op	No
`UNIMPLEMENTED`	12	Method not found	No
`INTERNAL`	13	Server bug / invariant broken	Maybe
`UNAVAILABLE`	14	Transient — server down/restarting	Yes
`UNAUTHENTICATED`	16	No/invalid credentials	No

▸ The idempotency rule

UNAVAILABLE is the canonical "safe to retry" code — it means the request likely never reached application logic. Only auto-retry non-idempotent methods if you're confident the server didn't process them. google.rpc.Status + error details lets you attach structured, typed error payloads beyond the bare code.
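One way to turn the table and the idempotency rule into code is a small decision function. This is one possible policy, not a gRPC API: treating DEADLINE_EXCEEDED and INTERNAL as retryable only for idempotent methods is an assumption made for this sketch, and the constant names are invented here.

```go
package main

import "fmt"

// Numeric gRPC status codes, as listed in the table above.
const (
	codeInvalidArgument   = 3
	codeDeadlineExceeded  = 4
	codeResourceExhausted = 8
	codeInternal          = 13
	codeUnavailable       = 14
)

// shouldRetry is a hypothetical policy helper.
//   - UNAVAILABLE / RESOURCE_EXHAUSTED: the request most likely never ran,
//     so retry (with backoff) regardless of idempotency.
//   - DEADLINE_EXCEEDED / INTERNAL: the server may have done the work,
//     so retry only when the method is idempotent (assumption of this sketch).
//   - everything else: a retry would fail the same way.
func shouldRetry(code int, idempotent bool) bool {
	switch code {
	case codeUnavailable, codeResourceExhausted:
		return true
	case codeDeadlineExceeded, codeInternal:
		return idempotent
	default:
		return false
	}
}

func main() {
	fmt.Println(shouldRetry(codeUnavailable, false))     // true
	fmt.Println(shouldRetry(codeInternal, false))        // false
	fmt.Println(shouldRetry(codeInvalidArgument, true))  // false
}
```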

08 Interceptors

Interceptors are gRPC's middleware. They wrap every call, on both client and server sides, for unary and streaming. This is where you put cross-cutting concerns: auth, logging, metrics, tracing, panic recovery, rate limiting.

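// Server-side unary interceptor: record per-method latency, labelled by status code.
// rpcLatency is assumed to be a Prometheus HistogramVec defined elsewhere.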
func MetricsInterceptor(
    ctx context.Context, req any,
    info *grpc.UnaryServerInfo, handler grpc.UnaryHandler,
) (any, error) {
    start := time.Now()
    resp, err := handler(ctx, req)        // call the actual RPC
    code := status.Code(err)
    rpcLatency.WithLabelValues(info.FullMethod, code.String()).
        Observe(time.Since(start).Seconds())
    return resp, err
}

▸ Relevance to your stack

This is exactly where OpenTelemetry instrumentation hooks in — an OTel interceptor emits spans and gen_ai.* / standard RPC metrics per call, which Alloy can scrape or receive via OTLP and route to Mimir/Tempo. Interceptor-level metrics give you per-method latency and status-code breakdowns without touching business code.

09 Name resolution & load balancing

This section is the bridge to the Kubernetes story. gRPC's LB model is client-side by default, and that design decision is what collides with how Kubernetes Services work.

Name resolution

The channel target uses a scheme: dns:///host:port, passthrough:///ip:port, or xds:///. The DNS resolver resolves the host to a set of addresses and watches for changes. The trailing details matter:

dns:///user-svc:50051 — resolve user-svc via DNS, get back all A records.
For this to return all pod IPs in Kubernetes, you need a headless Service (clusterIP: None) — a normal ClusterIP Service returns a single virtual IP.
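The triple slash in targets like dns:///user-svc:50051 is easier to remember once you see the target as a URI with an empty authority. The sketch below uses the standard net/url parser to split a target into scheme, authority, and endpoint; parseTarget is a hypothetical helper, not the gRPC library's own parser.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseTarget splits a gRPC target of the form scheme://authority/endpoint.
// For dns:///user-svc:50051 the authority (an optional DNS server) is empty
// and the endpoint is what actually gets resolved.
func parseTarget(target string) (scheme, authority, endpoint string, err error) {
	u, err := url.Parse(target)
	if err != nil {
		return "", "", "", err
	}
	return u.Scheme, u.Host, strings.TrimPrefix(u.Path, "/"), nil
}

func main() {
	for _, t := range []string{
		"dns:///user-svc:50051",
		"passthrough:///10.0.0.7:50051",
		"xds:///user-svc",
	} {
		scheme, authority, endpoint, err := parseTarget(t)
		fmt.Printf("scheme=%q authority=%q endpoint=%q err=%v\n", scheme, authority, endpoint, err)
	}
}
```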

Load-balancing policies

pick_first (default) — connect to the first working address; all traffic to one backend.
round_robin — open a subchannel to every resolved address and rotate. This is what you want for client-side LB.
Look-aside / xDS — an external control plane (e.g. via the gRPC xDS API, the same one Envoy uses) hands out endpoints and policy.

Client-side load balancing: the client itself maintains a connection to every backend and rotates requests.

10 Retries, keepalive & health checking

Retries & hedging (service config)

gRPC supports declarative retry policy via the channel's service config (JSON) — max attempts, backoff, and which status codes are retryable. Hedging sends the same request to multiple backends and takes the first response (only for idempotent calls).

{
  "methodConfig": [{
    "name": [{"service": "user.v1.UserService"}],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE"]
    }
  }]
}
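To see what the policy above means in time, the sketch below computes the upper bound of the wait before each retry. It assumes the usual exponential scheme (start at initialBackoff, multiply by backoffMultiplier, cap at maxBackoff); real clients randomize the actual delay within or around that bound, and the exact jitter rule depends on the implementation, so treat these as ceilings only. retryBackoffCaps is a made-up helper.

```go
package main

import (
	"fmt"
	"time"
)

// retryBackoffCaps returns the upper bound of the delay before each retry.
// maxAttempts counts the original call, so there are maxAttempts-1 retries.
func retryBackoffCaps(maxAttempts int, initial, maxBackoff time.Duration, multiplier float64) []time.Duration {
	var caps []time.Duration
	cur := initial
	for retry := 1; retry < maxAttempts; retry++ {
		caps = append(caps, cur)
		cur = time.Duration(float64(cur) * multiplier)
		if cur > maxBackoff {
			cur = maxBackoff
		}
	}
	return caps
}

func main() {
	// Values from the service config above.
	caps := retryBackoffCaps(4, 100*time.Millisecond, time.Second, 2)
	fmt.Println(caps) // prints: [100ms 200ms 400ms]
}
```

With maxAttempts: 4 the worst case adds well under a second of waiting, which matters when the caller's deadline is only a couple of seconds.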

Keepalive

gRPC sends HTTP/2 PING frames to detect dead connections and to keep NAT/firewall mappings alive. Misconfigured keepalive is a frequent cause of GOAWAY: too_many_pings errors — the client pings more aggressively than the server's PermitWithoutStream / min-time policy allows.

Health checking

The standard grpc.health.v1.Health/Check service returns SERVING / NOT_SERVING. This is what Kubernetes probes and load balancers query (see §14). Reflection (grpc.reflection.v1) lets tools like grpcurl discover services without the .proto file — invaluable for debugging, but consider disabling it in production.

11 Security & auth

Channel credentials — TLS/mTLS at the transport. In a mesh you typically delegate this to the sidecar (the app speaks plaintext to its local Envoy, Envoy does mTLS on the wire).
Call credentials — per-RPC tokens (OAuth2 / JWT bearer tokens) carried in metadata. This composes with channel creds.
ALTS — Google's mutual-auth scheme, mostly GCP-internal.

▸ Auth in an OAuth2/Keycloak world

A common pattern: the caller obtains a token (Authorization Code + PKCE for users, client-credentials for service-to-service), attaches it as authorization: Bearer … metadata, and a server interceptor validates it. Coarse network identity (who can talk to whom) is enforced by the mesh via mTLS + AuthorizationPolicy; fine-grained scope/role checks happen in the interceptor.

12 Kubernetes: the L4 load-balancing trap

This is the canonical gRPC-on-Kubernetes problem, and it follows directly from §03 and §09. Here is the failure mode in one breath:

✕ The problem

A standard Kubernetes ClusterIP Service load-balances at L4 (connection level) via kube-proxy/iptables/IPVS. It picks a backend pod once, when the TCP connection is established. But gRPC opens one long-lived HTTP/2 connection and multiplexes thousands of requests over it. Result: every request goes to the single pod that won the connection lottery. New pods you scale up receive zero traffic until existing connections churn.

The trap: one sticky HTTP/2 connection pinned to one pod. Scaling out does nothing for an existing client.
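A toy simulation makes the difference between connection-level and request-level balancing visible. It is a model, not a reproduction of kube-proxy or Envoy: backends are picked with a simple modulo, whereas kube-proxy picks more or less at random per connection. The function names are invented for this sketch.

```go
package main

import "fmt"

// l4Distribute models connection-level balancing: the backend is chosen once
// per connection, and every request on that connection follows it.
func l4Distribute(pods, connections, requestsPerConn int) []int {
	hits := make([]int, pods)
	for c := 0; c < connections; c++ {
		backend := c % pods // chosen at TCP connect time, never revisited
		hits[backend] += requestsPerConn
	}
	return hits
}

// l7Distribute models request-level balancing: the backend is chosen for
// each request (each HTTP/2 stream), whichever connection carried it.
func l7Distribute(pods, connections, requestsPerConn int) []int {
	hits := make([]int, pods)
	for r := 0; r < connections*requestsPerConn; r++ {
		hits[r%pods]++
	}
	return hits
}

func main() {
	// One gRPC client, one long-lived connection, 900 RPCs, 3 pods.
	fmt.Println("L4:", l4Distribute(3, 1, 900)) // prints: L4: [900 0 0]
	fmt.Println("L7:", l7Distribute(3, 1, 900)) // prints: L7: [300 300 300]
}
```

Raising the pod count in the L4 case changes nothing for the existing connection, which is the "scaling out does nothing" symptom described above.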

13 Fixing load balancing without a mesh

There are three families of fix. Pick based on whether you have a mesh.

Option A — Headless Service + client-side LB

Make the Service headless so DNS returns all pod IPs, then configure the gRPC client to use round_robin. The client now holds a subchannel to every pod and balances per-request.

apiVersion: v1
kind: Service
metadata:
  name: user-svc
spec:
  clusterIP: None          # ← headless: DNS returns every pod IP
  selector:
    app: user
  ports:
    - name: grpc           # name the port (matters for Istio too)
      port: 50051
      appProtocol: grpc      # explicit protocol hint

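Client target: dns:///user-svc.default.svc.cluster.local:50051 with round_robin. The catch: the DNS resolver does not learn about new pods instantly. Depending on the implementation it re-resolves on a timer or only when a connection fails (grpc-go, for example, re-resolves mainly on connection errors), so newly added pods can sit idle. Common mitigations are a server-side maximum connection age, which forces clients to reconnect and re-resolve, or relying on the mesh instead.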

Option B — A look-aside / xDS balancer

Run a control plane that feeds endpoints to clients over the gRPC xDS API. This is powerful but operationally heavy if you don't already run such infrastructure.

Option C — An L7 proxy / service mesh

Put an HTTP/2-aware L7 proxy (Envoy) in the path. It terminates the client's connection and does per-request balancing to backends. This is the Istio answer, and it's why a mesh makes the whole problem disappear — see Part III.

◆ Decision rule

If you run Istio/Linkerd: do nothing special — use a normal ClusterIP Service and let the sidecar balance. If you don't run a mesh: headless Service + round_robin is the pragmatic fix.

14 Health probes

An HTTP probe can't check a gRPC-only server. Two correct approaches:

Native gRPC probes (Kubernetes 1.24+; beta and enabled by default in 1.24, GA in 1.27)

readinessProbe:
  grpc:
    port: 50051
    service: "user.v1.UserService"   # optional; sent as the service name in the Health/Check request
  initialDelaySeconds: 5
  periodSeconds: 10

The kubelet calls the standard grpc.health.v1.Health/Check service. The server must implement and register it.

Legacy: `grpc_health_probe` binary

Before 1.24, you shipped the grpc_health_probe binary in the image and used an exec probe. Still seen in older charts.

⚠ Readiness drives endpoint membership

Readiness gates whether a pod is in the Service's EndpointSlice. A pod failing readiness is pulled from the set of resolvable IPs — which is exactly how rolling deploys avoid sending gRPC traffic to a pod that isn't ready. Make your Health service reflect real readiness (deps connected, caches warm), not just "process is up."

15 Istio: how Envoy handles gRPC

Istio injects an Envoy sidecar next to each pod. All inbound/outbound traffic is transparently redirected through it via iptables rules. (Istio's newer ambient mode replaces sidecars with shared node-level and waypoint proxies; the sidecar model is assumed throughout this guide.) Envoy is a full L7, HTTP/2-native proxy: it parses gRPC framing, understands trailers, reads grpc-status, and balances per request.

The sidecar terminates the app's HTTP/2 connection and re-balances each multiplexed request — no headless Service needed.

◆ The payoff

With Istio, the L4 trap from §12 is solved automatically. Use a normal ClusterIP Service; the app can keep a single connection to its sidecar, and Envoy spreads the requests. You generally do not want headless + client-side round_robin and a mesh — pick one balancing layer.

16 Protocol detection: get this right

For Envoy to apply HTTP/2-aware (L7) handling, Istio must know the port speaks gRPC. If it thinks the port is plain TCP, you fall back to L4 balancing — the very problem you're trying to avoid.

Two ways to declare it, in order of preference:

appProtocol: grpc on the Service port — the explicit, modern, unambiguous way.
Port name prefix — name the port grpc, grpc-web, or grpc-anything. Istio reads the prefix.

ports:
  - name: grpc-user        # prefix recognised by Istio
    port: 50051
    appProtocol: grpc        # preferred explicit signal

⚠ Don't rely on automatic protocol detection

Istio can sniff the protocol, but for gRPC it's strongly recommended to declare it explicitly. Auto-detection adds latency on first bytes and can misclassify, silently downgrading you to L4. An unnamed or mislabelled port is the #1 cause of "my gRPC still isn't load-balancing under Istio."

17 Traffic management for gRPC

Because Envoy understands gRPC, VirtualService and DestinationRule work at the method level. Routing can match on the gRPC path (/package.Service/Method) and on metadata headers.

apiVersion: networking.istio.io/v1
kind: VirtualService
metadata: { name: user-svc }
spec:
  hosts: [ user-svc ]
  http:                        # gRPC is HTTP/2 → use the http block
    - match:
        - uri:
            prefix: "/user.v1.UserService/"
      route:
        - destination: { host: user-svc, subset: v2 }   # subset v2 must be defined in a DestinationRule
      timeout: 2s

▸ gRPC routes live under http:

There is no separate grpc: block — because gRPC is HTTP/2, you configure it under http:. Match on uri.prefix to target a whole service, or the full path to target one method. Canary, A/B, and header-based routing all work this way.
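The path a route matches on is just the :path header, /package.Service/Method. A small sketch of how that string breaks down, and of what a prefix match versus a full-path match selects, is below. Both helpers are hypothetical and only mirror the matching idea; they are not Istio or Envoy code.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFullMethod breaks "/package.Service/Method" into its two parts.
func splitFullMethod(path string) (service, method string, ok bool) {
	if !strings.HasPrefix(path, "/") {
		return "", "", false
	}
	service, method, found := strings.Cut(path[1:], "/")
	if !found || service == "" || method == "" {
		return "", "", false
	}
	return service, method, true
}

// matchesRoute mimics the two match styles described above: a prefix ending
// in "/" selects every method of a service, an exact path selects one method.
func matchesRoute(path, prefix, exact string) bool {
	if exact != "" {
		return path == exact
	}
	return strings.HasPrefix(path, prefix)
}

func main() {
	svc, m, ok := splitFullMethod("/user.v1.UserService/GetUser")
	fmt.Println(svc, m, ok) // prints: user.v1.UserService GetUser true

	fmt.Println(matchesRoute("/user.v1.UserService/ListUsers", "/user.v1.UserService/", "")) // true
	fmt.Println(matchesRoute("/user.v1.UserService/ListUsers", "", "/user.v1.UserService/GetUser")) // false
}
```

Keeping the trailing slash in the prefix matters: without it, "/user.v1.UserService" would also match a service named user.v1.UserServiceV2.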

18 Retries, outlier detection & connection pools

gRPC-aware retries

Istio retries can key off gRPC status codes, not just HTTP codes. The retryOn field accepts gRPC conditions like cancelled, deadline-exceeded, internal, resource-exhausted, and unavailable.

  http:
    - route: [ { destination: { host: user-svc } } ]
      retries:
        attempts: 3
        perTryTimeout: 1s
        retryOn: "unavailable,deadline-exceeded,resource-exhausted"

Outlier detection (passive health)

In a DestinationRule, outlier detection ejects misbehaving pods. For gRPC, Envoy maps certain statuses (e.g. UNAVAILABLE, INTERNAL) into the 5xx bucket that consecutive5xxErrors counts.

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata: { name: user-svc }
spec:
  host: user-svc
  trafficPolicy:
    loadBalancer: { simple: ROUND_ROBIN }
    connectionPool:
      http:
        http2MaxRequests: 1000          # max concurrent streams to backend
        maxRequestsPerConnection: 0     # 0 = unlimited (no forced cycling)
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
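The consecutive5xxErrors rule is easiest to reason about as a streak counter per backend. The sketch below is a deliberately simplified model: it ignores interval, baseEjectionTime (so hosts never return), and maxEjectionPercent, and it assumes the gRPC status has already been classified as 5xx-like or not. The type and function names are made up.

```go
package main

import "fmt"

// outlierTracker counts consecutive 5xx-class results for one backend.
type outlierTracker struct {
	threshold   int // plays the role of consecutive5xxErrors
	consecutive int
	ejected     bool
}

// observe records one response and reports whether this call ejected the host.
func (t *outlierTracker) observe(is5xx bool) bool {
	if !is5xx {
		t.consecutive = 0 // any success resets the streak
		return false
	}
	t.consecutive++
	if !t.ejected && t.consecutive >= t.threshold {
		t.ejected = true
		return true
	}
	return false
}

// ejectedAfter replays a sequence of results (true = 5xx-class failure)
// and reports whether the backend ended up ejected.
func ejectedAfter(threshold int, results []bool) bool {
	t := &outlierTracker{threshold: threshold}
	for _, is5xx := range results {
		t.observe(is5xx)
	}
	return t.ejected
}

func main() {
	flaky := []bool{true, true, true, true, false, true, true, true, true}
	down := []bool{true, true, true, true, true}
	fmt.Println(ejectedAfter(5, flaky)) // prints: false
	fmt.Println(ejectedAfter(5, down))  // prints: true
}
```

The flaky case shows why a pod failing a large share of calls can still escape ejection: a single success in between resets the streak.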

⚠ maxRequestsPerConnection and gRPC

Setting this to a low value forces Envoy to cycle HTTP/2 connections, which can interrupt in-flight streaming RPCs. Leave it at 0 for streaming-heavy services unless you have a specific reason. Conversely, http2MaxRequests caps concurrency to a backend — too low and you'll see queued/rejected requests under load.

19 mTLS & telemetry

Automatic mTLS

Because gRPC is just HTTP/2 to Envoy, Istio's automatic mutual TLS applies with zero app changes. The app speaks plaintext HTTP/2 to its sidecar; the sidecars negotiate mTLS on the wire using SPIFFE identities. Enforce with PeerAuthentication (set mode STRICT) and authorize peer-to-peer access with AuthorizationPolicy — which can match on the gRPC method path under operation.paths.

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata: { name: user-svc-rbac }
spec:
  selector: { matchLabels: { app: user } }
  action: ALLOW
  rules:
    - from: [ { source: { principals: [ "cluster.local/ns/web/sa/frontend" ] } } ]
      to:   [ { operation: { paths: [ "/user.v1.UserService/GetUser" ] } } ]

Telemetry

Envoy emits per-call gRPC telemetry without app instrumentation: request counts, durations, and the gRPC response status as a dimension. These flow into your metrics backend (Prometheus/Mimir) as istio_requests_total with a grpc_response_status label. Envoy also generates trace spans and tracing headers (B3 / W3C), but the application must copy the incoming trace headers onto its outbound calls for spans to stitch across hops into Tempo.

▸ Two layers of signal, deliberately

The mesh gives you uniform transport-level gRPC metrics for every service for free. Application interceptors (§08) give you semantic detail the proxy can't see — business labels, payload-aware timing, OTel gen_ai.* attributes. Run both: mesh for the baseline golden signals, interceptors for depth.

20 Streaming & the gotchas list

✕ Long-lived streams vs proxy timeouts

Bidirectional and server-streaming RPCs can outlive Envoy's default idle timeout and route timeout. A route-level timeout applies to the whole stream and will kill it — so for streaming routes you usually set timeout: 0s (disabled) and rely on idle timeout + application keepalive instead.

The full gotcha checklist

Symptom	Likely cause	Fix
One pod gets all traffic	L4 balancing (port not detected as gRPC)	Set `appProtocol: grpc` / name port `grpc-*`
Streams die after ~5 min	Route timeout or idle timeout	`timeout: 0s` on the route; tune idle timeout; app keepalive
`too_many_pings` GOAWAY	Client keepalive too aggressive	Raise the client keepalive interval, or relax the server's enforcement policy (lower min time, enable `PermitWithoutStream`)
New pods slow to get traffic (no mesh)	DNS resolver refresh interval	Lower resolver refresh, or move to mesh
Retries not firing	Used HTTP codes, not gRPC conditions	`retryOn: "unavailable,..."`
Streaming RPC interrupted on deploy	`maxRequestsPerConnection` low; GOAWAY on rollout	Set to `0`; graceful drain; client reconnect logic
mTLS handshake errors	Port misdetected as TCP, or PeerAuth mismatch	Declare protocol; check `PeerAuthentication` mode

★ One-page cheat sheet

The five facts to never forget

1 RPC = 1 HTTP/2 stream; many streams = 1 TCP connection. Everything else follows.
gRPC status lives in HTTP/2 trailers — L4 LBs are blind to it.
Default k8s ClusterIP balances at L4 → all gRPC traffic pins to one pod.
Fix without mesh: headless Service + round_robin. Fix with mesh: nothing — Envoy balances per request.
Always set a deadline; always declare the port as gRPC.

Debugging toolkit

grpcurl introspect & call (needs reflection) · grpc_health_probe probe health · ghz load testing · istioctl proxy-config inspect Envoy clusters/routes · channelz in-process connection state · GRPC_GO_LOG_SEVERITY_LEVEL=info client tracing

Status codes worth memorizing

0 OK · 3 INVALID_ARGUMENT · 4 DEADLINE_EXCEEDED · 8 RESOURCE_EXHAUSTED · 13 INTERNAL · 14 UNAVAILABLE (retry) · 16 UNAUTHENTICATED

gRPC Study Guide · Part I Fundamentals · Part II Kubernetes · Part III Istio. A living reference: verify version-specific details (k8s gRPC probes enabled by default from 1.24 and GA in 1.27, Istio API networking.istio.io/v1) against current docs before relying on them in production.