platform / sre reference

The gRPC Study Guide

From protobuf wire format to the L4 load-balancing trap on Kubernetes and how an Istio sidecar quietly fixes it. Written for engineers who run this stuff in production.

01 What gRPC actually is

gRPC is a high-performance, contract-first Remote Procedure Call framework. The mental model is the key starting point: instead of thinking in terms of resources and verbs (REST), you think in terms of methods you call on a remote object. You define a service interface once, and the client calls stub.GetUser(req) as if it were a local function. The framework handles serialization, transport, and the network round-trip.

Three pillars hold it together, and you should be able to recite them:

◆ Why it matters for platform work

gRPC's reliance on a single long-lived HTTP/2 connection is the root cause of nearly every operational surprise you'll hit — load imbalance on Kubernetes, idle-timeout disconnects on streams, and the need for an L7 proxy to balance traffic. Hold that fact; the whole second half of this guide flows from it.

gRPC vs REST — the honest comparison

Dimension | gRPC | REST/JSON
Contract | Strict, compiled from .proto | Convention / OpenAPI (optional)
Payload | Binary protobuf (compact, fast) | Text JSON (human-readable, larger)
Transport | HTTP/2 mandatory | HTTP/1.1 or 2
Streaming | First-class, bidirectional | Awkward (SSE, chunked, websockets)
Browser support | Needs gRPC-Web + proxy | Native
Tooling/debug | grpcurl, needs reflection | curl, any browser
Best fit | Internal service-to-service, low latency, polyglot | Public APIs, browser clients

02 Protocol Buffers

Protobuf is two things at once: a schema language for describing messages and services, and a wire format for serializing them efficiently. Modern gRPC uses proto3.

syntax = "proto3";
package user.v1;

service UserService {
  // Unary: one request, one response
  rpc GetUser(GetUserRequest) returns (User);

  // Server streaming: one request, a stream of responses
  rpc ListUsers(ListUsersRequest) returns (stream User);

  // Bidirectional streaming
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);
}

message User {
  string id   = 1;   // field number, NOT a value
  string name = 2;
  repeated string roles = 3;
  int64  created_at_unix = 4;
}
⚠ Field numbers are forever

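The integers (= 1, = 2) are the only field identifiers on the wire; field names are never transmitted in the binary encoding. This is why renaming a field is wire-safe (though it still changes generated code and the JSON mapping), while changing or reusing a field number is a breaking change. When you delete a field, reserve its number (for example, reserved 4; after removing created_at_unix) so nobody recycles it.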

The wire format, briefly

Each field is encoded as a tag (field number + wire type) followed by the value. Integers use varint encoding (small numbers take fewer bytes). Strings/bytes/messages are length-delimited. There are no field names, no whitespace, no quotes — that's why protobuf is dramatically smaller and faster to parse than JSON.
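
The tag-plus-value layout described above can be sketched by hand-encoding two fields of the User message. This is an illustrative encoder only, not the generated protobuf code; the helper names (appendVarint, appendTag, encodeUser) are made up for the example.

```go
package main

import "fmt"

// Wire types from the protobuf encoding spec.
const (
	wireVarint = 0 // int32, int64, bool, enum
	wireBytes  = 2 // string, bytes, embedded messages
)

// appendVarint appends v in base-128 varint form: 7 bits per byte,
// least-significant group first, high bit set on every byte but the last.
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80)
		v >>= 7
	}
	return append(b, byte(v))
}

// appendTag appends the field key: (field number << 3) | wire type.
func appendTag(b []byte, fieldNum int, wireType uint64) []byte {
	return appendVarint(b, uint64(fieldNum)<<3|wireType)
}

// encodeUser hand-encodes id (field 1) and created_at_unix (field 4)
// of the User message. Hypothetical helper for illustration.
func encodeUser(id string, createdAtUnix int64) []byte {
	var b []byte
	b = appendTag(b, 1, wireBytes)
	b = appendVarint(b, uint64(len(id))) // length prefix
	b = append(b, id...)
	b = appendTag(b, 4, wireVarint)
	b = appendVarint(b, uint64(createdAtUnix))
	return b
}

func main() {
	// Prints: 0a 02 34 32 20 ac 02
	fmt.Printf("% x\n", encodeUser("42", 300))
}
```

Seven bytes carry both fields; the name "created_at_unix" appears nowhere in the output, only the number 4 folded into the tag byte 0x20.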

proto3 semantics you must know

03 HTTP/2 — the part everyone skips

You cannot reason about gRPC in production without understanding what's happening at the HTTP/2 layer. Every gRPC call is one HTTP/2 stream, and many streams ride a single TCP connection.

[Figure: a client and a server joined by one TCP/TLS connection carrying stream 1 (GetUser), stream 3 (ListUsers, server streaming), and stream 5 (Chat, bidi).]
Many concurrent RPCs (streams) multiplexed over one connection — the source of both gRPC's efficiency and its load-balancing headaches.

The anatomy of a unary call on the wire

  1. HEADERS frame: carries :method: POST, :path: /user.v1.UserService/GetUser, content-type: application/grpc, plus any custom metadata and the grpc-timeout (when a deadline is set).
  2. DATA frame(s): the message, length-prefixed: 1 byte compression flag + 4 bytes big-endian length + the protobuf bytes.
  3. Trailing HEADERS (trailers): carry grpc-status and grpc-message. This is critical: the actual success/failure of a gRPC call lives in HTTP/2 trailers, sent after the body.
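
The length prefix from step 2 is small enough to write out. The sketch below frames and unframes one message payload; it covers only the 5-byte prefix, not HTTP/2 framing or compression, and the function names are invented for the example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// frameMessage wraps an already-serialized protobuf payload in the gRPC
// length prefix: 1-byte compressed flag + 4-byte big-endian length.
func frameMessage(payload []byte, compressed bool) []byte {
	out := make([]byte, 5+len(payload))
	if compressed {
		out[0] = 1
	}
	binary.BigEndian.PutUint32(out[1:5], uint32(len(payload)))
	copy(out[5:], payload)
	return out
}

// unframeMessage reads one message from the front of buf and returns the
// payload, the compressed flag, and whatever bytes follow it.
func unframeMessage(buf []byte) (payload []byte, compressed bool, rest []byte, err error) {
	if len(buf) < 5 {
		return nil, false, buf, errors.New("short header")
	}
	n := int(binary.BigEndian.Uint32(buf[1:5]))
	if len(buf)-5 < n {
		return nil, false, buf, errors.New("short payload")
	}
	return buf[5 : 5+n], buf[0] == 1, buf[5+n:], nil
}

func main() {
	framed := frameMessage([]byte{0x0a, 0x02, '4', '2'}, false)
	fmt.Printf("% x\n", framed) // 00 00 00 00 04 0a 02 34 32

	payload, compressed, rest, err := unframeMessage(framed)
	fmt.Println(len(payload), compressed, len(rest), err) // 4 false 0 <nil>
}
```

Because each message carries its own length, a streaming RPC can send many messages on one HTTP/2 stream and the receiver can split them without any other delimiter.
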
◆ Why trailers break naive proxies

Because gRPC reports its status in trailers, any proxy or load balancer that doesn't fully understand HTTP/2 trailers can't tell a successful call from a failed one — and can't make smart retry decisions. This is exactly why Envoy (and therefore Istio) is so well-suited to gRPC and why an L4 TCP load balancer is blind to it.

HTTP/2 features gRPC depends on

04 The four RPC types

Type | Shape | Use case
Unary | req → resp | The default. CRUD, queries, commands.
Server streaming | req → stream resp | Subscriptions, large result sets, progress feeds.
Client streaming | stream req → resp | Uploads, telemetry ingestion, batch aggregation.
Bidirectional | stream req ↔ stream resp | Chat, interactive sessions, real-time sync.
⚠ Streaming is operationally heavier than it looks

A streaming RPC is a long-lived stream that can sit open for minutes or hours. Every intermediary — Envoy sidecar, k8s Service, cloud LB — has idle/max-connection timeouts that can silently kill it. Plan for reconnection logic and tune timeouts (covered in §20).

05 Channels, stubs & the call lifecycle

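// Create ONE channel and reuse it across the process.
// userv1 is the package generated from the .proto in §02 (import path is project-specific).
conn, err := grpc.NewClient(
    "dns:///user-svc.default.svc.cluster.local:50051",
    grpc.WithTransportCredentials(insecure.NewCredentials()), // plaintext channel
    grpc.WithDefaultServiceConfig(`{"loadBalancingConfig":[{"round_robin":{}}]}`),
)
if err != nil {
    log.Fatalf("create channel: %v", err)
}
defer conn.Close() // only after the error check: conn is nil on failure

// Every call carries a deadline (see §06).
ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()

client := userv1.NewUserServiceClient(conn)
resp, err := client.GetUser(ctx, &userv1.GetUserRequest{Id: "42"})
if err != nil {
    log.Fatalf("GetUser: %v", err)
}
log.Printf("user: %v", resp)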

06 Metadata, deadlines & cancellation

Metadata

Key-value pairs sent as HTTP/2 headers (request) or trailers (response). Used for auth tokens, request IDs, tracing context. Binary values use a -bin suffix on the key.
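
As a rough illustration of the -bin rule, the sketch below shows what a metadata pair looks like once it is turned into a header. gRPC libraries do this encoding for you; the function name and the trace-context-bin key are assumptions made for the example, and the unpadded base64 variant is used here because that is the form the gRPC HTTP/2 protocol document recommends emitting.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// encodeMetadata returns the header name and value as they would appear on
// the wire. Keys ending in "-bin" carry base64-encoded bytes; all other
// values are sent as plain ASCII strings.
func encodeMetadata(key string, value []byte) (string, string) {
	key = strings.ToLower(key) // HTTP/2 header names are lowercase
	if strings.HasSuffix(key, "-bin") {
		return key, base64.RawStdEncoding.EncodeToString(value)
	}
	return key, string(value)
}

func main() {
	k, v := encodeMetadata("x-request-id", []byte("req-123"))
	fmt.Println(k, v) // x-request-id req-123

	k, v = encodeMetadata("Trace-Context-Bin", []byte{0x00, 0xff, 0x10})
	fmt.Println(k, v) // trace-context-bin AP8Q
}
```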

Deadlines — not timeouts

gRPC strongly favors deadlines (an absolute point in time) over timeouts (a duration). On the wire, the client encodes the time remaining as a grpc-timeout header. Crucially, deadlines propagate, provided each service passes its incoming context (or the language's equivalent) to its outbound calls: if Service A calls B with a 2s deadline and B calls C, C inherits the remaining budget. This prevents work being done on requests the caller has already given up on.
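
To make the budget idea concrete, here is a minimal sketch of the two pieces involved: reading the time left on an incoming context, and rendering a duration in grpc-timeout syntax (up to 8 digits plus a unit of n, u, m, S, M, or H). Real gRPC libraries do both internally; encodeGRPCTimeout and remainingBudget are hypothetical helpers, and the unit-selection strategy is one reasonable choice rather than what any particular library does.

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"time"
)

// encodeGRPCTimeout picks the finest unit whose value still fits in
// 8 decimal digits, rounding up so the deadline is never shortened.
func encodeGRPCTimeout(d time.Duration) string {
	if d <= 0 {
		return "0n" // deadline already passed
	}
	units := []struct {
		suffix string
		size   time.Duration
	}{
		{"n", time.Nanosecond}, {"u", time.Microsecond}, {"m", time.Millisecond},
		{"S", time.Second}, {"M", time.Minute}, {"H", time.Hour},
	}
	for _, u := range units {
		v := d / u.size
		if d%u.size != 0 {
			v++ // ceiling division
		}
		if v <= 99999999 {
			return strconv.FormatInt(int64(v), 10) + u.suffix
		}
	}
	return "99999999H"
}

// remainingBudget is what a server should hand downstream: the time left
// on the incoming context, not a fresh timeout.
func remainingBudget(ctx context.Context) (time.Duration, bool) {
	deadline, ok := ctx.Deadline()
	if !ok {
		return 0, false
	}
	return time.Until(deadline), true
}

func main() {
	// Service A gives B two seconds.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	time.Sleep(50 * time.Millisecond) // B does some work first
	if left, ok := remainingBudget(ctx); ok {
		// Slightly under 2s: this is the budget C inherits.
		fmt.Println("grpc-timeout for the call to C:", encodeGRPCTimeout(left))
	}
	fmt.Println(encodeGRPCTimeout(2 * time.Second)) // 2000000u
}
```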

◆ Always set a deadline

A gRPC call with no deadline can hang forever, pinning resources. The single most common production reliability fix is "every outbound RPC gets a context deadline." Propagate the incoming deadline downstream; don't reset it.

Cancellation

If the client cancels (or the deadline expires), gRPC sends an RST_STREAM and the server's context is cancelled — well-behaved servers stop work immediately. This is cooperative; your handlers must check ctx.Done().
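
The cooperative part is easiest to see in a handler-shaped loop. The sketch below uses only the standard context package and a fake unit of work; processBatch and runWithCancelledCaller are illustrative names, standing in for a real handler body and a client that has already given up.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// processBatch stands in for a handler body. It checks ctx between units
// of work and returns early once the caller has cancelled or timed out.
func processBatch(ctx context.Context, items int) (int, error) {
	done := 0
	for i := 0; i < items; i++ {
		select {
		case <-ctx.Done():
			return done, ctx.Err() // context.Canceled or context.DeadlineExceeded
		default:
		}
		time.Sleep(time.Millisecond) // one unit of (fake) work
		done++
	}
	return done, nil
}

// runWithCancelledCaller simulates a client that gave up before the
// handler started working.
func runWithCancelledCaller(items int) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return processBatch(ctx, items)
}

func main() {
	done, err := runWithCancelledCaller(100)
	fmt.Println(done, err) // 0 context canceled
}
```

A handler that never looks at ctx would run all 100 items here even though nobody is waiting for the result.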

07 Status codes & the error model

gRPC has its own status code space — 17 codes, distinct from HTTP status codes. Knowing the retry-relevant ones cold is essential for configuring meshes and clients.

Code | # | Meaning | Retry?
OK | 0 | Success | -
CANCELLED | 1 | Client cancelled | No
INVALID_ARGUMENT | 3 | Bad request (client's fault) | No
DEADLINE_EXCEEDED | 4 | Ran out of time | Sometimes
NOT_FOUND | 5 | Resource missing | No
PERMISSION_DENIED | 7 | Authz failed | No
RESOURCE_EXHAUSTED | 8 | Quota / rate limit | Yes (backoff)
FAILED_PRECONDITION | 9 | State invalid for op | No
UNIMPLEMENTED | 12 | Method not found | No
INTERNAL | 13 | Server bug / invariant broken | Maybe
UNAVAILABLE | 14 | Transient: server down/restarting | Yes
UNAUTHENTICATED | 16 | No/invalid credentials | No
▸ The idempotency rule

UNAVAILABLE is the canonical "safe to retry" code — it means the request likely never reached application logic. Only auto-retry non-idempotent methods if you're confident the server didn't process them. google.rpc.Status + error details lets you attach structured, typed error payloads beyond the bare code.
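
One conservative reading of the table and the idempotency rule can be written as a small decision function. This is a hypothetical policy for illustration, not a gRPC API: it treats UNAVAILABLE and RESOURCE_EXHAUSTED as retryable, allows the "sometimes/maybe" codes only for idempotent methods, and refuses everything else.

```go
package main

import "fmt"

// Code mirrors the numeric gRPC status codes from the table above.
type Code int

const (
	OK                Code = 0
	InvalidArgument   Code = 3
	DeadlineExceeded  Code = 4
	ResourceExhausted Code = 8
	Internal          Code = 13
	Unavailable       Code = 14
)

// shouldRetry decides whether attempt number `attempt` (1-based) may be
// followed by another one.
func shouldRetry(code Code, idempotent bool, attempt, maxAttempts int) bool {
	if attempt >= maxAttempts {
		return false // budget spent
	}
	switch code {
	case Unavailable, ResourceExhausted:
		return true // transient; back off before the next attempt
	case DeadlineExceeded, Internal:
		return idempotent // the server may have done the work already
	default:
		return false // OK and caller errors are never retried
	}
}

func main() {
	fmt.Println(shouldRetry(Unavailable, false, 1, 4))    // true
	fmt.Println(shouldRetry(Internal, false, 1, 4))       // false
	fmt.Println(shouldRetry(Internal, true, 1, 4))        // true
	fmt.Println(shouldRetry(InvalidArgument, true, 1, 4)) // false
	fmt.Println(shouldRetry(Unavailable, true, 4, 4))     // false
}
```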

08 Interceptors

Interceptors are gRPC's middleware. They wrap every call, on both client and server sides, for unary and streaming. This is where you put cross-cutting concerns: auth, logging, metrics, tracing, panic recovery, rate limiting.

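// Server-side unary interceptor: record per-method latency, labelled by status code.
// rpcLatency is assumed to be a Prometheus HistogramVec registered elsewhere.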
func MetricsInterceptor(
    ctx context.Context, req any,
    info *grpc.UnaryServerInfo, handler grpc.UnaryHandler,
) (any, error) {
    start := time.Now()
    resp, err := handler(ctx, req)        // call the actual RPC
    code := status.Code(err)
    rpcLatency.WithLabelValues(info.FullMethod, code.String()).
        Observe(time.Since(start).Seconds())
    return resp, err
}
▸ Relevance to your stack

This is exactly where OpenTelemetry instrumentation hooks in — an OTel interceptor emits spans and gen_ai.* / standard RPC metrics per call, which Alloy can scrape or receive via OTLP and route to Mimir/Tempo. Interceptor-level metrics give you per-method latency and status-code breakdowns without touching business code.

09 Name resolution & load balancing

This section is the bridge to the Kubernetes story. gRPC's LB model is client-side by default, and that design decision is what collides with how Kubernetes Services work.

Name resolution

The channel target uses a scheme: dns:///host:port, passthrough:///ip:port, or xds:///. The DNS resolver resolves the host to a set of addresses and watches for changes. The trailing details matter:

Load-balancing policies

[Figure: a gRPC client using round_robin holds one subchannel each to pod A, pod B, and pod C; a headless Service resolves all pod IPs.]
Client-side load balancing: the client itself maintains a connection to every backend and rotates requests.
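
The rotation itself is simple. The toy picker below is a stand-in for the round_robin policy, not gRPC's implementation: it ignores subchannel state and health and just cycles through a fixed address list. The type name and the pod IPs are placeholders.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin cycles through a fixed set of backend addresses.
// The atomic counter makes pick safe to call from many goroutines.
type roundRobin struct {
	addrs []string
	next  atomic.Uint64
}

// pick returns the backend for the next request.
func (r *roundRobin) pick() string {
	n := r.next.Add(1) - 1
	return r.addrs[n%uint64(len(r.addrs))]
}

func main() {
	rr := &roundRobin{addrs: []string{"10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"}}
	counts := map[string]int{}
	for i := 0; i < 9; i++ {
		counts[rr.pick()]++
	}
	// Each of the three addresses is picked 3 times.
	fmt.Println(counts)
}
```

The important property is that the choice happens per request, which only works because the client knows every backend address.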

10 Retries, keepalive & health checking

Retries & hedging (service config)

gRPC supports declarative retry policy via the channel's service config (JSON) — max attempts, backoff, and which status codes are retryable. Hedging sends the same request to multiple backends and takes the first response (only for idempotent calls).

{
  "methodConfig": [{
    "name": [{"service": "user.v1.UserService"}],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE"]
    }
  }]
}
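
To see what the policy above means in wall-clock terms, the sketch below computes the nominal backoff before each retry: initialBackoff multiplied by backoffMultiplier per retry, capped at maxBackoff. Real clients add random jitter on top, which is deliberately left out here; the struct and function names are invented for the example.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// retryPolicy mirrors the retryPolicy fields of the service config.
type retryPolicy struct {
	maxAttempts       int
	initialBackoff    time.Duration
	maxBackoff        time.Duration
	backoffMultiplier float64
}

// backoffCeilings returns the nominal (pre-jitter) delay before each retry.
// maxAttempts counts the original call, so there are maxAttempts-1 retries.
func backoffCeilings(p retryPolicy) []time.Duration {
	var out []time.Duration
	for retry := 0; retry < p.maxAttempts-1; retry++ {
		d := float64(p.initialBackoff) * math.Pow(p.backoffMultiplier, float64(retry))
		d = math.Min(d, float64(p.maxBackoff))
		out = append(out, time.Duration(d))
	}
	return out
}

func main() {
	p := retryPolicy{
		maxAttempts:       4,
		initialBackoff:    100 * time.Millisecond,
		maxBackoff:        time.Second,
		backoffMultiplier: 2,
	}
	fmt.Println(backoffCeilings(p)) // [100ms 200ms 400ms]
}
```

With maxAttempts: 4 the call is tried at most four times, so the worst-case added delay from backoff alone is about 0.7s before jitter. That needs to fit inside the caller's deadline.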

Keepalive

gRPC sends HTTP/2 PING frames to detect dead connections and to keep NAT/firewall mappings alive. Misconfigured keepalive is a frequent cause of GOAWAY: too_many_pings errors — the client pings more aggressively than the server's PermitWithoutStream / min-time policy allows.

Health checking

The standard grpc.health.v1.Health/Check service returns SERVING / NOT_SERVING. This is what Kubernetes probes and load balancers query (see §14). Reflection (grpc.reflection.v1) lets tools like grpcurl discover services without the .proto file — invaluable for debugging, but consider disabling it in production.

11 Security & auth

▸ Auth in an OAuth2/Keycloak world

A common pattern: the caller obtains a token (Authorization Code + PKCE for users, client-credentials for service-to-service), attaches it as authorization: Bearer … metadata, and a server interceptor validates it. Coarse network identity (who can talk to whom) is enforced by the mesh via mTLS + AuthorizationPolicy; fine-grained scope/role checks happen in the interceptor.


12 Kubernetes: the L4 load-balancing trap

This is the canonical gRPC-on-Kubernetes problem, and it follows directly from §03 and §09. Here is the failure mode in one breath:

✕ The problem

A standard Kubernetes ClusterIP Service load-balances at L4 (connection level) via kube-proxy/iptables/IPVS. It picks a backend pod once, when the TCP connection is established. But gRPC opens one long-lived HTTP/2 connection and multiplexes thousands of requests over it. Result: every request goes to the single pod that won the connection lottery. New pods you scale up receive zero traffic until existing connections churn.

[Figure: a client with one connection goes through a ClusterIP (kube-proxy, L4); pod A receives 100% of requests while pod B and pod C sit idle.]
The trap: one sticky HTTP/2 connection pinned to one pod. Scaling out does nothing for an existing client.
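
A tiny simulation makes the difference between per-connection and per-request balancing visible. It is a deliberately simplified model (deterministic backend choice, no real networking), and the simulate function is invented for the illustration.

```go
package main

import "fmt"

// simulate sends `requests` RPCs over `conns` client connections to `pods`
// backends and returns how many requests each pod served.
// perRequest=false models L4: a backend is chosen once per connection.
// perRequest=true models L7: a backend is chosen for every request.
func simulate(pods, conns, requests int, perRequest bool) []int {
	served := make([]int, pods)
	pinned := make([]int, conns)
	for c := range pinned {
		pinned[c] = c % pods // the "connection lottery", made deterministic
	}
	for r := 0; r < requests; r++ {
		if perRequest {
			served[r%pods]++
		} else {
			served[pinned[r%conns]]++
		}
	}
	return served
}

func main() {
	fmt.Println("L4:", simulate(3, 1, 900, false)) // L4: [900 0 0]
	fmt.Println("L7:", simulate(3, 1, 900, true))  // L7: [300 300 300]
}
```

With a single connection, the L4 model sends all 900 requests to one pod regardless of how many pods exist; only more connections, or a balancer that sees individual requests, changes the picture.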

13 Fixing load balancing without a mesh

There are three families of fix. Pick based on whether you have a mesh.

Option A — Headless Service + client-side LB

Make the Service headless so DNS returns all pod IPs, then configure the gRPC client to use round_robin. The client now holds a subchannel to every pod and balances per-request.

apiVersion: v1
kind: Service
metadata:
  name: user-svc
spec:
  clusterIP: None          # ← headless: DNS returns every pod IP
  selector:
    app: user
  ports:
    - name: grpc           # name the port (matters for Istio too)
      port: 50051
      appProtocol: grpc      # explicit protocol hint

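Client target: dns:///user-svc.default.svc.cluster.local:50051 with round_robin. The catch: depending on the gRPC implementation, the DNS resolver re-resolves only periodically or when a connection fails, so newly added pods aren't picked up instantly. Tune the resolver refresh where the implementation exposes it, set a server-side maximum connection age so clients reconnect and re-resolve, or rely on the mesh instead.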

Option B — A look-aside / xDS balancer

Run a control plane that feeds endpoints to clients over the gRPC xDS API. This is powerful but operationally heavy if you don't already run such infrastructure.

Option C — An L7 proxy / service mesh

Put an HTTP/2-aware L7 proxy (Envoy) in the path. It terminates the client's connection and does per-request balancing to backends. This is the Istio answer, and it's why a mesh makes the whole problem disappear — see Part III.

◆ Decision rule

If you run Istio/Linkerd: do nothing special — use a normal ClusterIP Service and let the sidecar balance. If you don't run a mesh: headless Service + round_robin is the pragmatic fix.

14 Health probes

An HTTP probe can't check a gRPC-only server. Two correct approaches:

Native gRPC probes (on by default since Kubernetes 1.24, GA in 1.27)

readinessProbe:
  grpc:
    port: 50051
    service: "user.v1.UserService"   # optional; service name sent in the Health/Check request
  initialDelaySeconds: 5
  periodSeconds: 10

The kubelet calls the standard grpc.health.v1.Health/Check service. The server must implement and register it.

Legacy: grpc_health_probe binary

Before 1.24, you shipped the grpc_health_probe binary in the image and used an exec probe. Still seen in older charts.

⚠ Readiness drives endpoint membership

Readiness gates whether a pod is in the Service's EndpointSlice. A pod failing readiness is pulled from the set of resolvable IPs — which is exactly how rolling deploys avoid sending gRPC traffic to a pod that isn't ready. Make your Health service reflect real readiness (deps connected, caches warm), not just "process is up."


15 Istio: how Envoy handles gRPC

Istio injects an Envoy sidecar next to each pod. All inbound/outbound traffic is transparently redirected through it by iptables rules. (The newer ambient mode replaces the sidecar with a per-node ztunnel plus optional waypoint proxies; this guide assumes sidecars.) Envoy is a full L7, HTTP/2-native proxy: it parses gRPC framing, understands trailers, reads grpc-status, and balances per request.

[Figure: inside the client pod, the app talks to its Envoy sidecar, which rebalances every individual request across pod A, pod B, and pod C at roughly 33% each.]
The sidecar terminates the app's HTTP/2 connection and re-balances each multiplexed request — no headless Service needed.
◆ The payoff

With Istio, the L4 trap from §12 is solved automatically. Use a normal ClusterIP Service; the app can keep a single connection to its sidecar, and Envoy spreads the requests. You generally do not want headless + client-side round_robin and a mesh — pick one balancing layer.

16 Protocol detection — get this right

For Envoy to apply HTTP/2-aware (L7) handling, Istio must know the port speaks gRPC. If it thinks the port is plain TCP, you fall back to L4 balancing — the very problem you're trying to avoid.

Two ways to declare it, in order of preference:

  1. appProtocol: grpc on the Service port — the explicit, modern, unambiguous way.
  2. Port name prefix — name the port grpc, grpc-web, or grpc-anything. Istio reads the prefix.
ports:
  - name: grpc-user        # prefix recognised by Istio
    port: 50051
    appProtocol: grpc        # preferred explicit signal
⚠ Don't rely on automatic protocol detection

Istio can sniff the protocol, but for gRPC it's strongly recommended to declare it explicitly. Auto-detection adds latency on first bytes and can misclassify, silently downgrading you to L4. An unnamed or mislabelled port is the #1 cause of "my gRPC still isn't load-balancing under Istio."

17 Traffic management for gRPC

Because Envoy understands gRPC, VirtualService and DestinationRule work at the method level. Routing can match on the gRPC path (/package.Service/Method) and on metadata headers.

apiVersion: networking.istio.io/v1
kind: VirtualService
metadata: { name: user-svc }
spec:
  hosts: [ user-svc ]
  http:                        # gRPC is HTTP/2 → use the http block
    - match:
        - uri:
            prefix: "/user.v1.UserService/"
      route:
        - destination: { host: user-svc, subset: v2 }
      timeout: 2s
▸ gRPC routes live under http:

There is no separate grpc: block — because gRPC is HTTP/2, you configure it under http:. Match on uri.prefix to target a whole service, or the full path to target one method. Canary, A/B, and header-based routing all work this way.
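
Since routing keys off the request path, it helps to see how that path decomposes and why the trailing slash in the prefix matters. The sketch below uses plain string handling, not Istio or Envoy code; the helper names and the UserServiceAdmin service are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// splitFullMethod breaks a gRPC :path such as
// "/user.v1.UserService/GetUser" into its service and method parts.
func splitFullMethod(path string) (service, method string, ok bool) {
	rest, found := strings.CutPrefix(path, "/")
	if !found {
		return "", "", false
	}
	service, method, ok = strings.Cut(rest, "/")
	if !ok || service == "" || method == "" {
		return "", "", false
	}
	return service, method, true
}

// matchesPrefix mimics a uri.prefix match on the request path.
func matchesPrefix(path, prefix string) bool {
	return strings.HasPrefix(path, prefix)
}

func main() {
	svc, m, _ := splitFullMethod("/user.v1.UserService/GetUser")
	fmt.Println(svc, m) // user.v1.UserService GetUser

	// Without the trailing slash, the prefix also catches a different service.
	other := "/user.v1.UserServiceAdmin/Purge"
	fmt.Println(matchesPrefix(other, "/user.v1.UserService"))  // true
	fmt.Println(matchesPrefix(other, "/user.v1.UserService/")) // false
}
```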

18 Retries, outlier detection & connection pools

gRPC-aware retries

Istio retries can key off gRPC status codes, not just HTTP codes. The retryOn field accepts gRPC conditions like cancelled, deadline-exceeded, internal, resource-exhausted, and unavailable.

  http:
    - route: [ { destination: { host: user-svc } } ]
      retries:
        attempts: 3
        perTryTimeout: 1s
        retryOn: "unavailable,deadline-exceeded,resource-exhausted"

Outlier detection (passive health)

In a DestinationRule, outlier detection ejects misbehaving pods. For gRPC, Envoy maps certain statuses (e.g. UNAVAILABLE, INTERNAL) into the 5xx bucket that consecutive5xxErrors counts.

apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata: { name: user-svc }
spec:
  host: user-svc
  trafficPolicy:
    loadBalancer: { simple: ROUND_ROBIN }
    connectionPool:
      http:
        http2MaxRequests: 1000          # max concurrent streams to backend
        maxRequestsPerConnection: 0     # 0 = unlimited (no forced cycling)
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
maxRequestsPerConnection and gRPC

Setting this to a low value forces Envoy to cycle HTTP/2 connections, which can interrupt in-flight streaming RPCs. Leave it at 0 for streaming-heavy services unless you have a specific reason. Conversely, http2MaxRequests caps concurrency to a backend — too low and you'll see queued/rejected requests under load.

19 mTLS & telemetry

Automatic mTLS

Because gRPC is just HTTP/2 to Envoy, Istio's automatic mutual TLS applies with zero app changes. The app speaks plaintext HTTP/2 to its sidecar; the sidecars negotiate mTLS on the wire using SPIFFE identities. Enforce with PeerAuthentication (set mode STRICT) and authorize peer-to-peer access with AuthorizationPolicy — which can match on the gRPC method path under operation.paths.

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata: { name: user-svc-rbac }
spec:
  selector: { matchLabels: { app: user } }
  action: ALLOW
  rules:
    - from: [ { source: { principals: [ "cluster.local/ns/web/sa/frontend" ] } } ]
      to:   [ { operation: { paths: [ "/user.v1.UserService/GetUser" ] } } ]

Telemetry

Envoy emits per-call gRPC telemetry without app instrumentation: request counts, durations, and the gRPC response status as a dimension. These flow into your metrics backend (Prometheus/Mimir) as istio_requests_total with a grpc_response_status label, and Envoy propagates tracing headers (B3 / W3C) so spans stitch across hops into Tempo.

▸ Two layers of signal, deliberately

The mesh gives you uniform transport-level gRPC metrics for every service for free. Application interceptors (§08) give you semantic detail the proxy can't see — business labels, payload-aware timing, OTel gen_ai.* attributes. Run both: mesh for the baseline golden signals, interceptors for depth.

20 Streaming & the gotchas list

✕ Long-lived streams vs proxy timeouts

Bidirectional and server-streaming RPCs can outlive Envoy's default idle timeout and route timeout. A route-level timeout applies to the whole stream and will kill it — so for streaming routes you usually set timeout: 0s (disabled) and rely on idle timeout + application keepalive instead.

The full gotcha checklist

Symptom | Likely cause | Fix
One pod gets all traffic | L4 balancing (port not detected as gRPC) | Set appProtocol: grpc / name port grpc-*
Streams die after ~5 min | Route timeout or idle timeout | timeout: 0s on the route; tune idle timeout; app keepalive
too_many_pings GOAWAY | Client keepalive too aggressive | Raise client keepalive interval / server PermitWithoutStream
New pods slow to get traffic (no mesh) | DNS resolver refresh interval | Lower resolver refresh, or move to mesh
Retries not firing | Used HTTP codes, not gRPC conditions | retryOn: "unavailable,..."
Streaming RPC interrupted on deploy | maxRequestsPerConnection low; GOAWAY on rollout | Set to 0; graceful drain; client reconnect logic
mTLS handshake errors | Port misdetected as TCP, or PeerAuth mismatch | Declare protocol; check PeerAuthentication mode

One-page cheat sheet

The five facts to never forget

  1. 1 RPC = 1 HTTP/2 stream; many streams = 1 TCP connection. Everything else follows.
  2. gRPC status lives in HTTP/2 trailers — L4 LBs are blind to it.
  3. Default k8s ClusterIP balances at L4 → all gRPC traffic pins to one pod.
  4. Fix without mesh: headless Service + round_robin. Fix with mesh: nothing — Envoy balances per request.
  5. Always set a deadline; always declare the port as gRPC.

Debugging toolkit

grpcurl: introspect & call (needs reflection) · grpc_health_probe: probe health · ghz: load testing · istioctl proxy-config: inspect Envoy clusters/routes · channelz: in-process connection state · GRPC_GO_LOG_SEVERITY_LEVEL=info: grpc-go library logging

Status codes worth memorizing

0 OK · 3 INVALID_ARGUMENT · 4 DEADLINE_EXCEEDED · 8 RESOURCE_EXHAUSTED · 13 INTERNAL · 14 UNAVAILABLE (retry) · 16 UNAUTHENTICATED

gRPC Study Guide · Part I Fundamentals · Part II Kubernetes · Part III Istio. A living reference: verify version-specific details (k8s gRPC probes on by default since 1.24 and GA in 1.27, Istio API networking.istio.io/v1) against current docs before relying on them in production.