The gRPC Study Guide
From protobuf wire format to the L4 load-balancing trap on Kubernetes and how an Istio sidecar quietly fixes it. Written for engineers who run this stuff in production.
01 What gRPC actually is
gRPC is a high-performance, contract-first Remote Procedure Call framework. The mental model is the key starting point: instead of thinking in terms of resources and verbs (REST), you think in terms of methods you call on a remote object. You define a service interface once, and the client calls stub.GetUser(req) as if it were a local function. The framework handles serialization, transport, and the network round-trip.
Three pillars hold it together, and you should be able to recite them:
- Protocol Buffers: the Interface Definition Language (IDL) and the binary serialization format. The contract is the source of truth; client and server code are generated from it.
- HTTP/2: the transport. Multiplexing, streaming, header compression, and binary framing are not optional extras; gRPC is designed around them.
- Generated code + a runtime: protoc plus a language plugin emits type-safe stubs; the runtime handles channels, flow control, deadlines, and retries.
gRPC's reliance on a single long-lived HTTP/2 connection is the root cause of nearly every operational surprise you'll hit — load imbalance on Kubernetes, idle-timeout disconnects on streams, and the need for an L7 proxy to balance traffic. Hold that fact; the whole second half of this guide flows from it.
gRPC vs REST — the honest comparison
| Dimension | gRPC | REST/JSON |
|---|---|---|
| Contract | Strict, compiled from .proto | Convention / OpenAPI (optional) |
| Payload | Binary protobuf (compact, fast) | Text JSON (human-readable, larger) |
| Transport | HTTP/2 mandatory | HTTP/1.1 or 2 |
| Streaming | First-class, bidirectional | Awkward (SSE, chunked, websockets) |
| Browser support | Needs gRPC-Web + proxy | Native |
| Tooling/debug | grpcurl, needs reflection | curl, any browser |
| Best fit | Internal service-to-service, low latency, polyglot | Public APIs, browser clients |
02 Protocol Buffers
Protobuf is two things at once: a schema language for describing messages and services, and a wire format for serializing them efficiently. Modern gRPC uses proto3.
```protobuf
syntax = "proto3";

package user.v1;

service UserService {
  // Unary: one request, one response
  rpc GetUser(GetUserRequest) returns (User);
  // Server streaming: one request, a stream of responses
  rpc ListUsers(ListUsersRequest) returns (stream User);
  // Bidirectional streaming
  rpc Chat(stream ChatMessage) returns (stream ChatMessage);
}

message User {
  string id = 1;              // field number, NOT a value
  string name = 2;
  repeated string roles = 3;
  int64 created_at_unix = 4;
}
```
Field numbers (= 1, = 2), not field names, identify fields on the wire; names are never transmitted in the binary format. This is why renaming a field is wire-compatible (though it still changes generated code and the JSON mapping), while changing or reusing a field number is a breaking change. Reserve the numbers of deleted fields (for example, reserved 4;) so nobody recycles them.
The wire format, briefly
Each field is encoded as a tag (field number + wire type) followed by the value. Integers use varint encoding (small numbers take fewer bytes). Strings/bytes/messages are length-delimited. There are no field names, no whitespace, no quotes — that's why protobuf is dramatically smaller and faster to parse than JSON.
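The tag-plus-value layout can be made concrete by hand-encoding two fields of the User message. This is a minimal sketch using only the Go standard library, not the generated protobuf code; the helper names (appendTag, appendString, appendInt64, encodeUser) are made up for illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Wire types from the protobuf encoding rules.
const (
	wireVarint = 0 // int32, int64, bool, enum
	wireBytes  = 2 // string, bytes, embedded messages
)

// appendTag writes the field key: (field number << 3) | wire type, as a varint.
func appendTag(buf []byte, fieldNum, wireType uint64) []byte {
	return binary.AppendUvarint(buf, fieldNum<<3|wireType)
}

// appendString encodes a length-delimited string field: tag, length, bytes.
func appendString(buf []byte, fieldNum uint64, s string) []byte {
	buf = appendTag(buf, fieldNum, wireBytes)
	buf = binary.AppendUvarint(buf, uint64(len(s)))
	return append(buf, s...)
}

// appendInt64 encodes an int64 field as a tag followed by a varint.
func appendInt64(buf []byte, fieldNum uint64, v int64) []byte {
	buf = appendTag(buf, fieldNum, wireVarint)
	return binary.AppendUvarint(buf, uint64(v))
}

// encodeUser hand-encodes User{id, created_at_unix} with field numbers 1 and 4.
func encodeUser(id string, createdAtUnix int64) []byte {
	var buf []byte
	buf = appendString(buf, 1, id)
	buf = appendInt64(buf, 4, createdAtUnix)
	return buf
}

func main() {
	// Prints: 0a 02 34 32 20 ac 02
	fmt.Printf("% x\n", encodeUser("42", 300))
}
```

Seven bytes carry both fields: 0a is field 1 with the length-delimited wire type, 20 is field 4 with the varint wire type, and 300 takes two varint bytes (ac 02). No field name appears anywhere in the output.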
proto3 semantics you must know
- Defaults are invisible. A scalar set to its zero value (0, "", false) is not serialized. You cannot, by default, distinguish "unset" from "set to zero."
- Use optional (re-introduced in proto3) when you genuinely need presence semantics; it adds a hidden has-bit.
- Backwards compatibility: add fields freely; never change types or numbers; treat unknown fields as pass-through.
03 HTTP/2: the part everyone skips
You cannot reason about gRPC in production without understanding what's happening at the HTTP/2 layer. Every gRPC call is one HTTP/2 stream, and many streams ride a single TCP connection.
The anatomy of a unary call on the wire
- HEADERS frame: carries :method: POST, :path: /user.v1.UserService/GetUser, content-type: application/grpc, plus any custom metadata and the grpc-timeout.
- DATA frame(s): the message, length-prefixed: 1 byte compression flag + 4 bytes big-endian length + the protobuf bytes.
- Trailing HEADERS (trailers): grpc-status and grpc-message. This is critical: the actual success/failure of a gRPC call lives in HTTP/2 trailers, sent after the body.
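The 5-byte message prefix carried in the DATA frames can be sketched in a few lines of Go. This is an illustration of the framing only, using the standard library; frameMessage and parseFrame are hypothetical helper names, and real gRPC runtimes also handle compression and messages split across several DATA frames.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// frameMessage wraps one serialized protobuf message in the gRPC
// length-prefixed format: 1-byte compressed flag + 4-byte big-endian length.
func frameMessage(payload []byte, compressed bool) []byte {
	frame := make([]byte, 5, 5+len(payload))
	if compressed {
		frame[0] = 1
	}
	binary.BigEndian.PutUint32(frame[1:5], uint32(len(payload)))
	return append(frame, payload...)
}

// parseFrame reads one message from the front of data and returns the remainder.
func parseFrame(data []byte) (payload []byte, compressed bool, rest []byte, err error) {
	if len(data) < 5 {
		return nil, false, nil, errors.New("short frame header")
	}
	n := int(binary.BigEndian.Uint32(data[1:5]))
	if len(data) < 5+n {
		return nil, false, nil, errors.New("truncated message")
	}
	return data[5 : 5+n], data[0] == 1, data[5+n:], nil
}

func main() {
	frame := frameMessage([]byte{0x0a, 0x02, '4', '2'}, false)
	fmt.Printf("% x\n", frame) // 00 00 00 00 04 0a 02 34 32

	payload, compressed, rest, err := parseFrame(frame)
	fmt.Println(len(payload), compressed, len(rest), err) // 4 false 0 <nil>
}
```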
Because gRPC reports its status in trailers, any proxy or load balancer that doesn't fully understand HTTP/2 trailers can't tell a successful call from a failed one — and can't make smart retry decisions. This is exactly why Envoy (and therefore Istio) is so well-suited to gRPC and why an L4 TCP load balancer is blind to it.
HTTP/2 features gRPC depends on
- Multiplexing — concurrent streams without head-of-line blocking at the HTTP layer.
- Flow control — per-stream and per-connection windows; controls backpressure (relevant when tuning streaming throughput).
- HPACK — header compression, so repeated metadata is cheap.
- GOAWAY — graceful connection shutdown; the server tells clients to stop opening new streams (important for rolling deploys).
04 The four RPC types
| Type | Shape | Use case |
|---|---|---|
| Unary | req → resp | The default. CRUD, queries, commands. |
| Server streaming | req → stream resp | Subscriptions, large result sets, progress feeds. |
| Client streaming | stream req → resp | Uploads, telemetry ingestion, batch aggregation. |
| Bidirectional | stream req ↔ stream resp | Chat, interactive sessions, real-time sync. |
A streaming RPC is a long-lived stream that can sit open for minutes or hours. Every intermediary — Envoy sidecar, k8s Service, cloud LB — has idle/max-connection timeouts that can silently kill it. Plan for reconnection logic and tune timeouts (covered in §20).
05 Channels, stubs & the call lifecycle
- Channel: the client's virtual connection to a logical server (a target string like dns:///user-svc:50051). A channel manages one or more real subchannels (TCP connections), name resolution, and load-balancing state. Channels are expensive; create one and reuse it.
- Stub: the generated, type-safe client bound to a channel. Cheap to create.
- Subchannel: a connection to a single backend address, with its own connectivity state (IDLE → CONNECTING → READY → TRANSIENT_FAILURE).
```go
// Create ONE channel and reuse it across the process.
conn, err := grpc.NewClient(
	"dns:///user-svc.default.svc.cluster.local:50051",
	grpc.WithTransportCredentials(insecure.NewCredentials()),
	grpc.WithDefaultServiceConfig(`{"loadBalancingConfig":[{"round_robin":{}}]}`),
)
if err != nil {
	log.Fatalf("create channel: %v", err)
}
defer conn.Close()

// Stubs are cheap; ctx should carry a deadline (see §06).
client := userv1.NewUserServiceClient(conn)
resp, err := client.GetUser(ctx, &userv1.GetUserRequest{Id: "42"})
```
06 Metadata, deadlines & cancellation
Metadata
Key-value pairs sent as HTTP/2 headers (request) or trailers (response). Used for auth tokens, request IDs, tracing context. Binary values use a -bin suffix on the key.
Deadlines — not timeouts
gRPC strongly favors deadlines (an absolute point in time) over timeouts (a duration). The client encodes the remaining time as a grpc-timeout header. Crucially, deadlines propagate: if Service A calls B with a 2s deadline and B calls C, C inherits the remaining budget. This prevents work being done on requests the caller has already given up on.
A gRPC call with no deadline can hang forever, pinning resources. The single most common production reliability fix is "every outbound RPC gets a context deadline." Propagate the incoming deadline downstream; don't reset it.
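Deadline propagation and the grpc-timeout header can be sketched with the standard library alone. The encoder below is an illustrative approximation of the header format (up to 8 digits plus a unit of n, u, m, S, M, or H), not the code any particular gRPC runtime uses; encodeGRPCTimeout is a hypothetical name.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// encodeGRPCTimeout renders a remaining budget as a grpc-timeout value:
// at most 8 digits followed by a unit. It picks the finest unit that fits
// and rounds up so the budget is never understated.
func encodeGRPCTimeout(d time.Duration) string {
	if d <= 0 {
		return "0n"
	}
	units := []struct {
		suffix string
		size   time.Duration
	}{
		{"n", time.Nanosecond}, {"u", time.Microsecond}, {"m", time.Millisecond},
		{"S", time.Second}, {"M", time.Minute}, {"H", time.Hour},
	}
	for _, u := range units {
		v := (d + u.size - 1) / u.size // ceiling division
		if v < 100_000_000 {           // fits in 8 digits
			return fmt.Sprintf("%d%s", int64(v), u.suffix)
		}
	}
	return "99999999H"
}

func main() {
	// Service A gives the whole call chain a 2s budget.
	ctxA, cancelA := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancelA()

	// Service B derives from the incoming context instead of starting fresh.
	// Asking for 5s cannot extend the budget: the earlier deadline wins.
	ctxB, cancelB := context.WithTimeout(ctxA, 5*time.Second)
	defer cancelB()

	deadlineA, _ := ctxA.Deadline()
	deadlineB, _ := ctxB.Deadline()
	fmt.Println("B keeps A's deadline:", deadlineB.Equal(deadlineA)) // true
	fmt.Println("grpc-timeout for 2s:", encodeGRPCTimeout(2*time.Second)) // 2000000u
}
```

The design point is the second WithTimeout call: deriving from the incoming context is what makes the budget shrink hop by hop, whereas starting from context.Background() would silently reset it.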
Cancellation
If the client cancels (or the deadline expires), gRPC sends an RST_STREAM and the server's context is cancelled — well-behaved servers stop work immediately. This is cooperative; your handlers must check ctx.Done().
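Cooperative cancellation looks like this inside a handler body. The sketch below uses only the standard library; processItems and runWithBudget are hypothetical stand-ins for a handler doing chunked work and for the RPC's deadline, not gRPC APIs.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// processItems stands in for a handler doing chunked work. It checks the
// context between units so a cancelled or expired RPC stops promptly.
func processItems(ctx context.Context, items int, perItem time.Duration) (int, error) {
	done := 0
	for i := 0; i < items; i++ {
		select {
		case <-ctx.Done():
			return done, ctx.Err() // caller gave up: stop working
		case <-time.After(perItem): // stand-in for one unit of real work
			done++
		}
	}
	return done, nil
}

// runWithBudget runs processItems under a deadline, as a server would
// when the incoming RPC carries one.
func runWithBudget(budget, perItem time.Duration, items int) (int, error) {
	ctx, cancel := context.WithTimeout(context.Background(), budget)
	defer cancel()
	return processItems(ctx, items, perItem)
}

func main() {
	done, err := runWithBudget(35*time.Millisecond, 10*time.Millisecond, 100)
	fmt.Println("finished all items:", done == 100)
	fmt.Println("deadline exceeded:", errors.Is(err, context.DeadlineExceeded))
}
```

A handler that never looks at the context would keep running all 100 items after the caller has gone, which is exactly the wasted work deadlines are meant to prevent.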
07 Status codes & the error model
gRPC has its own status code space — 17 codes, distinct from HTTP status codes. Knowing the retry-relevant ones cold is essential for configuring meshes and clients.
| Code | # | Meaning | Retry? |
|---|---|---|---|
| OK | 0 | Success | - |
| CANCELLED | 1 | Client cancelled | No |
| INVALID_ARGUMENT | 3 | Bad request (client's fault) | No |
| DEADLINE_EXCEEDED | 4 | Ran out of time | Sometimes |
| NOT_FOUND | 5 | Resource missing | No |
| PERMISSION_DENIED | 7 | Authz failed | No |
| RESOURCE_EXHAUSTED | 8 | Quota / rate limit | Yes (backoff) |
| FAILED_PRECONDITION | 9 | State invalid for op | No |
| UNIMPLEMENTED | 12 | Method not found | No |
| INTERNAL | 13 | Server bug / invariant broken | Maybe |
| UNAVAILABLE | 14 | Transient: server down/restarting | Yes |
| UNAUTHENTICATED | 16 | No/invalid credentials | No |
UNAVAILABLE is the canonical "safe to retry" code — it means the request likely never reached application logic. Only auto-retry non-idempotent methods if you're confident the server didn't process them. google.rpc.Status + error details lets you attach structured, typed error payloads beyond the bare code.
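One way to turn the Retry? column into code is a small classification function. This is a hypothetical client-side policy written against a local Code type (so it runs without the gRPC module); the treatment of the "Sometimes" and "Maybe" rows as "only if the method is idempotent" is an assumption, not something the gRPC spec mandates.

```go
package main

import "fmt"

// Code mirrors the numeric gRPC status codes from the table above.
type Code int

const (
	OK                Code = 0
	InvalidArgument   Code = 3
	DeadlineExceeded  Code = 4
	ResourceExhausted Code = 8
	Internal          Code = 13
	Unavailable       Code = 14
)

// shouldRetry is an illustrative policy, not a gRPC API.
//   - UNAVAILABLE and RESOURCE_EXHAUSTED: retry (with backoff).
//   - DEADLINE_EXCEEDED and INTERNAL: the server may have done the work,
//     so retry only when the method is idempotent.
//   - Everything else: the request itself is the problem; do not retry.
func shouldRetry(code Code, idempotent bool) bool {
	switch code {
	case Unavailable, ResourceExhausted:
		return true
	case DeadlineExceeded, Internal:
		return idempotent
	default:
		return false
	}
}

func main() {
	fmt.Println(shouldRetry(Unavailable, false))    // true
	fmt.Println(shouldRetry(Internal, false))       // false
	fmt.Println(shouldRetry(DeadlineExceeded, true)) // true
	fmt.Println(shouldRetry(InvalidArgument, true)) // false
}
```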
08 Interceptors
Interceptors are gRPC's middleware. They wrap every call, on both client and server sides, for unary and streaming. This is where you put cross-cutting concerns: auth, logging, metrics, tracing, panic recovery, rate limiting.
```go
// Server-side unary interceptor: record per-method latency by status code.
// rpcLatency is assumed to be a Prometheus HistogramVec declared elsewhere.
func MetricsInterceptor(
	ctx context.Context, req any,
	info *grpc.UnaryServerInfo, handler grpc.UnaryHandler,
) (any, error) {
	start := time.Now()
	resp, err := handler(ctx, req) // call the actual RPC
	code := status.Code(err)       // codes.OK when err is nil
	rpcLatency.WithLabelValues(info.FullMethod, code.String()).
		Observe(time.Since(start).Seconds())
	return resp, err
}
```
This is exactly where OpenTelemetry instrumentation hooks in: an OTel interceptor emits spans and standard RPC metrics per call (plus any custom attributes you add, such as gen_ai.*), which Alloy can scrape or receive via OTLP and route to Mimir/Tempo. Interceptor-level metrics give you per-method latency and status-code breakdowns without touching business code.
09 Name resolution & load balancing
This section is the bridge to the Kubernetes story. gRPC's LB model is client-side by default, and that design decision is what collides with how Kubernetes Services work.
Name resolution
The channel target uses a scheme: dns:///host:port, passthrough:///ip:port, or xds:///. The DNS resolver resolves the host to a set of addresses and watches for changes. The trailing details matter:
- dns:///user-svc:50051 resolves user-svc via DNS and gets back all A records.
- For this to return all pod IPs in Kubernetes, you need a headless Service (clusterIP: None); a normal ClusterIP Service returns a single virtual IP.
Load-balancing policies
- pick_first (default): connect to the first working address; all traffic to one backend.
- round_robin: open a subchannel to every resolved address and rotate. This is what you want for client-side LB.
- Look-aside / xDS: an external control plane (e.g. via the gRPC xDS API, the same one Envoy uses) hands out endpoints and policy.
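The difference between pick_first and round_robin comes down to what the picker does on each call. The following is a toy model in plain Go, not the gRPC balancer API: the picker interface and both types are hypothetical, they ignore connectivity state, and they are not safe for concurrent use.

```go
package main

import "fmt"

// picker chooses a backend address for each RPC.
type picker interface{ pick() string }

// pickFirst sends every call to the first resolved address.
type pickFirst struct{ addrs []string }

func (p *pickFirst) pick() string { return p.addrs[0] }

// roundRobin rotates through every resolved address, one call at a time.
type roundRobin struct {
	addrs []string
	next  int
}

func (r *roundRobin) pick() string {
	addr := r.addrs[r.next%len(r.addrs)]
	r.next++
	return addr
}

// distribute issues the given number of calls and counts hits per backend.
func distribute(p picker, calls int) map[string]int {
	hits := make(map[string]int)
	for i := 0; i < calls; i++ {
		hits[p.pick()]++
	}
	return hits
}

func main() {
	addrs := []string{"10.0.0.1:50051", "10.0.0.2:50051", "10.0.0.3:50051"}
	fmt.Println(distribute(&pickFirst{addrs: addrs}, 6))  // all 6 on 10.0.0.1
	fmt.Println(distribute(&roundRobin{addrs: addrs}, 6)) // 2 on each backend
}
```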
10 Retries, keepalive & health checking
Retries & hedging (service config)
gRPC supports declarative retry policy via the channel's service config (JSON) — max attempts, backoff, and which status codes are retryable. Hedging sends the same request to multiple backends and takes the first response (only for idempotent calls).
```json
{
  "methodConfig": [{
    "name": [{"service": "user.v1.UserService"}],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.1s",
      "maxBackoff": "1s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["UNAVAILABLE"]
    }
  }]
}
```
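To see what these numbers mean in practice, the sketch below computes the backoff ceiling before each retry as min(initialBackoff × multiplier^(n-1), maxBackoff). It is an illustration in plain Go, not the runtime's implementation: retryPolicy and backoffCeiling are hypothetical names, and real clients add randomised jitter around this value, with the exact scheme varying by implementation and version.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// retryPolicy mirrors the fields of the JSON retryPolicy above.
type retryPolicy struct {
	MaxAttempts       int
	InitialBackoff    time.Duration
	MaxBackoff        time.Duration
	BackoffMultiplier float64
}

// backoffCeiling returns the un-jittered delay bound before the nth retry
// (n starts at 1): min(initialBackoff * multiplier^(n-1), maxBackoff).
func (p retryPolicy) backoffCeiling(n int) time.Duration {
	d := float64(p.InitialBackoff) * math.Pow(p.BackoffMultiplier, float64(n-1))
	if d > float64(p.MaxBackoff) {
		return p.MaxBackoff
	}
	return time.Duration(d)
}

func main() {
	p := retryPolicy{
		MaxAttempts:       4,
		InitialBackoff:    100 * time.Millisecond,
		MaxBackoff:        time.Second,
		BackoffMultiplier: 2,
	}
	// maxAttempts counts the original call, so 4 attempts means up to 3 retries.
	for n := 1; n < p.MaxAttempts; n++ {
		fmt.Printf("retry %d: up to %v\n", n, p.backoffCeiling(n))
	}
	// retry 1: up to 100ms
	// retry 2: up to 200ms
	// retry 3: up to 400ms
}
```

With this policy the worst case adds roughly 0.7s of waiting on top of the attempts themselves, which is worth comparing against the call's deadline before choosing maxAttempts.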
Keepalive
gRPC sends HTTP/2 PING frames to detect dead connections and to keep NAT/firewall mappings alive. Misconfigured keepalive is a frequent cause of GOAWAY: too_many_pings errors — the client pings more aggressively than the server's PermitWithoutStream / min-time policy allows.
Health checking
The standard grpc.health.v1.Health/Check service returns SERVING / NOT_SERVING. This is what Kubernetes probes and load balancers query (see §14). Reflection (grpc.reflection.v1) lets tools like grpcurl discover services without the .proto file — invaluable for debugging, but consider disabling it in production.
11 Security & auth
- Channel credentials — TLS/mTLS at the transport. In a mesh you typically delegate this to the sidecar (the app speaks plaintext to its local Envoy, Envoy does mTLS on the wire).
- Call credentials — per-RPC tokens (OAuth2 / JWT bearer tokens) carried in metadata. This composes with channel creds.
- ALTS — Google's mutual-auth scheme, mostly GCP-internal.
A common pattern: the caller obtains a token (Authorization Code + PKCE for users, client-credentials for service-to-service), attaches it as authorization: Bearer … metadata, and a server interceptor validates it. Coarse network identity (who can talk to whom) is enforced by the mesh via mTLS + AuthorizationPolicy; fine-grained scope/role checks happen in the interceptor.
12 Kubernetes: the L4 load-balancing trap
This is the canonical gRPC-on-Kubernetes problem, and it follows directly from §03 and §09. Here is the failure mode in one breath:
A standard Kubernetes ClusterIP Service load-balances at L4 (connection level) via kube-proxy/iptables/IPVS. It picks a backend pod once, when the TCP connection is established. But gRPC opens one long-lived HTTP/2 connection and multiplexes thousands of requests over it. Result: every request goes to the single pod that won the connection lottery. New pods you scale up receive zero traffic until existing connections churn.
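A tiny simulation makes the connection lottery visible. This is a toy model in plain Go, not kube-proxy or Envoy: l4Balance and l7Balance are hypothetical functions, and backend selection is round-robin here purely to keep the output deterministic (kube-proxy's actual choice is typically random).

```go
package main

import "fmt"

// l4Balance models connection-level balancing: a backend is chosen once per
// TCP connection, and every request multiplexed on it lands on the same pod.
func l4Balance(pods, connections, requestsPerConn int) []int {
	hits := make([]int, pods)
	for c := 0; c < connections; c++ {
		pod := c % pods // decided at connect time, never revisited
		hits[pod] += requestsPerConn
	}
	return hits
}

// l7Balance models an HTTP/2-aware proxy: a backend is chosen per request
// (per stream), regardless of how many client connections there are.
func l7Balance(pods, connections, requestsPerConn int) []int {
	hits := make([]int, pods)
	for r := 0; r < connections*requestsPerConn; r++ {
		hits[r%pods]++
	}
	return hits
}

func main() {
	// One gRPC client, one long-lived connection, 1000 RPCs, 3 pods.
	fmt.Println("L4:", l4Balance(3, 1, 1000)) // L4: [1000 0 0]
	fmt.Println("L7:", l7Balance(3, 1, 1000)) // L7: [334 333 333]
}
```

Raising the pod count in the L4 case changes nothing for an existing connection, which is the "new pods receive zero traffic" symptom described above.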
13 Fixing load balancing without a mesh
There are three families of fix. Pick based on whether you have a mesh.
Option A — Headless Service + client-side LB
Make the Service headless so DNS returns all pod IPs, then configure the gRPC client to use round_robin. The client now holds a subchannel to every pod and balances per-request.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: user-svc
spec:
  clusterIP: None          # ← headless: DNS returns every pod IP
  selector:
    app: user
  ports:
    - name: grpc           # name the port (matters for Istio too)
      port: 50051
      appProtocol: grpc    # explicit protocol hint
```
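Client target: dns:///user-svc.default.svc.cluster.local:50051 with round_robin. The catch: the DNS resolver does not track endpoints in real time. It re-resolves only periodically or after a connection drops (the exact behaviour varies by gRPC implementation), so newly added pods aren't picked up instantly. Tune the resolver refresh where your implementation exposes it, set a server-side maximum connection age so clients reconnect and re-resolve, or rely on the mesh instead.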
Option B — A look-aside / xDS balancer
Run a control plane that feeds endpoints to clients over the gRPC xDS API. This is powerful but operationally heavy if you don't already run such infrastructure.
Option C — An L7 proxy / service mesh
Put an HTTP/2-aware L7 proxy (Envoy) in the path. It terminates the client's connection and does per-request balancing to backends. This is the Istio answer, and it's why a mesh makes the whole problem disappear — see Part III.
If you run Istio/Linkerd: do nothing special — use a normal ClusterIP Service and let the sidecar balance. If you don't run a mesh: headless Service + round_robin is the pragmatic fix.
14 Health probes
An HTTP probe can't check a gRPC-only server. Two correct approaches:
Native gRPC probes (enabled by default since Kubernetes 1.24; GA in 1.27)
```yaml
readinessProbe:
  grpc:
    port: 50051
    service: "user.v1.UserService"   # optional; checks Health/Check
  initialDelaySeconds: 5
  periodSeconds: 10
```
The kubelet calls the standard grpc.health.v1.Health/Check service. The server must implement and register it.
Legacy: grpc_health_probe binary
Before 1.24, you shipped the grpc_health_probe binary in the image and used an exec probe. Still seen in older charts.
Readiness gates whether a pod is in the Service's EndpointSlice. A pod failing readiness is pulled from the set of resolvable IPs — which is exactly how rolling deploys avoid sending gRPC traffic to a pod that isn't ready. Make your Health service reflect real readiness (deps connected, caches warm), not just "process is up."
15 Istio: how Envoy handles gRPC
Istio injects an Envoy sidecar next to each pod. All inbound/outbound traffic is transparently redirected through it via iptables rules. (Istio's newer ambient mode drops the per-pod sidecar and moves L7 handling to waypoint proxies; the sidecar model is the one described here.) Envoy is a full L7, HTTP/2-native proxy: it parses gRPC framing, understands trailers, reads grpc-status, and balances per request.
With Istio, the L4 trap from §12 is solved automatically. Use a normal ClusterIP Service; the app can keep a single connection to its sidecar, and Envoy spreads the requests. You generally do not want headless + client-side round_robin and a mesh — pick one balancing layer.
16 Protocol detection: get this right
For Envoy to apply HTTP/2-aware (L7) handling, Istio must know the port speaks gRPC. If it thinks the port is plain TCP, you fall back to L4 balancing — the very problem you're trying to avoid.
Two ways to declare it, in order of preference:
- appProtocol: grpc on the Service port. This is the explicit, modern, unambiguous way.
- Port name prefix: name the port grpc, grpc-web, or grpc-anything. Istio reads the prefix.
```yaml
ports:
  - name: grpc-user       # prefix recognised by Istio
    port: 50051
    appProtocol: grpc     # preferred explicit signal
```
Istio can sniff the protocol, but for gRPC it's strongly recommended to declare it explicitly. Auto-detection adds latency on first bytes and can misclassify, silently downgrading you to L4. An unnamed or mislabelled port is the #1 cause of "my gRPC still isn't load-balancing under Istio."
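The selection rule can be approximated in a few lines. This is a simplified model of the behaviour described above, not Istio's code: declaredProtocol is a hypothetical function, and the list of recognised prefixes is a subset chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// declaredProtocol approximates how a Service port's protocol is chosen:
// an explicit appProtocol wins; otherwise the port-name prefix is used;
// an empty result means nothing was declared and the proxy must sniff.
func declaredProtocol(portName, appProtocol string) string {
	if appProtocol != "" {
		return appProtocol
	}
	// grpc-web contains a dash itself, so check it before splitting.
	if portName == "grpc-web" || strings.HasPrefix(portName, "grpc-web-") {
		return "grpc-web"
	}
	prefix, _, _ := strings.Cut(portName, "-")
	switch prefix {
	case "grpc", "http", "http2", "https", "tcp", "tls": // illustrative subset
		return prefix
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", declaredProtocol("grpc-user", ""))      // "grpc"
	fmt.Printf("%q\n", declaredProtocol("tcp-legacy", "grpc")) // "grpc"
	fmt.Printf("%q\n", declaredProtocol("user", ""))           // ""
}
```

The last case is the dangerous one: a port simply named user declares nothing, so correct L7 handling depends entirely on sniffing getting it right.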
17 Traffic management for gRPC
Because Envoy understands gRPC, VirtualService and DestinationRule work at the method level. Routing can match on the gRPC path (/package.Service/Method) and on metadata headers.
```yaml
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata: { name: user-svc }
spec:
  hosts: [ user-svc ]
  http:                          # gRPC is HTTP/2 → use the http block
    - match:
        - uri:
            prefix: "/user.v1.UserService/"
      route:
        - destination: { host: user-svc, subset: v2 }
      timeout: 2s
```
http: There is no separate grpc: block; because gRPC is HTTP/2, you configure it under http:. Match on uri.prefix to target a whole service, or the full path to target one method. Canary, A/B, and header-based routing all work this way.
18 Retries, outlier detection & connection pools
gRPC-aware retries
Istio retries can key off gRPC status codes, not just HTTP codes. The retryOn field accepts gRPC conditions like cancelled, deadline-exceeded, internal, resource-exhausted, and unavailable.
```yaml
  http:
    - route: [ { destination: { host: user-svc } } ]
      retries:
        attempts: 3
        perTryTimeout: 1s
        retryOn: "unavailable,deadline-exceeded,resource-exhausted"
```
Outlier detection (passive health)
In a DestinationRule, outlier detection ejects misbehaving pods. For gRPC, Envoy maps certain statuses (e.g. UNAVAILABLE, INTERNAL) into the 5xx bucket that consecutive5xxErrors counts.
```yaml
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata: { name: user-svc }
spec:
  host: user-svc
  trafficPolicy:
    loadBalancer: { simple: ROUND_ROBIN }
    connectionPool:
      http:
        http2MaxRequests: 1000           # max concurrent streams to backend
        maxRequestsPerConnection: 0      # 0 = unlimited (no forced cycling)
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
```
maxRequestsPerConnection and gRPC: Setting this to a low value forces Envoy to cycle HTTP/2 connections, which can interrupt in-flight streaming RPCs. Leave it at 0 for streaming-heavy services unless you have a specific reason. Conversely, http2MaxRequests caps concurrency to a backend; set it too low and you'll see queued/rejected requests under load.
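The consecutive5xxErrors counter from the outlier-detection settings can be modelled as a per-host streak. This is a deliberately reduced sketch in plain Go, not Envoy's implementation: outlierDetector and its methods are hypothetical, and it leaves out the interval sweep, baseEjectionTime, and maxEjectionPercent.

```go
package main

import "fmt"

// outlierDetector tracks a streak of consecutive 5xx-class responses per host.
type outlierDetector struct {
	threshold   int
	consecutive map[string]int
	ejected     map[string]bool
}

func newOutlierDetector(threshold int) *outlierDetector {
	return &outlierDetector{
		threshold:   threshold,
		consecutive: make(map[string]int),
		ejected:     make(map[string]bool),
	}
}

// observe records one response. Any success resets the host's streak;
// reaching the threshold ejects the host from the balancing pool.
func (d *outlierDetector) observe(host string, is5xx bool) {
	if !is5xx {
		d.consecutive[host] = 0
		return
	}
	d.consecutive[host]++
	if d.consecutive[host] >= d.threshold {
		d.ejected[host] = true
	}
}

func main() {
	d := newOutlierDetector(5)
	for i := 0; i < 5; i++ {
		d.observe("pod-a", true) // five failures in a row
	}
	for i := 0; i < 4; i++ {
		d.observe("pod-b", true) // four failures...
	}
	d.observe("pod-b", false) // ...then a success resets the streak
	fmt.Println(d.ejected["pod-a"], d.ejected["pod-b"]) // true false
}
```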
19 mTLS & telemetry
Automatic mTLS
Because gRPC is just HTTP/2 to Envoy, Istio's automatic mutual TLS applies with zero app changes. The app speaks plaintext HTTP/2 to its sidecar; the sidecars negotiate mTLS on the wire using SPIFFE identities. Enforce with PeerAuthentication (set mode STRICT) and authorize peer-to-peer access with AuthorizationPolicy — which can match on the gRPC method path under operation.paths.
```yaml
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata: { name: user-svc-rbac }
spec:
  selector: { matchLabels: { app: user } }
  action: ALLOW
  rules:
    - from: [ { source: { principals: [ "cluster.local/ns/web/sa/frontend" ] } } ]
      to: [ { operation: { paths: [ "/user.v1.UserService/GetUser" ] } } ]
```
Telemetry
Envoy emits per-call gRPC telemetry without app instrumentation: request counts, durations, and the gRPC response status as a dimension. These flow into your metrics backend (Prometheus/Mimir) as istio_requests_total with a grpc_response_status label, and Envoy propagates tracing headers (B3 / W3C) so spans stitch across hops into Tempo.
The mesh gives you uniform transport-level gRPC metrics for every service for free. Application interceptors (§08) give you semantic detail the proxy can't see — business labels, payload-aware timing, OTel gen_ai.* attributes. Run both: mesh for the baseline golden signals, interceptors for depth.
20 Streaming & the gotchas list
Bidirectional and server-streaming RPCs can outlive Envoy's default idle timeout and route timeout. A route-level timeout applies to the whole stream and will kill it — so for streaming routes you usually set timeout: 0s (disabled) and rely on idle timeout + application keepalive instead.
The full gotcha checklist
| Symptom | Likely cause | Fix |
|---|---|---|
| One pod gets all traffic | L4 balancing (port not detected as gRPC) | Set appProtocol: grpc / name port grpc-* |
| Streams die after ~5 min | Route timeout or idle timeout | timeout: 0s on the route; tune idle timeout; app keepalive |
| too_many_pings GOAWAY | Client keepalive too aggressive | Raise the client keepalive interval, or relax the server's enforcement policy (min time / PermitWithoutStream) |
| New pods slow to get traffic (no mesh) | DNS resolver refresh interval | Lower resolver refresh, or move to mesh |
| Retries not firing | Used HTTP codes, not gRPC conditions | retryOn: "unavailable,..." |
| Streaming RPC interrupted on deploy | maxRequestsPerConnection low; GOAWAY on rollout | Set to 0; graceful drain; client reconnect logic |
| mTLS handshake errors | Port misdetected as TCP, or PeerAuth mismatch | Declare protocol; check PeerAuthentication mode |
★ One-page cheat sheet
The five facts to never forget
- 1 RPC = 1 HTTP/2 stream; many streams = 1 TCP connection. Everything else follows.
- gRPC status lives in HTTP/2 trailers — L4 LBs are blind to it.
- Default k8s ClusterIP balances at L4 → all gRPC traffic pins to one pod.
- Fix without mesh: headless Service + round_robin. Fix with mesh: nothing; Envoy balances per request.
- Always set a deadline; always declare the port as gRPC.
Debugging toolkit
grpcurl: introspect & call (needs reflection) · grpc_health_probe: probe health · ghz: load testing · istioctl proxy-config: inspect Envoy clusters/routes · channelz: in-process connection state · GRPC_GO_LOG_SEVERITY_LEVEL=info: client tracing
Status codes worth memorizing
0 OK · 3 INVALID_ARGUMENT · 4 DEADLINE_EXCEEDED · 8 RESOURCE_EXHAUSTED · 13 INTERNAL · 14 UNAVAILABLE (retry) · 16 UNAUTHENTICATED
Verify API versions (e.g. networking.istio.io/v1) against current docs before relying on them in production.