When the Pod Is Fine but Nothing Works — Debugging Kubernetes Services and PVCs

June 15, 2026 · 7 min read

DevOps / Backend Engineer

Last time I deliberately broke Pods three ways and watched them fail to start
(CrashLoopBackOff and friends).
This time the Pods are all Running and healthy — and things still don't work.
The bug isn't in the Pod; it's in the plumbing around it: the Service → Endpoints → Pod
forwarding chain, and the PVC → StorageClass → PV binding chain.
The skill here is knowing which link broke without guessing.

Part 1: Service connectivity — `get endpoints` splits the problem in two

When you can't reach a Service, the single most useful first command is not describe pod. It's:

kubectl get endpoints <svc>

Endpoints is the table that says "which Pod IPs does this Service actually forward to." That one command cleaves every connectivity bug into two halves: can't find the Pods (endpoints empty) vs found them but can't reach them (endpoints populated). Each half points somewhere completely different.
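To make the empty vs populated split mechanical, here is a minimal sketch of a helper I'd wrap around that command. The function name `endpoints_state` is my own, and it assumes the classic Endpoints object (the `subsets` field) rather than EndpointSlice.

```shell
# Classify a Service's backends as "empty" or "populated".
# $1: Service name, $2: namespace (defaults to "default").
# Note: a Service that does not exist also produces no IPs, so it reports "empty".
endpoints_state() {
  ips=$(kubectl get endpoints "$1" -n "${2:-default}" \
    -o jsonpath='{.subsets[*].addresses[*].ip}')
  if [ -z "$ips" ]; then
    echo "empty"
  else
    echo "populated: $ips"
  fi
}

# Usage (needs a cluster): endpoints_state app-bad-selector
```

Only ready addresses are listed under `addresses`, so "empty" here means "no Pod this Service can currently forward to".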

Failure 1: selector doesn't match the Pod labels → endpoints empty

kubectl expose deployment app --port=80 --target-port=80 --name=app-bad-selector
kubectl patch svc app-bad-selector -p '{"spec":{"selector":{"app":"WRONG"}}}'

Symptom: can't connect; the Pods are all Running.
Investigation: kubectl get endpoints app-bad-selector → empty (<none>). Then kubectl get svc app-bad-selector -o wide to read the Selector, and compare against kubectl get pods --show-labels.
Cause: the Service selector matches no Pod labels, so the Endpoints controller can't assemble a backend list.
Fix: correct the selector (or the Pod labels) so they line up.
Concept: a Service doesn't know about Pods directly; it only knows labels. The "which Pod IPs" lookup lives in the Endpoints / EndpointSlice object. Wrong selector → that table is empty. When the Pods are Running and Ready, empty endpoints almost always means a selector problem (Pods that match but are not Ready are also left out of the table).
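The selector-vs-labels comparison can be sketched as a small pure-shell check. `selector_matches` is a hypothetical helper, not a kubectl feature; it takes the selector in the `k=v,k2=v2` form shown in the SELECTOR column of `kubectl get svc -o wide`, and the labels in the form printed by `kubectl get pods --show-labels`.

```shell
# Return 0 if every key=value pair in the selector appears in the Pod's labels.
# $1: selector as "k=v,k2=v2"; $2: Pod labels as "k=v,k3=v3".
# An empty selector is not handled specially here.
selector_matches() {
  old_ifs=$IFS
  IFS=,
  for pair in $1; do
    case ",$2," in
      *",$pair,"*) ;;                      # this selector pair is on the Pod
      *) IFS=$old_ifs; return 1 ;;         # one missing pair means no match
    esac
  done
  IFS=$old_ifs
  return 0
}

# Usage:
#   selector_matches "app=WRONG" "app=app,pod-template-hash=abc123" || echo "mismatch"
# The cluster-side equivalent is: kubectl get pods -l app=WRONG   (returns nothing)
```

The label values in the usage line are made up for illustration; substitute what your own Service and Pods report.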

Failure 2: targetPort points at a port nothing is listening on → connection refused

kubectl expose deployment app --port=80 --target-port=8080 --name=app-bad-port

Symptom: still can't connect — but it breaks differently than Failure 1.
Investigation: kubectl get endpoints app-bad-port → this time there are IPs (the selector matches, so endpoints list podIP:8080). Then curl it and watch how it fails → connection refused, immediately, not a hang.
Cause: the Service port is fine, but targetPort is 8080 while the container actually listens on 80. The NAT rules that kube-proxy installs rewrite the destination to podIP:8080, nothing is listening there, and the Pod's kernel replies with a TCP RST.
Fix: set targetPort to the port the container really listens on.
Concept: endpoints populated + connection refused → it's a targetPort / wrong-container-port problem. "Refused" means the packet reached the Pod and got rejected — so it's not a routing problem, it's knocking on the wrong door.
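One way to apply the fix without deleting the Service is a JSON patch. This is a sketch under assumptions: `fix_target_port` is my own wrapper name, it only touches the first entry in `spec.ports`, and it expects a numeric port (a named targetPort would need a different patch).

```shell
# Point the first port entry of a Service at a different targetPort.
# $1: Service name, $2: numeric port the container really listens on.
fix_target_port() {
  patch=$(printf '[{"op":"replace","path":"/spec/ports/0/targetPort","value":%d}]' "$2")
  kubectl patch svc "$1" --type=json -p "$patch"
}

# Usage (needs a cluster): fix_target_port app-bad-port 80
```

After patching, `kubectl get endpoints app-bad-port` should list podIP:80 instead of podIP:8080.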

Failure 3: NetworkPolicy silently drops the packet → timeout

kubectl get netpol -A
kubectl describe netpol <name> -n <ns>

Symptom: endpoints are populated, targetPort is correct, and curl times out (hangs with no response) — not "refused".
Investigation: kubectl get netpol -A to see if any policy selects the source or destination Pod; describe to read its PodSelector / Ingress / Egress.
Cause: a NetworkPolicy is dropping this traffic. Dropped packets get no reply (no RST), which is exactly why you see a timeout instead of a refusal.
Fix: allow the source→destination flow in the relevant policy, or adjust its PodSelector.
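As an illustration of "allow the source→destination flow", here is a minimal sketch of an ingress allow rule. Everything specific in it is a placeholder I made up: the policy name, the `app: app` destination label, the `role: client` source label, and port 80.

```shell
# Print a NetworkPolicy that lets Pods labelled role=client reach Pods labelled
# app=app on TCP 80. Name, labels, and port are placeholders for illustration.
allow_client_policy() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-client-to-app
spec:
  podSelector:
    matchLabels:
      app: app
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: client
      ports:
        - protocol: TCP
          port: 80
EOF
}

# Usage (needs a cluster with an enforcing CNI):
#   allow_client_policy | kubectl apply -n <ns> -f -
```

The `port` in a NetworkPolicy is the port on the Pod (the Service's targetPort), not the Service port, and a bare `podSelector` under `from` only matches Pods in the policy's own namespace.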

Three things about NetworkPolicy that trip people up:

Whitelist flip. A Pod with no policy selecting it is "allow all." The moment any policy selects it — even one that only lists Ingress — that direction flips to "deny all except what's explicitly allowed." A common gotcha is a podSelector: {} default-deny someone added.
Two directions. Either the destination Pod's Ingress doesn't allow the source, or the source Pod's Egress doesn't allow the destination. Either one blocks the flow, so check the policies in the source Pod's namespace and in the destination Pod's namespace.
The CNI has to enforce it. This is the big one for local labs:

warning

kind's default CNI (kindnet) does not enforce NetworkPolicy. You can kubectl apply a policy, see it in kubectl get netpol, and traffic still flows — because nothing is enforcing it. To actually reproduce Failure 3 you need a CNI that enforces policy, like Calico or Cilium. I verified this by spinning up a separate kind cluster with disableDefaultCNI: true and installing Calico: the same default-deny policy that did nothing under kindnet turned curl into a clean timeout under Calico. Same YAML, different behavior — because the difference was never the policy, it was the CNI.
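For anyone who wants to repeat that experiment, this is roughly the shape of the two files involved. It is a sketch rather than my exact lab files: the function names, the policy name, and the cluster name in the usage comment are illustrative, and installing Calico or Cilium is left to that project's own instructions.

```shell
# Print a kind cluster config with the default CNI (kindnet) turned off,
# so an enforcing CNI can be installed afterwards.
kind_config_no_cni() {
  cat <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
networking:
  disableDefaultCNI: true
EOF
}

# Print a default-deny ingress policy: podSelector {} selects every Pod in the
# namespace, and with no ingress rules listed, all inbound traffic is denied.
default_deny_ingress() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
EOF
}

# Usage:
#   kind_config_no_cni > kind-no-cni.yaml
#   kind create cluster --name netpol-lab --config kind-no-cni.yaml
#   (install an enforcing CNI; nodes stay NotReady until one is running)
#   default_deny_ingress | kubectl apply -f -
```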

Service decision tree

Can't reach a Service → ① kubectl get endpoints <svc>
├─ endpoints empty ─────────────► selector doesn't match labels → fix selector
└─ endpoints populated → curl and watch HOW it fails
     ├─ connection refused (instant) ─► wrong targetPort / nothing listening
     └─ timeout (hangs) ─────────────► suspect NetworkPolicy dropping packets

The cheapest signal in the whole tree is refused vs timeout: refused means the packet arrived and was rejected; timeout means it was swallowed on the way.
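The tree above can be sketched as a function over two observations: the endpoint IPs and curl's exit code. `diagnose_service` is a hypothetical name, and the mapping leans on an assumption worth stating: curl exits 7 for any failed connect (in this lab that means refused, but an unreachable host also lands there) and 28 when a timeout such as `--connect-timeout` expires.

```shell
# Walk the Service decision tree from two observations.
# $1: space-separated endpoint IPs ("" if none); $2: curl's exit code.
diagnose_service() {
  if [ -z "$1" ]; then
    echo "endpoints empty: compare the Service selector with the Pod labels"
    return
  fi
  case "$2" in
    0)  echo "reachable: no Service-level fault" ;;
    7)  echo "connection refused: check targetPort against the listening port" ;;
    28) echo "timeout: look for a NetworkPolicy dropping the traffic" ;;
    *)  echo "curl exit $2: outside this tree" ;;
  esac
}

# Usage:
#   ips=$(kubectl get endpoints <svc> -o jsonpath='{.subsets[*].addresses[*].ip}')
#   curl -s -o /dev/null --connect-timeout 5 http://<svc>   # run from a Pod in the cluster
#   diagnose_service "$ips" "$?"
```

Setting `--connect-timeout` matters: without it a dropped packet can leave curl hanging for minutes before it reports anything.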

Part 2: PVC stuck in Pending — read the Events, not the logs

A PVC that never leaves Pending drags its Pod down with it: the Pod stays Pending too, because it can't be scheduled while its claim has nothing to bind to. There are no logs to read here; the answer is in kubectl describe pvc.

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: stuck-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: nonexistent-sc
  resources:
    requests:
      storage: 1Gi
EOF

Symptom: kubectl get pvc stuck-pvc sits at Pending, never Bound.
Investigation: kubectl describe pvc stuck-pvc (read the Events), then kubectl get storageclass to see what actually exists.
Cause (this example): the requested storageClassName: nonexistent-sc doesn't exist, so no provisioner claims the PVC and it binds to nothing.
Fix: use a StorageClass that exists (kubectl get sc — there's usually one marked (default)), or omit storageClassName to take the default.
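Finding the default class can be scripted by looking for the "(default)" marker in the table output. A small sketch, with `default_storage_class` as my own helper name; it relies on the human-readable output format, so treat it as a convenience rather than something to build automation on.

```shell
# Print the name of every StorageClass that kubectl marks "(default)".
default_storage_class() {
  kubectl get storageclass --no-headers | awk '/\(default\)/ { print $1 }'
}

# Usage (needs a cluster):
#   default_storage_class
# If it prints nothing, there is no default class, and a PVC that omits
# storageClassName will not be dynamically provisioned.
```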

The root causes worth memorizing as a checklist:

StorageClass doesn't exist — Events say the class can't be found (this example). Fix the name.
Static provisioning, no matching PV — with no dynamic provisioner you must pre-create a PV; if none has capacity ≥ the request with a matching accessMode and storageClassName, the PVC stays Pending. Create a matching PV, or switch to a class that supports dynamic provisioning.
volumeBindingMode: WaitForFirstConsumer — this Pending is normal. The class deliberately waits until a Pod actually uses the PVC before binding, so the volume lands in the right node/zone. Not an error; create a Pod that mounts it and binding happens.
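The checklist above can be sketched as a lookup from the Event message to the likely cause. `classify_pvc_event` is a hypothetical helper, and the substrings it matches are the wording I have seen in PVC Events; the exact text may differ between Kubernetes versions and provisioners, so anything unmatched falls through to "read the Events yourself".

```shell
# Map a PVC Event message to the likely root cause.
# $1: message text copied from `kubectl describe pvc <pvc>` Events.
classify_pvc_event() {
  case "$1" in
    *"waiting for first consumer"*)
      echo "WaitForFirstConsumer: normal, create a Pod that mounts the PVC" ;;
    *storageclass*"not found"*)
      echo "StorageClass missing: fix storageClassName" ;;
    *"no persistent volumes available"*)
      echo "no matching PV: create one or use a dynamic provisioner" ;;
    *)
      echo "unrecognized: read the full Events output" ;;
  esac
}

# Usage:
#   classify_pvc_event 'storageclass.storage.k8s.io "nonexistent-sc" not found'
```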

note

PVC Pending and Pod Pending look similar but are diagnosed completely differently. Pod Pending is usually a scheduling failure (insufficient resources, taints, nodeSelector): you look at kubectl describe pod Events and the scheduler. PVC Pending is a binding failure: you look at kubectl describe pvc and kubectl get sc. When a Pod is Pending only because its claim is unbound, the Pod's own Events say so, and that sends you back to the PVC. Don't conflate them.

The one table to keep

Symptom	First command	What to read	Likely cause
Service: endpoints empty	`kubectl get endpoints <svc>`	`<none>`	selector ≠ Pod labels
Service: refused (instant)	curl + `kubectl get endpoints`	populated, but RST	wrong targetPort / nothing listening
Service: timeout (hangs)	`kubectl get netpol -A`	matching PodSelector	NetworkPolicy dropping packets
PVC stuck Pending	`kubectl describe pvc <pvc>`	Events + `get sc`	no/wrong StorageClass, no matching PV, or WaitForFirstConsumer

The thread running through both halves: when a Pod is healthy but unreachable or unschedulable, stop staring at the Pod. Walk the chain it depends on — Endpoints for networking, StorageClass and PV for storage — and let how it fails (empty vs refused vs timeout vs Pending) tell you which link to fix.
