Troubleshoot Kubernetes Services
Introduction
This guide explains how to troubleshoot Kubernetes Services with practical kubectl commands, realistic output, and production-focused checks. Use this workflow when an application is failing and you need evidence before changing manifests.
Symptoms
You may see pods stuck in a waiting state, failed rollouts, 4xx or 5xx responses, missing endpoints, failed probes, denied API calls, or repeated events in the namespace.
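Common Causes
Common causes include a Service selector that does not match the pod labels, empty endpoints, a port or targetPort that does not match the container port, incorrect ingress rules, DNS resolution failures, CNI problems, and NetworkPolicy rules that block traffic. Always confirm with events and logs before editing the workload.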
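A selector mismatch is often the quickest cause to rule out. The sketch below is one possible way to check it: it turns the selector printed by kubectl into a label query and lists the pods that query matches. The function names `selector_to_query` and `pods_for_service` are hypothetical, and the sketch assumes a kubectl version that prints `.spec.selector` as flat JSON such as `{"app":"web"}`.

```shell
#!/bin/sh
# Convert a flat JSON selector such as {"app":"web","tier":"frontend"}
# into a label query such as app=web,tier=frontend.
# Assumption: keys and values are plain label strings (no quotes, commas, or colons).
selector_to_query() {
  printf '%s' "$1" | tr -d '{}" ' | tr ':' '='
}

# Hypothetical wrapper: list the pods that the Service selector matches.
# Needs kubectl access to the cluster; it is defined here but not invoked.
pods_for_service() {
  svc="$1"
  ns="$2"
  selector_json=$(kubectl get svc "$svc" -n "$ns" -o jsonpath='{.spec.selector}') || return 1
  query=$(selector_to_query "$selector_json")
  if [ -z "$query" ]; then
    echo "service $svc has no selector" >&2
    return 1
  fi
  kubectl get pods -n "$ns" -l "$query" --show-labels
}
```

Calling `pods_for_service web app` against a live cluster should list the backing pods. An empty list suggests the selector and the pod labels disagree.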
Step 1: Check Current State
kubectl get svc,endpoints -A
kubectl describe svc web -n app
Expected output:
NAME          TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
service/web   ClusterIP   10.96.42.10   <none>        80/TCP    2d
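A Service can look healthy in this listing and still have no backends. As a rough sketch, the helper below classifies the ready-address list of the matching Endpoints object. The function name `endpoint_status` is hypothetical, and the input is assumed to come from the jsonpath query shown in the comment.

```shell
#!/bin/sh
# Classify the ready-address list of an Endpoints object.
# $1 is a whitespace-separated IP list, for example the output of:
#   kubectl get endpoints web -n app -o jsonpath='{.subsets[*].addresses[*].ip}'
endpoint_status() {
  # Word-split the list on purpose so that $# holds the address count.
  set -- $1
  if [ "$#" -eq 0 ]; then
    echo "no ready endpoints: check the selector and pod readiness"
  else
    echo "$# ready endpoint(s)"
  fi
}
```

For example, `endpoint_status "$(kubectl get endpoints web -n app -o jsonpath='{.subsets[*].addresses[*].ip}')"` reports whether traffic sent to the ClusterIP has anywhere to go.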
Step 2: Inspect Events and Logs
kubectl describe svc web -n app
kubectl get ingress -A
Events show scheduler, kubelet, image pull, mount, and probe errors. Previous logs are critical when the container restarts quickly.
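The events and previous logs mentioned here can be gathered in one pass. The following is a minimal sketch, not a fixed procedure: `run` and `collect_evidence` are hypothetical helper names, `DRY_RUN` is a convention of this sketch only, and the pod name is the example pod used elsewhere in this guide.

```shell
#!/bin/sh
# Print the command instead of executing it when DRY_RUN=1,
# so the sequence can be reviewed before it touches a cluster.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Gather warning events for a namespace, then the logs of the
# previous (restarted) container instance of one pod.
collect_evidence() {
  ns="$1"
  pod="$2"
  run kubectl get events -n "$ns" --field-selector type=Warning --sort-by=.lastTimestamp
  run kubectl logs "$pod" -n "$ns" --previous --tail=50
}
```

Running `DRY_RUN=1; collect_evidence app web-7d9f8c-abcde` prints the two kubectl commands. With `DRY_RUN` unset, the same call executes them.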
Step 3: Verify the Manifest or Runtime Setting
kubectl run curl --rm -it --image=curlimages/curl --restart=Never -- curl -I http://web.app.svc.cluster.local
kubectl get pod web-7d9f8c-abcde -n app -o yaml
Check selectors, image names, probes, resource limits, service accounts, volumes, and namespace references.
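One check worth automating is whether the Service targetPort matches a port the pod declares. The sketch below assumes a numeric targetPort (a named targetPort would need a name lookup instead) and uses a hypothetical function name, `target_port_matches`. Because declared containerPorts are informational, a mismatch is a strong hint to investigate rather than proof of a fault.

```shell
#!/bin/sh
# Succeed when the Service targetPort appears in the pod's declared containerPorts.
# $1: numeric targetPort, for example the output of:
#   kubectl get svc web -n app -o jsonpath='{.spec.ports[0].targetPort}'
# $2: whitespace-separated containerPorts, for example the output of:
#   kubectl get pod web-7d9f8c-abcde -n app -o jsonpath='{.spec.containers[*].ports[*].containerPort}'
target_port_matches() {
  for p in $2; do
    [ "$p" = "$1" ] && return 0
  done
  return 1
}
```

A call such as `target_port_matches 8080 "8080 9090"` succeeds, while `target_port_matches 80 "8080"` fails and points at the port mapping.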
Step 4: Apply the Fix
apiVersion: v1
kind: Service
metadata:
  name: web
  namespace: app
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 8080
Apply only the corrected field, then let the controller reconcile the desired state.
kubectl apply -f manifest.yaml
kubectl rollout status deployment/web -n app
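To keep the change limited to the corrected field, it can help to preview it with `kubectl diff` before applying. The sketch below relies on the documented exit codes of `kubectl diff` (0 for no differences, 1 for differences, greater than 1 for an error). The function names `explain_diff_status` and `preview_and_apply` are hypothetical, and `manifest.yaml` is the file name used above.

```shell
#!/bin/sh
# Translate the exit status of `kubectl diff` into a next action.
explain_diff_status() {
  case "$1" in
    0) echo "no changes: live object already matches the manifest" ;;
    1) echo "changes pending: review the diff, then apply" ;;
    *) echo "kubectl diff failed: fix the error before applying" ;;
  esac
}

# Hypothetical wrapper: show the diff and apply only when there is something to change.
# Needs kubectl access to the cluster; it is defined here but not invoked.
preview_and_apply() {
  manifest="$1"
  status=0
  kubectl diff -f "$manifest" || status=$?
  explain_diff_status "$status"
  if [ "$status" -eq 1 ]; then
    kubectl apply -f "$manifest"
  else
    return "$status"
  fi
}
```

Running `preview_and_apply manifest.yaml` prints the pending change first, which makes it easier to confirm that only the intended field differs.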
Step 5: Confirm Recovery
kubectl get pods -n app
kubectl get events -n app --sort-by=.lastTimestamp
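Endpoints may take a few seconds to repopulate after a fix, so a single check can give a false negative. The sketch below shows one way to poll. The names `retry` and `svc_has_endpoints` are hypothetical, and the attempt count and delay in the usage note are arbitrary.

```shell
#!/bin/sh
# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Returns 0 on the first success and 1 if every attempt fails.
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while :; do
    "$@" && return 0
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Hypothetical check: succeed when the Service has at least one ready address.
# Needs kubectl access to the cluster; it is defined here but not invoked.
svc_has_endpoints() {
  ips=$(kubectl get endpoints "$1" -n "$2" -o jsonpath='{.subsets[*].addresses[*].ip}') || return 1
  [ -n "$ips" ]
}
```

For example, `retry 10 3 svc_has_endpoints web app` waits up to about 30 seconds for the Service to regain a backend before reporting failure.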
Common Mistakes
- Deleting pods before reading the events that explain why they failed.
- Changing probes, resources, images, and RBAC at the same time.
- Troubleshooting only the pod while ignoring the service, PVC, node, or service account.
Quick Checklist
- Check pod status and restart count.
- Read describe output and recent events.
- Inspect current and previous container logs.
- Verify dependent objects such as Secrets, ConfigMaps, PVCs, Services, and RBAC.
- Apply one fix and watch the rollout.
Related Guides
- kubectl Describe Pod
- kubectl Logs Previous Container
- Troubleshoot Kubernetes Events
- Kubernetes Pod Troubleshooting Checklist
Summary
Treat troubleshooting Kubernetes Services as an evidence-driven debugging task. Events identify the failing layer, logs explain application behavior, and rollout checks prove the fix worked.