OpenShift Troubleshooting Checklist
Introduction
Use this checklist when an OpenShift issue is not yet isolated to pods, routes, storage, RBAC, builds, or cluster operators. Start broad, then narrow the failure to the object and namespace that owns it.
Symptoms
Typical symptoms include failed pods, route errors, denied requests, unhealthy operators, or command errors that repeat after retries.
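Common Causes
- A Service selector that does not match the pod labels.
- A route that targets the wrong service or port.
- An image pull secret that is not linked to the pod's service account.
- A missing RBAC binding or SCC permission for the user or service account.
- A cluster operator that is degraded because one of its dependencies is unhealthy.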
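Step 1: Check the Current Status
# Confirm the current user, API server, and project before anything else
oc whoami
oc whoami --show-server
oc project -q
oc get clusteroperators
oc get nodes
# Events across all namespaces, most recent last
oc get events -A --sort-by=.lastTimestamp
# Pods that are not Running or Completed (CrashLoopBackOff pods still report phase Running)
oc get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded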
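Example output from oc get clusteroperators on a healthy cluster, where every operator reports AVAILABLE True, PROGRESSING False, and DEGRADED False: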
NAME             VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication   4.15.12   True        False         False      8d
ingress          4.15.12   True        False         False      8d
monitoring       4.15.12   True        False         False      8d
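On a cluster with many operators, scanning that table by eye is slow. The function below is a minimal sketch (co_unhealthy is a hypothetical name, not an oc feature) that reads the default oc get clusteroperators table on standard input and prints only the operators that are not in the steady healthy state. It assumes the VERSION column is populated, so AVAILABLE, PROGRESSING, and DEGRADED are the third, fourth, and fifth whitespace-separated fields. An operator with an empty VERSION would shift the columns.

```shell
# Sketch: print ClusterOperators that are not AVAILABLE=True,
# PROGRESSING=False, DEGRADED=False.
# Assumption: input is the default `oc get clusteroperators` table with the
# VERSION column populated (NAME VERSION AVAILABLE PROGRESSING DEGRADED ...).
co_unhealthy() {
  awk 'NR == 1 && $1 == "NAME" { next }   # skip the header row
       NF >= 5 && ($3 != "True" || $4 != "False" || $5 != "False") {
         printf "%s available=%s progressing=%s degraded=%s\n", $1, $3, $4, $5
       }'
}

# Usage against a live cluster:
#   oc get clusteroperators | co_unhealthy
```

Flagging PROGRESSING=True is a judgment call. It is expected during an upgrade, but on a settled cluster it usually means an operator is still reconciling.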
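Step 2: Inspect Logs and Events
# Status, conditions, and recent events for one failing pod
oc describe pod <pod> -n <namespace>
# Current logs, then logs from the previous (crashed) container; add -c <container> for multi-container pods
oc logs <pod> -n <namespace> --all-containers
oc logs <pod> -n <namespace> --previous
# Events in the affected namespace, most recent last
oc get events -n <namespace> --sort-by=.lastTimestamp
oc adm top nodes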
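Warning events tend to repeat, so counting them by reason and object often points at the failing object faster than scrolling through the raw event list. This is a minimal sketch that assumes the events are fetched with the custom-columns command shown in the comment. warning_summary is a hypothetical helper name.

```shell
# Sketch: count Warning events per (reason, object) pair, most frequent first.
# Assumed input: one event per line as "REASON OBJECT", for example from
#   oc get events -n <namespace> --field-selector type=Warning \
#     -o custom-columns=REASON:.reason,OBJECT:.involvedObject.name --no-headers
warning_summary() {
  awk 'NF >= 2 { count[$1 " " $2]++ }
       END { for (key in count) print count[key], key }' |
    sort -k1,1nr -k2   # highest count first, ties broken alphabetically
}
```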
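Step 3: Verify Configuration
Compare the failing object's configuration with what the symptom implies: the Service selector against the pod labels, the service account and its linked pull secrets, the image reference, the route's target service and port, or the operator's status conditions. In OpenShift, events often state the exact reason, such as an admission denial, a scheduling failure, an image pull error, an SCC rejection, or a route problem.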
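A common case of this comparison is a Service whose selector no longer matches the pod labels. The Service then has no endpoints, and requests through the route fail. The sketch below compares two comma-separated label strings in the form oc prints them (the SELECTOR column of oc get svc -o wide and the LABELS column of oc get pod --show-labels) and reports every selector term the pod does not satisfy. The function name selector_mismatch and the service and pod names in the usage comment are hypothetical.

```shell
# Sketch: report selector terms ("key=value") that a pod's labels do not satisfy.
# Arguments: $1 = Service selector, $2 = pod labels, both as "k=v,k=v".
# Exit status: 0 if every term matches, 1 otherwise.
selector_mismatch() {
  local selector="$1" labels="$2"
  awk -v sel="$selector" -v lab="$labels" 'BEGIN {
    # Index the pod labels by key
    n = split(lab, pairs, ",")
    for (i = 1; i <= n; i++) { split(pairs[i], kv, "="); have[kv[1]] = kv[2] }
    # Check each selector term against the pod labels
    m = split(sel, want, ",")
    bad = 0
    for (i = 1; i <= m; i++) {
      split(want[i], kv, "=")
      if (!(kv[1] in have)) {
        print "missing label: " want[i]; bad = 1
      } else if (have[kv[1]] != kv[2]) {
        print "mismatch: " want[i] " (pod has " kv[1] "=" have[kv[1]] ")"; bad = 1
      }
    }
    exit bad
  }'
}

# Usage (hypothetical service "web" and pod "web-1"):
#   sel=$(oc get svc web -o wide --no-headers | awk '{print $NF}')
#   labels=$(oc get pod web-1 --show-labels --no-headers | awk '{print $NF}')
#   selector_mismatch "$sel" "$labels"
```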
Step 4: Apply the Fix
Apply the smallest targeted fix: correct the selector, update the route or service port, link the pull secret, grant the specific RBAC or SCC permission, or repair the unhealthy operator dependency.
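To keep each fix small and reviewable, print the exact command first and run it only after checking the names. The sketch below covers three of the fixes named above with real oc subcommands (oc secrets link, oc adm policy add-role-to-user, oc adm policy add-scc-to-user). The function name suggest_fix and its argument order are hypothetical choices for this example.

```shell
# Sketch: print (do not run) the targeted oc command for one kind of fix.
# Arguments: fix kind, namespace, then two names that depend on the kind.
# All names are placeholders supplied by the caller.
suggest_fix() {
  local kind="$1" ns="$2"
  shift 2
  case "$kind" in
    pull-secret) echo "oc secrets link $1 $2 --for=pull -n $ns" ;;        # $1=service account, $2=secret
    role)        echo "oc adm policy add-role-to-user $1 $2 -n $ns" ;;    # $1=role, $2=user
    scc)         echo "oc adm policy add-scc-to-user $1 -z $2 -n $ns" ;;  # $1=SCC, $2=service account
    *)           echo "unknown fix kind: $kind" >&2; return 2 ;;
  esac
}

# Usage: suggest_fix pull-secret <namespace> <service-account> <secret>
```

Printing instead of executing is deliberate. RBAC and SCC grants are easy to over-scope, and a printed command can be reviewed before it changes the cluster.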
Step 5: Confirm the Problem Is Resolved
Rerun the Step 1 and Step 2 commands, then confirm that the object status, the latest events, and a user-facing test (for example, a request through the route) all show that the problem is gone.
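Some fixes take a short time to show up (a rollout, a route admission, an operator reconcile), so a single re-check can report a false failure. For conditions that Kubernetes tracks itself, oc wait --for=condition=Available deployment/<name> --timeout=120s waits directly. For user-facing checks, a small retry loop like the hypothetical retry_check below is a reasonable sketch. The route host in the usage comment is a placeholder.

```shell
# Sketch: re-run a verification command until it succeeds or attempts run out.
# Arguments: number of attempts, delay in seconds, then the command to run.
retry_check() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      echo "check passed on attempt $i"
      return 0
    fi
    # Wait between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "check still failing after $attempts attempts" >&2
  return 1
}

# Usage (hypothetical route host):
#   retry_check 10 6 curl -fsS -o /dev/null https://web-myproj.apps.example.com/
```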
Common Mistakes
- Starting with random pod restarts before checking cluster operators and events.
- Troubleshooting the wrong project or cluster context.
- Ignoring whether the problem is application-scoped or cluster-scoped.
Quick Checklist
- Confirm the API server and current user.
- Check ClusterOperators and nodes.
- Check events across the affected namespace or cluster.
- Inspect the exact failing object.
- Verify after one focused fix.
Summary
Effective OpenShift troubleshooting depends on matching the symptom to the OpenShift object that owns it. Use oc get and oc describe output, events, logs, and a focused verification step so that every fix is tied to evidence.