Fix Node NotReady OpenShift

Learn how to fix a NotReady node in OpenShift with oc commands, verification steps, common mistakes, and production-focused guidance.

Introduction

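A node that reports NotReady usually needs a repair or a reboot. Node maintenance in OpenShift uses cordon and drain to move workloads off the node first, and uncordon to return it to service afterwards. Always check DaemonSets, pods that use local (emptyDir) storage, and PodDisruptionBudgets before draining, because each one changes how the drain behaves.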

Symptoms

Typical symptoms include a node that shows NotReady in oc get nodes, failed or pending pods on that node, route errors, denied requests, unhealthy operators, or command errors that repeat after retries.

Common Causes

  • Draining without checking PodDisruptionBudgets.
  • Forgetting to uncordon after maintenance.
  • Deleting static or mirror pods manually.

Step 1: Check the Current Status

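# List all nodes and note which ones report NotReady.
oc get nodes
# Mark the node unschedulable so no new pods land on it.
oc adm cordon worker-1
# Evict pods; skip DaemonSet-managed pods and allow emptyDir data to be deleted.
oc adm drain worker-1 --ignore-daemonsets --delete-emptydir-data
# After the repair or reboot, allow scheduling again.
oc adm uncordon worker-1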

Example output:

node/worker-1 cordoned
node/worker-1 drained
node/worker-1 uncordoned
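
On a larger cluster it can help to filter the node list down to the nodes that need attention. The following is a minimal sketch, assuming the default column order of oc get nodes (NAME, STATUS, ROLES, AGE, VERSION). The function name not_ready_nodes is hypothetical, and it reads the listing from standard input so it can also be tried on saved output.

```shell
# not_ready_nodes (hypothetical helper): print the names of nodes whose
# STATUS column contains NotReady, including NotReady,SchedulingDisabled.
# Expects `oc get nodes --no-headers` output on stdin; assumes the default
# column order NAME STATUS ROLES AGE VERSION.
not_ready_nodes() {
  awk '$2 ~ /(^|,)NotReady(,|$)/ { print $1 }'
}
```

Possible usage: oc get nodes --no-headers | not_ready_nodes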

Step 2: Inspect Logs and Events

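# Recheck node status.
oc get nodes
# List the pods currently assigned to worker-1, across all namespaces.
oc get pods -A --field-selector spec.nodeName=worker-1
# List PodDisruptionBudgets that could block evictions during a drain.
oc get pdb -A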
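
A PodDisruptionBudget that currently allows zero disruptions is the one most likely to stall a drain. As a rough sketch, the PDB listing can be filtered for those entries. This assumes the default columns of oc get pdb -A (NAMESPACE, NAME, MIN AVAILABLE, MAX UNAVAILABLE, ALLOWED DISRUPTIONS, AGE); the function name blocking_pdbs is hypothetical.

```shell
# blocking_pdbs (hypothetical helper): print namespace/name for every
# PodDisruptionBudget whose ALLOWED DISRUPTIONS value is 0.
# Expects `oc get pdb -A --no-headers` output on stdin; assumes the default
# columns, so the allowed-disruptions count is the fifth field.
blocking_pdbs() {
  awk 'NF >= 5 && $5 == "0" { print $1 "/" $2 }'
}
```

Possible usage: oc get pdb -A --no-headers | blocking_pdbs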

Step 3: Verify Configuration

Compare the object selectors, service account, image reference, route target, or operator status with the failing symptom. In OpenShift, events often show the exact admission, scheduling, pull, SCC, or route reason.
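
For a node, the events and kubelet logs are usually the quickest evidence to collect. The sketch below groups three read-only commands into one function. The function name inspect_node and the OC override variable are assumptions made for illustration, not part of the oc CLI. Fetching kubelet logs goes through the API server, so it may fail on a node that is completely unreachable.

```shell
# OC lets the binary be overridden, for example OC=echo for a dry run.
OC="${OC:-oc}"

# inspect_node (hypothetical helper): collect read-only evidence for one node.
# Usage: inspect_node worker-1
inspect_node() {
  # Conditions, taints, and recent events as seen by the API server.
  "$OC" describe node "$1"
  # Events recorded against the Node object, sorted by last timestamp.
  "$OC" get events -A \
    --field-selector "involvedObject.kind=Node,involvedObject.name=$1" \
    --sort-by=.lastTimestamp
  # Kubelet journal entries from the node itself.
  "$OC" adm node-logs "$1" -u kubelet
}
```

Running it with OC=echo prints the commands without contacting the cluster, which is a cheap way to review them first.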

Step 4: Apply the Fix

Apply the smallest targeted fix: correct the selector, update the route or service port, link the pull secret, grant the specific RBAC or SCC permission, or repair the unhealthy operator dependency.

Step 5: Confirm the Problem Is Resolved

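Run the commands from Steps 1 and 2 again and confirm that the node reports Ready without SchedulingDisabled, that no new warning events appear, and that a user-facing test succeeds. The status, events, and test should all agree.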
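
The status check can be turned into a pass/fail test. This is a minimal sketch, assuming the default oc get nodes column order; the function name node_ready_and_schedulable is hypothetical. It succeeds only when the STATUS column is exactly Ready, so a node that is still cordoned counts as not yet back in service.

```shell
# node_ready_and_schedulable (hypothetical helper): exit 0 only when the named
# node is listed with STATUS exactly "Ready" (not NotReady, not
# Ready,SchedulingDisabled). Exits 1 if the node is missing from the listing.
# Expects `oc get nodes --no-headers` output on stdin.
node_ready_and_schedulable() {
  awk -v node="$1" '
    $1 == node { found = 1; ok = ($2 == "Ready") }
    END { exit !(found && ok) }
  '
}
```

Possible usage: oc get nodes --no-headers | node_ready_and_schedulable worker-1 && echo "worker-1 is back in service"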

Common Mistakes

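  • Draining without checking PodDisruptionBudgets. A budget that currently allows zero disruptions blocks eviction, so the drain keeps retrying instead of completing.
  • Forgetting to uncordon after maintenance. The node stays SchedulingDisabled and regular workloads are not scheduled onto it.
  • Deleting static or mirror pods manually. The kubelet manages these pods from manifest files on the node, so deleting the mirror pod through the API does not remove or repair the underlying pod.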

Quick Checklist

  • Confirm the active project.
  • Inspect the exact object named in the error.
  • Read recent events.
  • Apply one focused fix.
  • Verify status after the change.

Summary

Fixing a NotReady node in OpenShift requires matching the symptom to the OpenShift object that owns it. Use oc status commands, events, logs, and focused verification so the fix is tied to evidence.