OpenShift Monitoring Stack¶
Introduction¶
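OpenShift monitoring runs the platform Prometheus, Alertmanager, and related components in the openshift-monitoring namespace. When something looks wrong, start with pod health, then check firing alerts in Alertmanager, Prometheus scrape targets, and storage pressure on the monitoring volumes.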
Why This Matters¶
OpenShift administration relies on operators and cluster-scoped resources, and the monitoring stack is itself managed by an operator. A bad change can affect many projects at once, so inspect status and events before applying fixes.
Practical Examples¶
oc get pods -n openshift-monitoring
oc get routes -n openshift-monitoring
oc adm top pods -n openshift-monitoring
oc logs statefulset/prometheus-k8s -c prometheus -n openshift-monitoring --tail=50
Example output:
NAME                  READY   STATUS    RESTARTS   AGE
prometheus-k8s-0      6/6     Running   0          5d
alertmanager-main-0   6/6     Running   0          5d
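On a busy cluster the pod list is longer than this, so it can help to filter it. The following is a minimal sketch, not part of oc: flag_unhealthy_pods is a hypothetical helper name, and it assumes the default table layout shown above (NAME, READY, STATUS in the first three columns).

```shell
# Hypothetical helper: reads `oc get pods` table output on stdin and prints
# each pod that is not Running or whose READY count is incomplete.
flag_unhealthy_pods() {
  awk 'NR > 1 {
    # Finished job pods legitimately report 0/1 ready, so skip them.
    if ($3 == "Completed") next
    split($2, ready, "/")
    if (ready[1] != ready[2] || $3 != "Running")
      print $1 ": " $2 " " $3
  }'
}

# Usage against a live cluster (assumes an active oc login):
#   oc get pods -n openshift-monitoring | flag_unhealthy_pods
```

No output means every listed pod is Running with all containers ready.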
Verification¶
oc get pods -n openshift-monitoring
oc get pvc -n openshift-monitoring
oc get events -n openshift-monitoring --sort-by=.lastTimestamp
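oc get pvc reports the requested capacity but not how full a volume is. One way to see actual usage is to run df inside the Prometheus container and compare the result with a threshold. This is a sketch under assumptions: check_volume_usage is a hypothetical helper, the 80 percent default is arbitrary, and the usage line assumes the data directory is mounted at /prometheus and that df is available in the container image.

```shell
# Hypothetical helper: reads `df -P` output on stdin and warns when the
# Capacity column meets or exceeds a threshold (default 80 percent).
check_volume_usage() {
  threshold="${1:-80}"
  awk -v limit="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)            # strip the trailing percent sign
    if (use + 0 >= limit + 0)
      print "WARN " $6 " at " use "% (limit " limit "%)"
    else
      print "OK " $6 " at " use "%"
  }'
}

# Usage against a live cluster (assumed pod name and mount path):
#   oc exec -n openshift-monitoring prometheus-k8s-0 -c prometheus -- \
#     df -P /prometheus | check_volume_usage 80
```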
Troubleshooting¶
Read the status message on the monitoring cluster operator (oc get clusteroperator monitoring), check the namespace where the affected component runs, inspect related events, and confirm whether the operator reports Available, Progressing, or Degraded.
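The condition check can be sketched as a small filter over the cluster operator table. summarize_operator is a hypothetical helper, and it assumes the default column order of oc get clusteroperator (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED). Treat its verdict as a prompt to read the operator message, not as a diagnosis; Progressing=True is expected during an upgrade, for example.

```shell
# Hypothetical helper: reads `oc get clusteroperator` table output on stdin
# and reports whether each operator is Available, not Progressing, not Degraded.
summarize_operator() {
  awk 'NR > 1 {
    if ($3 == "True" && $4 == "False" && $5 == "False")
      print $1 ": healthy"
    else
      print $1 ": needs attention (Available=" $3 ", Progressing=" $4 ", Degraded=" $5 ")"
  }'
}

# Usage against a live cluster (assumes an active oc login):
#   oc get clusteroperator monitoring | summarize_operator
```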
Common Mistakes¶
- Deleting monitoring PVCs to clear space, which discards the stored metrics history.
- Ignoring persistent volume pressure until Prometheus runs out of disk.
- Assuming user workload monitoring is enabled by default in every cluster.
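For the last point, one way to check rather than assume is to look for the enableUserWorkload setting in the cluster monitoring ConfigMap. The sketch below assumes the setting lives in the config.yaml key of the cluster-monitoring-config ConfigMap in openshift-monitoring; user_workload_enabled is a hypothetical helper name. If the ConfigMap does not exist, the helper receives no input and reports "not enabled".

```shell
# Hypothetical helper: reads the monitoring config YAML on stdin and reports
# whether it contains a line setting enableUserWorkload to true.
user_workload_enabled() {
  if grep -Eq '^[[:space:]]*enableUserWorkload:[[:space:]]*true[[:space:]]*$'; then
    echo "enabled"
  else
    echo "not enabled"
  fi
}

# Usage against a live cluster (assumed ConfigMap name and key):
#   oc get configmap cluster-monitoring-config -n openshift-monitoring \
#     -o jsonpath='{.data.config\.yaml}' | user_workload_enabled
```

When the result is "enabled", running pods in the openshift-user-workload-monitoring namespace are a useful second confirmation.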
Quick Checklist¶
- Confirm the active project.
- Inspect the exact object named in the error.
- Read recent events.
- Apply one focused fix.
- Verify status after the change.
Summary¶
Troubleshooting the OpenShift monitoring stack is an administration task that should be driven by cluster status, operator conditions, and component logs instead of broad restarts.