Kubernetes for MLOps
Introduction
Learn how Kubernetes helps deploy ML APIs, scale inference workloads, manage resources, and operate model-serving services.
Before You Start
You need a container image for the model API that exposes a health endpoint, plus resource requests and a Service defined in your manifests. Kubernetes should run the serving workload; training pipelines may run as Kubernetes Jobs or in external workflow tools.
Project Structure
container image -> Deployment -> Service -> Ingress or Route -> monitoring
Step-by-Step Deployment
Save the following Deployment and Service as ml-api.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-api
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-api
  template:
    metadata:
      labels:
        app: ml-api
    spec:
      containers:
        - name: api
          image: registry.example.com/ml-api:churn-17
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1"
              memory: "1Gi"
---
apiVersion: v1
kind: Service
metadata:
  name: ml-api
spec:
  selector:
    app: ml-api
  ports:
    - port: 80
      targetPort: 8000
Apply the manifest, wait for the rollout to finish, and list the pods:
kubectl apply -f ml-api.yaml
kubectl rollout status deployment/ml-api
kubectl get pods -l app=ml-api
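The project structure places an Ingress or Route in front of the Service. As a minimal sketch of the Ingress variant, assuming an NGINX ingress controller is installed and using the placeholder host ml-api.example.com (neither is specified in this guide), the manifest might look like this:

```yaml
# Hypothetical Ingress for the ml-api Service.
# The ingress class and host are assumptions; adjust them to your cluster.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ml-api
spec:
  ingressClassName: nginx          # assumption: an NGINX ingress controller exists
  rules:
    - host: ml-api.example.com     # placeholder host name
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ml-api       # the Service defined above
                port:
                  number: 80       # Service port, which forwards to containerPort 8000
```

Clusters that use a different controller, or OpenShift Routes, need the equivalent resource instead.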
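Testing the Deployment
Test from inside the cluster with a temporary curl pod. The short hostname ml-api resolves only from the same namespace as the Service. The health endpoint should return the model version: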
kubectl run curl --rm -it --image=curlimages/curl --restart=Never -- curl -s http://ml-api/health
{"status":"ok","model_version":"churn:17"}
Production Considerations
Add readiness and liveness probes, autoscaling, PodDisruptionBudgets, logs, metrics, NetworkPolicy, and safe rollout settings before production.
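Two of these items, autoscaling and PodDisruptionBudgets, can be sketched as separate manifests. The example below assumes the cluster runs a metrics source such as metrics-server; the replica bounds, the 70% CPU target, and the minAvailable value are illustrative assumptions, not recommendations from this guide.

```yaml
# Hypothetical autoscaler for the ml-api Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-api
  minReplicas: 2                   # matches the Deployment's initial replica count
  maxReplicas: 6                   # assumption: size this from load tests
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU request (250m per pod)
---
# Hypothetical disruption budget: keep at least one pod serving
# during voluntary disruptions such as node drains.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ml-api
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: ml-api
```

CPU utilization is measured against the container's CPU request, so the autoscaler only behaves predictably when requests are set. Once the autoscaler manages the replica count, consider removing the replicas field from the Deployment manifest so that re-applying it does not reset the count.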
Common Mistakes
- Deploying without resource requests.
- Scaling inference before checking model load time and memory.
- Missing readiness probes.
- Not exposing model version in health or metrics.
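Several of these mistakes can be addressed in the pod template. The fragment below is a sketch to merge into the ml-api Deployment, assuming the /health endpoint from the test step only returns success once the model is loaded. The probe timings and the model-version label are illustrative assumptions.

```yaml
# Fragment to merge into the ml-api Deployment (not a complete manifest).
spec:
  template:
    metadata:
      labels:
        app: ml-api
        model-version: churn-17    # hypothetical label; mirrors the image tag
    spec:
      containers:
        - name: api
          startupProbe:            # gives a slow model load time to finish
            httpGet:
              path: /health
              port: 8000
            periodSeconds: 5
            failureThreshold: 24   # 24 x 5 s = up to 120 s before the container is restarted
          readinessProbe:          # keeps the pod out of the Service until it can serve
            httpGet:
              path: /health
              port: 8000
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:           # restarts the container if it stops responding
            httpGet:
              path: /health
              port: 8000
            periodSeconds: 10
            failureThreshold: 3
```

The startup probe holds off the readiness and liveness probes until it succeeds, so a long model load does not trigger restarts. Measure the actual load time and memory use before choosing the threshold and the memory limit.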
Summary
A reliable model deployment is versioned, testable, observable, and reversible.