Deploy an ML Model with FastAPI
Introduction
Build a simple FastAPI service that loads a model artifact, exposes a prediction endpoint, and can be tested with curl.
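Before You Start
You need a working Python installation, a saved model artifact (models/churn.pkl in this guide), and a request schema that names every input feature and its type. The service should load the model once at startup, not on each request, and return JSON that contains both the prediction and the model version.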
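If you do not have an artifact yet, a throwaway one is enough to follow the steps. The sketch below is only a stand-in: it fits a logistic regression on synthetic data, and the function name save_demo_model, the data-generating rule, and the choice of classifier are all assumptions made for illustration. The one detail that matters for the rest of the guide is the column order (tenure, then monthly_charges), which has to match what the API passes to the model.

```python
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression


def save_demo_model(path="models/churn.pkl", n_rows=200, seed=0):
    """Fit a toy classifier on synthetic data and save it with joblib.

    Hypothetical helper for following the guide; not a real churn model.
    """
    rng = np.random.default_rng(seed)
    tenure = rng.integers(0, 72, size=n_rows)  # months as a customer
    monthly_charges = rng.uniform(20.0, 120.0, size=n_rows)

    # Made-up rule: short tenure and high charges make churn more likely.
    noise = rng.normal(0.0, 0.2, size=n_rows)
    score = monthly_charges / 120.0 - tenure / 72.0 + noise
    churned = (score > 0).astype(int)

    # Column order must match the API: [tenure, monthly_charges].
    features = np.column_stack([tenure, monthly_charges])
    model = LogisticRegression(max_iter=1000).fit(features, churned)

    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, str(target))
    return target
```

Calling save_demo_model() once from the project root creates models/churn.pkl. Pickled scikit-learn models should be loaded with the same scikit-learn version that produced them, so pin that version in requirements.txt.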
Project Structure
ml-api/
├── app.py
├── models/
│ └── churn.pkl
├── requirements.txt
└── Dockerfile
Step-by-Step Deployment
Create the API:
from fastapi import FastAPI
from pydantic import BaseModel
import joblib

MODEL_VERSION = "churn:17"
# Load the artifact once at import time so every request reuses it.
model = joblib.load("models/churn.pkl")

class Customer(BaseModel):
    tenure: int
    monthly_charges: float

app = FastAPI()

@app.get("/health")
def health():
    return {"status": "ok", "model_version": MODEL_VERSION}

@app.post("/predict")
def predict(customer: Customer):
    # Feature order must match the order used when the model was trained.
    features = [[customer.tenure, customer.monthly_charges]]
    # Column 1 holds the probability of the second class, assumed here to be churn.
    probability = float(model.predict_proba(features)[0][1])
    return {"churn_probability": probability, "model_version": MODEL_VERSION}
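The Customer schema above checks types but not ranges, so a negative tenure would still reach the model. A stricter variant might look like the sketch below. The bounds are assumptions for illustration (tenure in whole months, charges strictly positive), and is_valid is a hypothetical helper used only to show the behaviour; choose limits that match your training data.

```python
from pydantic import BaseModel, Field, ValidationError


class Customer(BaseModel):
    # Assumed bounds; adjust to the ranges seen in training data.
    tenure: int = Field(ge=0, description="Months as a customer")
    monthly_charges: float = Field(gt=0, description="Monthly bill amount")


def is_valid(payload: dict) -> bool:
    """Return True when the payload satisfies the schema (hypothetical helper)."""
    try:
        Customer(**payload)
        return True
    except ValidationError:
        return False
```

When a model like this is used as the body of /predict, FastAPI runs the validation itself and answers with HTTP 422 if it fails, so the endpoint code never sees an out-of-range value.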
Install and run locally:
pip install fastapi uvicorn scikit-learn joblib
uvicorn app:app --host 0.0.0.0 --port 8000
Testing the Deployment
Test the endpoints:
curl -s http://localhost:8000/health
curl -s http://localhost:8000/predict -H "Content-Type: application/json" -d '{"tenure": 12, "monthly_charges": 89.9}'
Expected responses (the probability shown is illustrative and depends on your model):
{"status":"ok","model_version":"churn:17"}
{"churn_probability":0.73,"model_version":"churn:17"}
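The same checks can be scripted, for example as a smoke test in CI. The sketch below uses only the standard library. The helper names check_prediction and fetch_prediction are hypothetical, and the usage note assumes the service is running locally on port 8000 as above.

```python
import json
import urllib.request


def check_prediction(body: dict, expected_version: str) -> None:
    """Raise ValueError if a /predict response has an unexpected shape."""
    probability = body.get("churn_probability")
    if not isinstance(probability, float):
        raise ValueError("churn_probability must be a float")
    if not 0.0 <= probability <= 1.0:
        raise ValueError("churn_probability must be in [0, 1]")
    if body.get("model_version") != expected_version:
        raise ValueError("unexpected model version")


def fetch_prediction(base_url: str, payload: dict) -> dict:
    """POST a JSON payload to /predict and return the decoded response."""
    request = urllib.request.Request(
        f"{base_url}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.load(response)
```

With the server running, check_prediction(fetch_prediction("http://localhost:8000", {"tenure": 12, "monthly_charges": 89.9}), "churn:17") returns silently on success and raises on a malformed response.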
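Production Considerations
Before going to production, tighten request validation (value ranges, not only types), emit structured logs and metrics, report the model version in every response, set CPU and memory limits, add a readiness probe that passes only once the model has loaded, and keep a rollback plan for returning to the previous model version.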
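Of these items, structured logging is easy to sketch with the standard library alone. The formatter below writes one JSON object per log line. The class name JsonFormatter, the build_logger helper, and the chosen field names are assumptions, and a production service might prefer an established logging library.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed through `extra=` become attributes on the record.
        for key in ("model_version", "latency_ms"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(name: str = "ml-api") -> logging.Logger:
    """Return a logger that emits JSON lines to stderr."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on re-import
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Inside the predict handler, a call such as build_logger().info("prediction served", extra={"model_version": "churn:17", "latency_ms": 4.2}) would then produce one machine-readable line per request.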
Common Mistakes
- Loading the model on every request.
- Returning predictions without the model version.
- Accepting unvalidated JSON.
- Hiding exceptions instead of logging them.
Summary
A reliable model deployment is versioned, testable, observable, and reversible.