Machine learning models only create business value when they’re deployed reliably, monitored continuously, and updated seamlessly. MLOps—the practice of operationalizing machine learning—bridges the gap between data science experiments and production systems. This tutorial walks through building a complete MLOps pipeline using Kubernetes for orchestration and scalability, and MLflow for experiment tracking, model registry, and deployment. You’ll learn not just what to build, but why each component matters and how they work together in production environments.
Architecture Overview: The MLOps Stack
A production MLOps system consists of several interconnected components, each serving a specific purpose in the machine learning lifecycle. Understanding this architecture before diving into implementation prevents building fragmented systems that don’t integrate well.
Kubernetes provides the infrastructure foundation. It orchestrates containers across a cluster, handles scaling, manages secrets, and ensures services remain available despite failures. For MLOps, Kubernetes runs training jobs, serves models, hosts MLflow, and manages the entire ML infrastructure. The declarative nature of Kubernetes configurations means your infrastructure is code-defined, versioned, and reproducible.
MLflow serves three critical functions in the stack. First, MLflow Tracking logs experiments—parameters, metrics, artifacts—creating a searchable history of all model training runs. Second, the Model Registry manages model versions, staging, and promotion to production. Third, MLflow Models provides a standardized format for packaging models with their dependencies, enabling consistent deployment across environments.
The workflow follows a clear path from development to production:
- Data scientists experiment locally or in notebooks, tracking runs with MLflow
- Promising experiments become training jobs deployed as Kubernetes Jobs
- Trained models register in MLflow Model Registry with metadata and artifacts
- Models transition through stages: Staging → Production
- Production models deploy as Kubernetes Deployments for serving
- Monitoring feeds back into the cycle, triggering retraining when needed
This architecture separates concerns—data scientists focus on model development, ML engineers handle deployment infrastructure, and both use consistent tools throughout the lifecycle.
Setting Up the Kubernetes Environment
Before deploying ML workloads, you need a properly configured Kubernetes cluster with appropriate resources and namespaces. This foundation determines how smoothly your MLOps pipeline operates.
Cluster requirements scale with your ML workload. For this tutorial, a cluster with at least 3 nodes (4 CPU, 16GB RAM each) suffices. Production environments often separate training and serving workloads onto different node pools—training nodes with GPUs, serving nodes optimized for inference latency. You can use managed Kubernetes services (EKS, GKE, AKS) or local development with Minikube or Kind for learning.
Create separate namespaces for different concerns:
```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: mlops-platform
---
apiVersion: v1
kind: Namespace
metadata:
  name: ml-training
---
apiVersion: v1
kind: Namespace
metadata:
  name: ml-serving
```
The mlops-platform namespace hosts MLflow and shared services. The ml-training namespace runs training jobs, isolated from serving workloads. The ml-serving namespace contains model serving deployments. This separation provides clear boundaries and enables different resource quotas, security policies, and access controls for each environment.
Storage configuration determines where artifacts persist. MLflow stores model artifacts, training logs, and metadata. Configure a PersistentVolume backed by cloud storage (S3, GCS, Azure Blob) or network storage for production durability:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mlflow-pvc
  namespace: mlops-platform
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: standard
```
Secrets management protects sensitive credentials. Store database passwords, cloud credentials, and API keys as Kubernetes Secrets:
```bash
kubectl create secret generic mlflow-secrets \
  --from-literal=db-password='your-secure-password' \
  --from-literal=s3-access-key='your-access-key' \
  --from-literal=s3-secret-key='your-secret-key' \
  -n mlops-platform
```
These secrets mount into pods as environment variables or files, never hardcoded in container images or configuration files.
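To make that concrete, here is a small sketch of application code that works with either injection style. The load_secret helper and the /etc/mlflow-secrets mount path are illustrative assumptions, not a Kubernetes or MLflow API:

```python
import os
from pathlib import Path

def load_secret(name: str, mount_dir: str = "/etc/mlflow-secrets") -> str:
    """Read a secret injected as an env var, falling back to a mounted file."""
    env_key = name.upper().replace("-", "_")  # e.g. db-password -> DB_PASSWORD
    if env_key in os.environ:
        return os.environ[env_key]
    # When a Secret is mounted as a volume, each key becomes a file
    secret_file = Path(mount_dir) / name
    if secret_file.is_file():
        return secret_file.read_text().strip()
    raise KeyError(f"secret {name!r} not found in env or {mount_dir}")
```

Either way, the credential lives only in the cluster's Secret store, never in the image.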
Deploying MLflow on Kubernetes
MLflow provides the central nervous system for your MLOps pipeline. Deploying it on Kubernetes ensures high availability, scalability, and integration with your ML workloads.
MLflow requires a backend store for metadata and an artifact store for model files. The backend store (PostgreSQL or MySQL) tracks experiments, runs, parameters, and metrics. The artifact store (S3, Azure Blob, or similar) persists model artifacts, plots, and logs. Separating these concerns allows independent scaling and optimization.
Deploy PostgreSQL as the MLflow backend:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-postgres
  namespace: mlops-platform
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow-postgres
  template:
    metadata:
      labels:
        app: mlflow-postgres
    spec:
      containers:
        - name: postgres
          image: postgres:14
          ports:
            - containerPort: 5432
          env:
            - name: POSTGRES_DB
              value: mlflow
            - name: POSTGRES_USER
              value: mlflow
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mlflow-secrets
                  key: db-password
          volumeMounts:
            - name: postgres-storage
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: postgres-storage
          persistentVolumeClaim:
            claimName: mlflow-postgres-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow-postgres
  namespace: mlops-platform
spec:
  selector:
    app: mlflow-postgres
  ports:
    - port: 5432
      targetPort: 5432
```
The MLflow server deployment connects to both stores:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-server
  namespace: mlops-platform
spec:
  replicas: 2  # Multiple replicas for high availability
  selector:
    matchLabels:
      app: mlflow-server
  template:
    metadata:
      labels:
        app: mlflow-server
    spec:
      containers:
        - name: mlflow
          image: ghcr.io/mlflow/mlflow:v2.9.2
          ports:
            - containerPort: 5000
          command:
            - mlflow
            - server
            - --backend-store-uri
            - postgresql://mlflow:$(DB_PASSWORD)@mlflow-postgres:5432/mlflow
            - --default-artifact-root
            - s3://my-mlflow-artifacts/
            - --host
            - 0.0.0.0
          env:
            - name: DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mlflow-secrets
                  key: db-password
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: mlflow-secrets
                  key: s3-access-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: mlflow-secrets
                  key: s3-secret-key
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow-server
  namespace: mlops-platform
spec:
  selector:
    app: mlflow-server
  ports:
    - port: 5000
      targetPort: 5000
  type: LoadBalancer  # Or use Ingress for production
```
Access MLflow UI through the LoadBalancer or configure an Ingress resource for production environments with proper authentication. The MLflow UI provides visibility into all experiments, model versions, and deployment status—a central dashboard for the entire ML team.
(Figure: MLOps pipeline architecture)
Building and Tracking Training Jobs
Training jobs in Kubernetes run as Jobs or CronJobs, providing isolated environments with specified resources. MLflow tracking captures everything about each training run, creating a searchable experiment history.
The training script instruments MLflow tracking. Here’s a complete example training a scikit-learn model:
```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
import pandas as pd
import os

# Configure MLflow to use the Kubernetes-hosted server
mlflow.set_tracking_uri("http://mlflow-server.mlops-platform.svc.cluster.local:5000")
mlflow.set_experiment("customer-churn-prediction")

def train_model():
    # Load data (from S3, database, etc.)
    data = pd.read_csv("s3://my-data-bucket/churn_data.csv")
    X = data.drop('churn', axis=1)
    y = data['churn']
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Start MLflow run
    with mlflow.start_run():
        # Log parameters
        n_estimators = int(os.getenv('N_ESTIMATORS', 100))
        max_depth = int(os.getenv('MAX_DEPTH', 10))
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_param("max_depth", max_depth)
        mlflow.log_param("dataset_size", len(data))

        # Train model
        model = RandomForestClassifier(
            n_estimators=n_estimators,
            max_depth=max_depth,
            random_state=42
        )
        model.fit(X_train, y_train)

        # Evaluate and log metrics
        y_pred = model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)
        mlflow.log_metric("accuracy", accuracy)
        mlflow.log_metric("f1_score", f1)

        # Log model with signature
        signature = mlflow.models.infer_signature(X_train, model.predict(X_train))
        mlflow.sklearn.log_model(
            model,
            "model",
            signature=signature,
            registered_model_name="churn-predictor"
        )

        print(f"Model trained - Accuracy: {accuracy:.4f}, F1: {f1:.4f}")
        print(f"Run ID: {mlflow.active_run().info.run_id}")

if __name__ == "__main__":
    train_model()
```
Package this training code in a Docker container:
```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY train.py .
CMD ["python", "train.py"]
```
Deploy the training job as a Kubernetes Job:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: churn-model-training-v1
  namespace: ml-training
spec:
  template:
    spec:
      containers:
        - name: training
          image: my-registry/churn-training:v1
          env:
            - name: N_ESTIMATORS
              value: "200"
            - name: MAX_DEPTH
              value: "15"
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: ml-secrets
                  key: aws-access-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: ml-secrets
                  key: aws-secret-key
          resources:
            requests:
              memory: "4Gi"
              cpu: "2"
            limits:
              memory: "8Gi"
              cpu: "4"
      restartPolicy: Never
  backoffLimit: 3
```
This job runs to completion, logs everything to MLflow, and registers the model. You can parameterize training jobs through environment variables, making hyperparameter tuning straightforward. For systematic tuning, deploy multiple jobs with different parameters—each becomes a tracked experiment in MLflow.
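As a sketch of that tuning pattern, a short script can generate one Job manifest per parameter combination. The grid values, job names, and image tag here are illustrative assumptions; kubectl also accepts JSON manifests, so no YAML library is required:

```python
import itertools
import json

# Hypothetical sweep: one Kubernetes Job per hyperparameter combination
GRID = {
    "N_ESTIMATORS": [100, 200, 400],
    "MAX_DEPTH": [10, 15],
}

def job_manifest(params: dict, image: str = "my-registry/churn-training:v1") -> dict:
    """Build a batch/v1 Job manifest whose env vars carry the hyperparameters."""
    suffix = "-".join(str(v) for v in params.values())
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": f"churn-training-{suffix}", "namespace": "ml-training"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{
                        "name": "training",
                        "image": image,
                        "env": [{"name": k, "value": str(v)} for k, v in params.items()],
                    }],
                    "restartPolicy": "Never",
                }
            },
            "backoffLimit": 3,
        },
    }

manifests = [
    job_manifest(dict(zip(GRID, combo)))
    for combo in itertools.product(*GRID.values())
]
# Write each manifest to a file and `kubectl apply -f` it, e.g.:
# print(json.dumps(manifests[0], indent=2))
```

Each submitted Job then shows up as a separate tracked run under the same MLflow experiment, making the grid easy to compare in the UI.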
GPU-accelerated training requires node selectors (often combined with taints and tolerations) so pods land on GPU nodes:
```yaml
spec:
  template:
    spec:
      nodeSelector:
        accelerator: nvidia-tesla-v100
      containers:
        - name: training
          image: my-registry/deep-learning-training:v1
          resources:
            limits:
              nvidia.com/gpu: 1
```
This ensures training pods schedule only on GPU-equipped nodes, maximizing resource utilization and training speed.
Managing Models with MLflow Model Registry
The Model Registry bridges training and deployment, providing version control, staging environments, and approval workflows for models. This governance is critical for production ML systems.
Models automatically register during training when using mlflow.sklearn.log_model() with the registered_model_name parameter. Each training run creates a new model version. The registry tracks:
- Model artifacts (serialized model, dependencies, code)
- Training metrics and parameters
- Model signature (expected input/output schema)
- Version history and lineage
Model stages organize the deployment lifecycle:
- None: Newly registered, not yet evaluated
- Staging: Under evaluation in staging environment
- Production: Currently serving traffic
- Archived: Deprecated, kept for audit trail
Promote models through stages programmatically or via UI:
```python
from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://mlflow-server.mlops-platform.svc.cluster.local:5000")

# Get the latest model version
model_name = "churn-predictor"
latest_version = client.get_latest_versions(model_name, stages=["None"])[0]

# Transition to Staging
client.transition_model_version_stage(
    name=model_name,
    version=latest_version.version,
    stage="Staging"
)

# After validation in staging, promote to Production
client.transition_model_version_stage(
    name=model_name,
    version=latest_version.version,
    stage="Production",
    archive_existing_versions=True  # Archive previous production version
)
```
Model aliases provide stable references that update automatically when new versions are promoted. Instead of hardcoding version numbers, serving code references aliases like “production” or “champion.”
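In MLflow 2.3 and later, aliases are managed through the client API. A minimal sketch, where the "champion" alias name and helper functions are illustrative choices rather than MLflow conventions:

```python
def alias_uri(model_name: str, alias: str) -> str:
    """Build the alias-based model URI accepted by mlflow.pyfunc.load_model
    and `mlflow models serve -m ...`."""
    return f"models:/{model_name}@{alias}"

def promote_with_alias(client, model_name: str, version: str) -> str:
    """client is an mlflow.tracking.MlflowClient.

    Repoint the 'champion' alias at the new version; serving code that loads
    models:/<name>@champion picks it up without any config change.
    """
    client.set_registered_model_alias(model_name, "champion", version)
    return alias_uri(model_name, "champion")
```

Aliases are MLflow's recommended replacement for stage-based references, since a stage can only hold one "latest" meaning while aliases can name any number of roles (champion, challenger, shadow).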
Add descriptions and tags for documentation:
```python
client.update_model_version(
    name=model_name,
    version=latest_version.version,
    description="Trained on Q4 2024 data with improved feature engineering"
)

client.set_model_version_tag(
    name=model_name,
    version=latest_version.version,
    key="validation_status",
    value="passed"
)
```
This metadata makes the model registry a living documentation system, not just artifact storage.
Deploying Models for Serving
Models in the Production stage deploy as Kubernetes Deployments, providing scalable, highly available inference endpoints. MLflow’s model serving capabilities simplify this dramatically.
MLflow provides a built-in model serving container that loads models from the registry:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: churn-predictor
  namespace: ml-serving
spec:
  replicas: 3
  selector:
    matchLabels:
      app: churn-predictor
  template:
    metadata:
      labels:
        app: churn-predictor
        version: v1
    spec:
      containers:
        - name: model-server
          image: ghcr.io/mlflow/mlflow:v2.9.2
          command:
            - mlflow
            - models
            - serve
            - -m
            - models:/churn-predictor/Production
            - -h
            - 0.0.0.0
            - -p
            - "8080"
          env:
            - name: MLFLOW_TRACKING_URI
              value: "http://mlflow-server.mlops-platform.svc.cluster.local:5000"
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: ml-secrets
                  key: aws-access-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: ml-secrets
                  key: aws-secret-key
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "4Gi"
              cpu: "2"
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: churn-predictor
  namespace: ml-serving
spec:
  selector:
    app: churn-predictor
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: churn-predictor
  namespace: ml-serving
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - host: predictions.mycompany.com
      http:
        paths:
          - path: /churn
            pathType: Prefix
            backend:
              service:
                name: churn-predictor
                port:
                  number: 80
```
This deployment references models:/churn-predictor/Production, automatically serving whichever model version is currently in Production stage. When you promote a new model version, you can either wait for pods to restart (picking up the new version) or trigger a rolling restart:
```bash
kubectl rollout restart deployment/churn-predictor -n ml-serving
```
Horizontal Pod Autoscaling adjusts replicas based on load:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: churn-predictor-hpa
  namespace: ml-serving
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: churn-predictor
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
Call the prediction endpoint with standard HTTP requests:
```python
import requests

url = "http://predictions.mycompany.com/churn/invocations"
headers = {"Content-Type": "application/json"}

# Prediction payload
data = {
    "dataframe_split": {
        "columns": ["age", "tenure", "monthly_charges", "total_charges"],
        "data": [[45, 24, 79.99, 1919.76]]
    }
}

response = requests.post(url, headers=headers, json=data)
prediction = response.json()
print(f"Churn probability: {prediction['predictions'][0]}")
```
The MLflow model server handles serialization, input validation against the model signature, and returns predictions in a standard format.
Monitoring and Continuous Training
Production models degrade over time as data distributions shift. Monitoring and automated retraining keep models accurate and relevant.
Log predictions for monitoring and retraining. Capture inputs, outputs, and ground truth when available:
```python
import mlflow

def log_prediction(model_version, input_data, prediction, actual=None):
    with mlflow.start_run():
        mlflow.log_param("model_version", model_version)
        mlflow.log_dict(input_data, "input.json")
        mlflow.log_metric("prediction", prediction)
        if actual is not None:
            mlflow.log_metric("actual", actual)
            mlflow.log_metric("error", abs(prediction - actual))
```
Aggregate prediction logs into datasets for model retraining. When prediction errors exceed thresholds, trigger retraining jobs automatically using Kubernetes CronJobs or event-driven systems.
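One way to sketch that trigger logic, with thresholds and class names that are illustrative rather than part of MLflow:

```python
from collections import deque

class DriftMonitor:
    """Rolling-window error tracker that flags when retraining is warranted.

    The window size and threshold are assumptions; tune them per model
    and per business tolerance for stale predictions.
    """

    def __init__(self, window: int = 500, error_threshold: float = 0.25,
                 min_samples: int = 100):
        self.errors = deque(maxlen=window)  # 1 = misprediction, 0 = correct
        self.error_threshold = error_threshold
        self.min_samples = min_samples

    def record(self, prediction: int, actual: int) -> None:
        self.errors.append(int(prediction != actual))

    def should_retrain(self) -> bool:
        if len(self.errors) < self.min_samples:
            return False  # not enough ground truth collected yet
        error_rate = sum(self.errors) / len(self.errors)
        return error_rate > self.error_threshold
```

When should_retrain() flips to True, the monitoring service can create a training Job directly via the Kubernetes API or emit an event that a CronJob-based controller acts on.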
Monitor model performance metrics in real-time:
```python
from prometheus_client import Counter, Histogram

prediction_counter = Counter('model_predictions_total', 'Total predictions')
prediction_latency = Histogram('model_prediction_latency_seconds', 'Prediction latency')

@prediction_latency.time()
def predict(model, input_data):
    prediction_counter.inc()
    return model.predict(input_data)
```
Export these Prometheus metrics from your serving pods, visualize in Grafana, and alert on anomalies.
Implement A/B testing for model comparison. Deploy multiple model versions simultaneously and route traffic proportionally:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: churn-predictor-ab
  namespace: ml-serving
spec:
  selector:
    app: churn-predictor
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: churn-predictor-v1
  labels:
    app: churn-predictor
    version: v1
# Champion model - 90% traffic
spec:
  replicas: 9
  # ... rest of config
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: churn-predictor-v2
  labels:
    app: churn-predictor
    version: v2
# Challenger model - 10% traffic
spec:
  replicas: 1
  # ... rest of config
```
The Service load balances across both versions proportionally to replica counts. Track metrics per version to compare performance before promoting the challenger.
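To decide the comparison rigorously rather than eyeballing dashboards, a standard two-proportion z-test works. This sketch assumes you have counted correct predictions per version from the monitoring logs:

```python
import math

def conversion_z_score(successes_a: int, n_a: int,
                       successes_b: int, n_b: int) -> float:
    """Two-proportion z-test statistic comparing challenger (b) to champion (a).

    Positive values mean the challenger's success rate is higher;
    |z| > 1.96 is significant at the 5% level (two-sided).
    """
    p_a, p_b = successes_a / n_a, successes_b / n_b
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se
```

Only promote the challenger when the z-score clears your significance bar over a large enough sample; small replica counts mean the 10% slice accumulates data slowly.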
🎯 Production MLOps Checklist
- ✓ All experiments tracked with parameters, metrics, artifacts
- ✓ Models registered with signatures and metadata
- ✓ Staging workflow enforced before production
- ✓ Model validation runs in staging environment
- ✓ Models deployed with health checks and readiness probes
- ✓ Horizontal autoscaling configured for traffic spikes
- ✓ Resource limits prevent resource exhaustion
- ✓ Multiple replicas ensure high availability
- ✓ Prediction latency and throughput monitored
- ✓ Model performance metrics tracked continuously
- ✓ Automated retraining triggered by performance degradation
- ✓ A/B testing framework for safe model updates
CI/CD Integration for Automated MLOps
Manual deployments don’t scale. Integrate your MLOps pipeline with CI/CD systems to automate testing, deployment, and rollback.
GitHub Actions workflow for training pipeline:
```yaml
name: Train and Register Model

on:
  push:
    branches: [main]
    paths:
      - 'training/**'
  workflow_dispatch:

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build training image
        run: |
          docker build -t ${{ secrets.REGISTRY }}/churn-training:${{ github.sha }} training/
          docker push ${{ secrets.REGISTRY }}/churn-training:${{ github.sha }}
      - name: Deploy training job to Kubernetes
        uses: azure/k8s-deploy@v1
        with:
          manifests: |
            k8s/training-job.yaml
          images: |
            ${{ secrets.REGISTRY }}/churn-training:${{ github.sha }}
          kubectl-version: 'latest'
      - name: Wait for job completion
        run: |
          kubectl wait --for=condition=complete --timeout=3600s \
            job/churn-model-training-${{ github.sha }} -n ml-training
      - name: Promote to staging
        run: |
          python scripts/promote_model.py --stage Staging
```
Separate workflow for production promotion:
```yaml
name: Deploy Model to Production

on:
  workflow_dispatch:
    inputs:
      model_version:
        description: 'Model version to promote'
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Validate model in staging
        run: |
          python scripts/validate_model.py \
            --model-name churn-predictor \
            --version ${{ github.event.inputs.model_version }} \
            --stage Staging
      - name: Promote to production
        if: success()
        run: |
          python scripts/promote_model.py \
            --version ${{ github.event.inputs.model_version }} \
            --stage Production
      - name: Restart serving deployment
        run: |
          kubectl rollout restart deployment/churn-predictor -n ml-serving
          kubectl rollout status deployment/churn-predictor -n ml-serving
```
This automation ensures consistency: every model deployed to production passes the same validation, every training job runs in identical environments, every deployment follows the same process. Human errors decrease dramatically when manual steps are eliminated.
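The workflows above call scripts/promote_model.py without showing it; a minimal sketch, assuming the MlflowClient API used earlier in this tutorial (the flag names mirror the workflow invocations), might look like:

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Promote a registered model version")
    parser.add_argument("--model-name", default="churn-predictor")
    parser.add_argument("--version", default=None,
                        help="version to promote; defaults to the newest registered version")
    parser.add_argument("--stage", required=True, choices=["Staging", "Production"])
    return parser.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)
    # Imported lazily so the CLI parsing works even where mlflow isn't installed
    from mlflow.tracking import MlflowClient
    client = MlflowClient()  # tracking URI comes from MLFLOW_TRACKING_URI
    version = args.version or client.get_latest_versions(args.model_name)[0].version
    client.transition_model_version_stage(
        name=args.model_name,
        version=version,
        stage=args.stage,
        archive_existing_versions=(args.stage == "Production"),
    )
    print(f"{args.model_name} v{version} -> {args.stage}")

# Invoked from CI as: python scripts/promote_model.py --stage Staging
```

Keeping promotion logic in a version-controlled script, rather than ad hoc kubectl and UI clicks, is what makes the audit trail in the workflow logs trustworthy.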
Conclusion
Building production MLOps systems with Kubernetes and MLflow creates a robust, scalable foundation for machine learning operations. Kubernetes provides the orchestration layer—managing training jobs, serving deployments, and infrastructure resources. MLflow provides the ML-specific layer—experiment tracking, model registry, and standardized serving. Together, they enable data science teams to move from notebooks to production reliably and repeatedly.
The patterns demonstrated here—tracked experiments, versioned models, staged promotions, monitored serving—form the backbone of mature ML systems. Start with this foundation, then customize based on your specific requirements: add feature stores, enhance monitoring, implement advanced deployment strategies. The investment in proper MLOps infrastructure pays dividends in reduced deployment friction, faster iteration cycles, and confident model updates that drive business value.