End-to-End MLOps Tutorial with Kubernetes and MLflow

Machine learning models only create business value when they’re deployed reliably, monitored continuously, and updated seamlessly. MLOps—the practice of operationalizing machine learning—bridges the gap between data science experiments and production systems. This tutorial walks through building a complete MLOps pipeline using Kubernetes for orchestration and scalability, and MLflow for experiment tracking, model registry, and deployment. You’ll learn not just what to build, but why each component matters and how they work together in production environments.

Architecture Overview: The MLOps Stack

A production MLOps system consists of several interconnected components, each serving a specific purpose in the machine learning lifecycle. Understanding this architecture before diving into implementation prevents building fragmented systems that don’t integrate well.

Kubernetes provides the infrastructure foundation. It orchestrates containers across a cluster, handles scaling, manages secrets, and ensures services remain available despite failures. For MLOps, Kubernetes runs training jobs, serves models, hosts MLflow, and manages the entire ML infrastructure. The declarative nature of Kubernetes configurations means your infrastructure is code-defined, versioned, and reproducible.

MLflow serves three critical functions in the stack. First, MLflow Tracking logs experiments—parameters, metrics, artifacts—creating a searchable history of all model training runs. Second, the Model Registry manages model versions, staging, and promotion to production. Third, MLflow Models provides a standardized format for packaging models with their dependencies, enabling consistent deployment across environments.

The workflow follows a clear path from development to production:

  1. Data scientists experiment locally or in notebooks, tracking runs with MLflow
  2. Promising experiments become training jobs deployed as Kubernetes Jobs
  3. Trained models register in MLflow Model Registry with metadata and artifacts
  4. Models transition through stages: Staging → Production
  5. Production models deploy as Kubernetes Deployments for serving
  6. Monitoring feeds back into the cycle, triggering retraining when needed

This architecture separates concerns—data scientists focus on model development, ML engineers handle deployment infrastructure, and both use consistent tools throughout the lifecycle.

Setting Up the Kubernetes Environment

Before deploying ML workloads, you need a properly configured Kubernetes cluster with appropriate resources and namespaces. This foundation determines how smoothly your MLOps pipeline operates.

Cluster requirements scale with your ML workload. For this tutorial, a cluster with at least 3 nodes (4 CPU, 16GB RAM each) suffices. Production environments often separate training and serving workloads onto different node pools—training nodes with GPUs, serving nodes optimized for inference latency. You can use managed Kubernetes services (EKS, GKE, AKS) or local development with Minikube or Kind for learning.

Create separate namespaces for different concerns:

yaml

apiVersion: v1
kind: Namespace
metadata:
  name: mlops-platform
---
apiVersion: v1
kind: Namespace
metadata:
  name: ml-training
---
apiVersion: v1
kind: Namespace
metadata:
  name: ml-serving

The mlops-platform namespace hosts MLflow and shared services. The ml-training namespace runs training jobs, isolated from serving workloads. The ml-serving namespace contains model serving deployments. This separation provides clear boundaries and enables different resource quotas, security policies, and access controls for each environment.
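
For example, a ResourceQuota can cap what the training namespace may consume in aggregate (the numbers here are illustrative, not a recommendation):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: training-quota
  namespace: ml-training
spec:
  hard:
    requests.cpu: "8"        # total CPU all training pods may request
    requests.memory: 32Gi    # total memory requests across the namespace
    limits.cpu: "16"         # total CPU limits
    limits.memory: 64Gi      # total memory limits
```

A runaway hyperparameter sweep then fails quota admission instead of starving the serving namespace.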

Storage configuration determines where artifacts persist. MLflow stores model artifacts, training logs, and metadata. Use cloud object storage (S3, GCS, Azure Blob) for the artifact store, and a PersistentVolumeClaim backed by network or block storage for components that need a filesystem, such as the metadata database:

yaml

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mlflow-pvc
  namespace: mlops-platform
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi
  storageClassName: standard

Secrets management protects sensitive credentials. Store database passwords, cloud credentials, and API keys as Kubernetes Secrets:

bash

kubectl create secret generic mlflow-secrets \
  --from-literal=db-password='your-secure-password' \
  --from-literal=s3-access-key='your-access-key' \
  --from-literal=s3-secret-key='your-secret-key' \
  -n mlops-platform

These secrets mount into pods as environment variables or files, never hardcoded in container images or configuration files.

Deploying MLflow on Kubernetes

MLflow provides the central nervous system for your MLOps pipeline. Deploying it on Kubernetes ensures high availability, scalability, and integration with your ML workloads.

MLflow requires a backend store for metadata and an artifact store for model files. The backend store (PostgreSQL or MySQL) tracks experiments, runs, parameters, and metrics. The artifact store (S3, Azure Blob, or similar) persists model artifacts, plots, and logs. Separating these concerns allows independent scaling and optimization.

Deploy PostgreSQL as the MLflow backend:

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-postgres
  namespace: mlops-platform
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mlflow-postgres
  template:
    metadata:
      labels:
        app: mlflow-postgres
    spec:
      containers:
      - name: postgres
        image: postgres:14
        ports:
        - containerPort: 5432
        env:
        - name: POSTGRES_DB
          value: mlflow
        - name: POSTGRES_USER
          value: mlflow
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mlflow-secrets
              key: db-password
        volumeMounts:
        - name: postgres-storage
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: postgres-storage
        persistentVolumeClaim:
          claimName: mlflow-postgres-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow-postgres
  namespace: mlops-platform
spec:
  selector:
    app: mlflow-postgres
  ports:
  - port: 5432
    targetPort: 5432

The MLflow server deployment connects to both stores:

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mlflow-server
  namespace: mlops-platform
spec:
  replicas: 2  # Multiple replicas for high availability
  selector:
    matchLabels:
      app: mlflow-server
  template:
    metadata:
      labels:
        app: mlflow-server
    spec:
      containers:
      - name: mlflow
        image: ghcr.io/mlflow/mlflow:v2.9.2
        ports:
        - containerPort: 5000
        command:
        - mlflow
        - server
        - --backend-store-uri
        - postgresql://mlflow:$(DB_PASSWORD)@mlflow-postgres:5432/mlflow
        - --default-artifact-root
        - s3://my-mlflow-artifacts/
        - --host
        - 0.0.0.0
        env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mlflow-secrets
              key: db-password
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: mlflow-secrets
              key: s3-access-key
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: mlflow-secrets
              key: s3-secret-key
---
apiVersion: v1
kind: Service
metadata:
  name: mlflow-server
  namespace: mlops-platform
spec:
  selector:
    app: mlflow-server
  ports:
  - port: 5000
    targetPort: 5000
  type: LoadBalancer  # Or use Ingress for production

Access the MLflow UI through the LoadBalancer, or configure an Ingress resource with proper authentication for production environments. The MLflow UI provides visibility into all experiments, model versions, and deployment status—a central dashboard for the entire ML team.

MLOps Pipeline Architecture

The pipeline moves through four stages:

  • 📊 Experiment: track parameters & metrics
  • 🏋️ Train: K8s Jobs with resources
  • 📦 Register: version in Model Registry
  • 🚀 Deploy: serve with K8s Deployments

Building and Tracking Training Jobs

Training jobs in Kubernetes run as Jobs or CronJobs, providing isolated environments with specified resources. MLflow tracking captures everything about each training run, creating a searchable experiment history.

The training script instruments MLflow tracking. Here’s a complete example training a scikit-learn model:

python

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score
import pandas as pd
import os

# Configure MLflow to use the Kubernetes-hosted server
mlflow.set_tracking_uri("http://mlflow-server.mlops-platform.svc.cluster.local:5000")
mlflow.set_experiment("customer-churn-prediction")

def train_model():
    # Load data (from S3, database, etc.)
    data = pd.read_csv("s3://my-data-bucket/churn_data.csv")
    X = data.drop('churn', axis=1)
    y = data['churn']
    
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    
    # Start MLflow run
    with mlflow.start_run():
        # Log parameters
        n_estimators = int(os.getenv('N_ESTIMATORS', 100))
        max_depth = int(os.getenv('MAX_DEPTH', 10))
        
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_param("max_depth", max_depth)
        mlflow.log_param("dataset_size", len(data))
        
        # Train model
        model = RandomForestClassifier(
            n_estimators=n_estimators,
            max_depth=max_depth,
            random_state=42
        )
        model.fit(X_train, y_train)
        
        # Evaluate and log metrics
        y_pred = model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred)
        
        mlflow.log_metric("accuracy", accuracy)
        mlflow.log_metric("f1_score", f1)
        
        # Log model with signature
        signature = mlflow.models.infer_signature(X_train, model.predict(X_train))
        mlflow.sklearn.log_model(
            model,
            "model",
            signature=signature,
            registered_model_name="churn-predictor"
        )
        
        print(f"Model trained - Accuracy: {accuracy:.4f}, F1: {f1:.4f}")
        print(f"Run ID: {mlflow.active_run().info.run_id}")

if __name__ == "__main__":
    train_model()

Package this training code in a Docker container:

dockerfile

FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY train.py .

CMD ["python", "train.py"]
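
The requirements.txt this Dockerfile copies isn't shown; a plausible minimal version for the training script above would pin (versions are illustrative):

```text
mlflow==2.9.2
scikit-learn==1.3.2
pandas==2.1.4
s3fs==2023.12.2    # lets pandas read s3:// paths directly
```

Pinning exact versions keeps training runs reproducible across image rebuilds.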

Deploy the training job as a Kubernetes Job:

yaml

apiVersion: batch/v1
kind: Job
metadata:
  name: churn-model-training-v1
  namespace: ml-training
spec:
  template:
    spec:
      containers:
      - name: training
        image: my-registry/churn-training:v1
        env:
        - name: N_ESTIMATORS
          value: "200"
        - name: MAX_DEPTH
          value: "15"
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: ml-secrets
              key: aws-access-key
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: ml-secrets
              key: aws-secret-key
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
          limits:
            memory: "8Gi"
            cpu: "4"
      restartPolicy: Never
  backoffLimit: 3

This job runs to completion, logs everything to MLflow, and registers the model. You can parameterize training jobs through environment variables, making hyperparameter tuning straightforward. For systematic tuning, deploy multiple jobs with different parameters—each becomes a tracked experiment in MLflow.
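
A small script can fan those jobs out from a parameter grid. This sketch only generates the per-job names and environment overrides; templating them into the Job manifest above is left out:

```python
import itertools

# Hyperparameter grid to sweep; each combination becomes one Kubernetes Job.
GRID = {
    "N_ESTIMATORS": [100, 200, 400],
    "MAX_DEPTH": [5, 10, 15],
}

def job_specs(grid):
    """Yield (job_name, env_overrides) pairs, one per grid combination."""
    keys = sorted(grid)
    for i, values in enumerate(itertools.product(*(grid[k] for k in keys))):
        env = dict(zip(keys, (str(v) for v in values)))
        yield f"churn-model-training-{i}", env

jobs = list(job_specs(GRID))
print(len(jobs))   # 9 jobs for a 3x3 grid
print(jobs[0])
```

Each pair maps onto the Job's metadata.name and env section, and every resulting run appears as a separate tracked experiment in MLflow.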

GPU-accelerated training requires a node selector (and, if the GPU node pool is tainted, a matching toleration):

yaml

spec:
  template:
    spec:
      nodeSelector:
        accelerator: nvidia-tesla-v100
      containers:
      - name: training
        image: my-registry/deep-learning-training:v1
        resources:
          limits:
            nvidia.com/gpu: 1

This ensures training pods schedule only on GPU-equipped nodes, maximizing resource utilization and training speed.

Managing Models with MLflow Model Registry

The Model Registry bridges training and deployment, providing version control, staging environments, and approval workflows for models. This governance is critical for production ML systems.

Models automatically register during training when using mlflow.sklearn.log_model() with the registered_model_name parameter. Each training run creates a new model version. The registry tracks:

  • Model artifacts (serialized model, dependencies, code)
  • Training metrics and parameters
  • Model signature (expected input/output schema)
  • Version history and lineage

Model stages organize the deployment lifecycle:

  • None: Newly registered, not yet evaluated
  • Staging: Under evaluation in staging environment
  • Production: Currently serving traffic
  • Archived: Deprecated, kept for audit trail

Promote models through stages programmatically or via UI:

python

from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri="http://mlflow-server.mlops-platform.svc.cluster.local:5000")

# Get the latest model version
model_name = "churn-predictor"
latest_version = client.get_latest_versions(model_name, stages=["None"])[0]

# Transition to Staging
client.transition_model_version_stage(
    name=model_name,
    version=latest_version.version,
    stage="Staging"
)

# After validation in staging, promote to Production
client.transition_model_version_stage(
    name=model_name,
    version=latest_version.version,
    stage="Production",
    archive_existing_versions=True  # Archive previous production version
)

Model aliases provide stable references that update automatically when new versions are promoted. Instead of hardcoding version numbers, serving code references aliases like “production” or “champion.”
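
In code, an alias is set with client.set_registered_model_alias() and resolved through a models:/&lt;name&gt;@&lt;alias&gt; URI. The helper below merely assembles the three URI forms; it is an illustration, not an MLflow API:

```python
def model_uri(name, alias=None, stage=None, version=None):
    """Build an MLflow models:/ registry URI.

    Exactly one of alias, stage, or version must be given.
    """
    if sum(x is not None for x in (alias, stage, version)) != 1:
        raise ValueError("specify exactly one of alias, stage, or version")
    if alias is not None:
        return f"models:/{name}@{alias}"    # alias form, e.g. models:/churn-predictor@production
    if stage is not None:
        return f"models:/{name}/{stage}"    # stage form, e.g. models:/churn-predictor/Production
    return f"models:/{name}/{version}"      # pinned version, e.g. models:/churn-predictor/3

print(model_uri("churn-predictor", alias="production"))
```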

Add descriptions and tags for documentation:

python

client.update_model_version(
    name=model_name,
    version=latest_version.version,
    description="Trained on Q4 2024 data with improved feature engineering"
)

client.set_model_version_tag(
    name=model_name,
    version=latest_version.version,
    key="validation_status",
    value="passed"
)

This metadata makes the model registry a living documentation system, not just artifact storage.

Deploying Models for Serving

Models in the Production stage deploy as Kubernetes Deployments, providing scalable, highly available inference endpoints. MLflow’s model serving capabilities simplify this dramatically.

MLflow provides a built-in model serving container that loads models from the registry:

yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: churn-predictor
  namespace: ml-serving
spec:
  replicas: 3
  selector:
    matchLabels:
      app: churn-predictor
  template:
    metadata:
      labels:
        app: churn-predictor
        version: v1
    spec:
      containers:
      - name: model-server
        image: ghcr.io/mlflow/mlflow:v2.9.2
        command:
        - mlflow
        - models
        - serve
        - -m
        - models:/churn-predictor/Production
        - -h
        - 0.0.0.0
        - -p
        - "8080"
        env:
        - name: MLFLOW_TRACKING_URI
          value: "http://mlflow-server.mlops-platform.svc.cluster.local:5000"
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: ml-secrets
              key: aws-access-key
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: ml-secrets
              key: aws-secret-key
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
          limits:
            memory: "4Gi"
            cpu: "2"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: churn-predictor
  namespace: ml-serving
spec:
  selector:
    app: churn-predictor
  ports:
  - port: 80
    targetPort: 8080
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: churn-predictor
  namespace: ml-serving
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - host: predictions.mycompany.com
    http:
      paths:
      - path: /churn
        pathType: Prefix
        backend:
          service:
            name: churn-predictor
            port:
              number: 80

This deployment references models:/churn-predictor/Production, automatically serving whichever model version is currently in the Production stage. The model loads at pod startup, however, so running pods keep serving the old version after a promotion; trigger a rolling restart to pick up the new one:

bash

kubectl rollout restart deployment/churn-predictor -n ml-serving

Horizontal Pod Autoscaling adjusts replicas based on load:

yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: churn-predictor-hpa
  namespace: ml-serving
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: churn-predictor
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
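
The HPA's target replica count follows a simple formula, desired = ceil(current * observed / target), clamped to the min/max bounds. A quick sanity check of that arithmetic:

```python
import math

def desired_replicas(current, observed_utilization, target_utilization,
                     min_replicas=3, max_replicas=20):
    """Kubernetes HPA formula: ceil(current * observed / target), clamped to bounds."""
    desired = math.ceil(current * observed_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(3, 140, 70))  # 6: CPU at double the 70% target doubles the pods
```

With multiple metrics configured, as above, the HPA computes this per metric and takes the largest result.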

Call the prediction endpoint with standard HTTP requests:

python

import requests
import json

url = "http://predictions.mycompany.com/churn/invocations"
headers = {"Content-Type": "application/json"}

# Prediction payload
data = {
    "dataframe_split": {
        "columns": ["age", "tenure", "monthly_charges", "total_charges"],
        "data": [[45, 24, 79.99, 1919.76]]
    }
}

response = requests.post(url, headers=headers, json=data)
prediction = response.json()
print(f"Churn probability: {prediction['predictions'][0]}")

The MLflow model server handles serialization, input validation against the model signature, and returns predictions in a standard format.
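
Hand-writing that nested payload is error-prone; a small helper (illustrative, not part of MLflow) can build it from plain dicts:

```python
def dataframe_split_payload(rows):
    """Convert a list of feature dicts into MLflow's dataframe_split format.

    Every row must have the same keys; column order follows the first row.
    """
    if not rows:
        raise ValueError("at least one row is required")
    columns = list(rows[0])
    return {
        "dataframe_split": {
            "columns": columns,
            "data": [[row[col] for col in columns] for row in rows],
        }
    }

payload = dataframe_split_payload(
    [{"age": 45, "tenure": 24, "monthly_charges": 79.99, "total_charges": 1919.76}]
)
print(payload["dataframe_split"]["data"])  # [[45, 24, 79.99, 1919.76]]
```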

Monitoring and Continuous Training

Production models degrade over time as data distributions shift. Monitoring and automated retraining keep models accurate and relevant.

Log predictions for monitoring and retraining. Capture inputs, outputs, and ground truth when available:

python

import mlflow

def log_prediction(model_version, input_data, prediction, actual=None):
    with mlflow.start_run():
        mlflow.log_param("model_version", model_version)
        mlflow.log_dict(input_data, "input.json")
        mlflow.log_metric("prediction", prediction)
        if actual is not None:
            mlflow.log_metric("actual", actual)
            mlflow.log_metric("error", abs(prediction - actual))

Aggregate prediction logs into datasets for model retraining. When prediction errors exceed thresholds, trigger retraining jobs automatically using Kubernetes CronJobs or event-driven systems.
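
The trigger itself can be as simple as a rolling error window compared against a threshold; the window size and threshold below are assumptions to tune for your model:

```python
from collections import deque

class RetrainTrigger:
    """Fire once the rolling mean absolute error exceeds a threshold."""

    def __init__(self, threshold, window=100):
        self.threshold = threshold
        self.errors = deque(maxlen=window)

    def record(self, prediction, actual):
        """Record one observation; return True if retraining should start."""
        self.errors.append(abs(prediction - actual))
        if len(self.errors) < self.errors.maxlen:
            return False  # wait until the window fills before deciding
        return sum(self.errors) / len(self.errors) > self.threshold

trigger = RetrainTrigger(threshold=0.2, window=10)
fired = [trigger.record(p, a) for p, a in [(0.9, 0.0)] * 10]
print(fired[-1])  # True: mean error 0.9 exceeds 0.2 once the window fills
```

When the trigger fires, a controller can create a new training Job via the Kubernetes API or kick off the CI workflow described below in this tutorial.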

Monitor model performance metrics in real-time:

python

from prometheus_client import Counter, Histogram
import time

prediction_counter = Counter('model_predictions_total', 'Total predictions')
prediction_latency = Histogram('model_prediction_latency_seconds', 'Prediction latency')

@prediction_latency.time()
def predict(model, input_data):
    prediction_counter.inc()
    return model.predict(input_data)

Export these Prometheus metrics from your serving pods, visualize in Grafana, and alert on anomalies.

Implement A/B testing for model comparison. Deploy multiple model versions simultaneously and route traffic proportionally:

yaml

apiVersion: v1
kind: Service
metadata:
  name: churn-predictor-ab
  namespace: ml-serving
spec:
  selector:
    app: churn-predictor
  ports:
  - port: 80
    targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: churn-predictor-v1
  labels:
    app: churn-predictor
    version: v1
# Champion model - 90% traffic
spec:
  replicas: 9
  # ... rest of config
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: churn-predictor-v2
  labels:
    app: churn-predictor
    version: v2
# Challenger model - 10% traffic
spec:
  replicas: 1
  # ... rest of config

The Service load balances across both versions proportionally to replica counts. Track metrics per version to compare performance before promoting the challenger.
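
Because Service load balancing is approximately even across ready pods, each version's expected traffic share is its replica count over the total, which is easy to sanity-check:

```python
def traffic_shares(replicas):
    """Expected per-version traffic fraction under even load balancing."""
    total = sum(replicas.values())
    return {version: count / total for version, count in replicas.items()}

shares = traffic_shares({"v1": 9, "v2": 1})
print(shares)  # {'v1': 0.9, 'v2': 0.1}
```

To shift the split, scale the Deployments rather than editing the Service; for exact percentage routing independent of replica counts, a service mesh such as Istio is the usual tool.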

🎯 Production MLOps Checklist

Training & Registry:
  • ✓ All experiments tracked with parameters, metrics, artifacts
  • ✓ Models registered with signatures and metadata
  • ✓ Staging workflow enforced before production
  • ✓ Model validation runs in staging environment
Deployment & Serving:
  • ✓ Models deployed with health checks and readiness probes
  • ✓ Horizontal autoscaling configured for traffic spikes
  • ✓ Resource limits prevent resource exhaustion
  • ✓ Multiple replicas ensure high availability
Monitoring & Maintenance:
  • ✓ Prediction latency and throughput monitored
  • ✓ Model performance metrics tracked continuously
  • ✓ Automated retraining triggered by performance degradation
  • ✓ A/B testing framework for safe model updates

CI/CD Integration for Automated MLOps

Manual deployments don’t scale. Integrate your MLOps pipeline with CI/CD systems to automate testing, deployment, and rollback.

GitHub Actions workflow for training pipeline:

yaml

name: Train and Register Model

on:
  push:
    branches: [main]
    paths:
      - 'training/**'
  workflow_dispatch:

jobs:
  train:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Build training image
      run: |
        docker build -t ${{ secrets.REGISTRY }}/churn-training:${{ github.sha }} training/
        docker push ${{ secrets.REGISTRY }}/churn-training:${{ github.sha }}
    
    - name: Deploy training job to Kubernetes
      uses: azure/k8s-deploy@v1
      with:
        manifests: |
          k8s/training-job.yaml
        images: |
          ${{ secrets.REGISTRY }}/churn-training:${{ github.sha }}
        kubectl-version: 'latest'
    
    - name: Wait for job completion
      run: |
        kubectl wait --for=condition=complete --timeout=3600s \
          job/churn-model-training-${{ github.sha }} -n ml-training
    
    - name: Promote to staging
      run: |
        python scripts/promote_model.py --stage Staging

Separate workflow for production promotion:

yaml

name: Deploy Model to Production

on:
  workflow_dispatch:
    inputs:
      model_version:
        description: 'Model version to promote'
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    
    - name: Validate model in staging
      run: |
        python scripts/validate_model.py \
          --model-name churn-predictor \
          --version ${{ github.event.inputs.model_version }} \
          --stage Staging
    
    - name: Promote to production
      if: success()
      run: |
        python scripts/promote_model.py \
          --version ${{ github.event.inputs.model_version }} \
          --stage Production
    
    - name: Restart serving deployment
      run: |
        kubectl rollout restart deployment/churn-predictor -n ml-serving
        kubectl rollout status deployment/churn-predictor -n ml-serving

This automation ensures consistency: every model deployed to production passes the same validation, every training job runs in identical environments, every deployment follows the same process. Human errors decrease dramatically when manual steps are eliminated.

Conclusion

Building production MLOps systems with Kubernetes and MLflow creates a robust, scalable foundation for machine learning operations. Kubernetes provides the orchestration layer—managing training jobs, serving deployments, and infrastructure resources. MLflow provides the ML-specific layer—experiment tracking, model registry, and standardized serving. Together, they enable data science teams to move from notebooks to production reliably and repeatedly.

The patterns demonstrated here—tracked experiments, versioned models, staged promotions, monitored serving—form the backbone of mature ML systems. Start with this foundation, then customize based on your specific requirements: add feature stores, enhance monitoring, implement advanced deployment strategies. The investment in proper MLOps infrastructure pays dividends in reduced deployment friction, faster iteration cycles, and confident model updates that drive business value.
