How to Deploy Ollama on Kubernetes
A production guide to running Ollama on Kubernetes: a complete Deployment manifest with GPU resource limits, PersistentVolumeClaim for model storage, and ClusterIP Service, pulling models via kubectl exec, an init container pattern that pre-pulls models before the main container starts, exposing via NGINX Ingress with long proxy timeouts, a CPU-only configuration for clusters without GPUs, and deploying via the otwld/ollama-helm community Helm chart.