How to Run Ollama with Docker and Docker Compose

Running Ollama as a Docker container makes it easy to deploy on a Linux server, integrate it into a multi-service stack, and manage it consistently alongside other services. This guide covers how to run Ollama with Docker and Docker Compose, including GPU passthrough, model persistence, and a complete example stack with Open WebUI.

Running Ollama with Docker

The official Ollama Docker image is available on Docker Hub. For CPU-only:

docker run -d \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

The -v ollama:/root/.ollama flag mounts a named volume for model storage — without this, pulled models are lost when the container is removed. With this volume, models persist across container restarts and updates.
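
As a quick sanity check, you can confirm the volume exists and that a pulled model survives replacing the container. The model name below is just an example; any pulled model behaves the same way.

# Confirm the named volume was created
docker volume inspect ollama

# Pull a model, replace the container, and confirm the model is still available
docker exec ollama ollama pull llama3.2
docker rm -f ollama
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama list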

GPU Passthrough: NVIDIA

To use NVIDIA GPU acceleration, install the NVIDIA Container Toolkit first, then pass the GPU to the container:

# Install NVIDIA Container Toolkit (Ubuntu/Debian)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed "s#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g" | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
# Run Ollama with GPU access
docker run -d \
  --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama

Verify that the GPU is being used by checking the Ollama logs with docker logs ollama. You should see lines indicating that CUDA is available and which GPU was detected.
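
If you want more than the logs, the two checks below are a reasonable way to confirm GPU access from inside the container. The exact wording of the log lines varies between Ollama versions, so the grep pattern is only a rough filter.

# Check that the container can see the GPU at all
docker exec ollama nvidia-smi

# Look for GPU/CUDA-related lines in the startup logs (pattern is approximate)
docker logs ollama 2>&1 | grep -iE "cuda|gpu"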

GPU Passthrough: AMD (ROCm)

docker run -d \
  --device /dev/kfd \
  --device /dev/dri \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama:rocm

Use the :rocm image tag for AMD GPU support; the --device flags pass through /dev/kfd (the ROCm compute interface) and /dev/dri (the GPU render devices) from the host. ROCm support in Docker works well on Linux but is not available on Windows or macOS.

Pulling Models into a Docker Container

# Pull a model into the running container
docker exec ollama ollama pull llama3.2

# Run a prompt directly
docker exec -it ollama ollama run llama3.2 "Explain Docker in one sentence"

# List pulled models
docker exec ollama ollama list
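
You can also talk to the published port directly from the host to confirm the API is reachable. A minimal sketch using the standard /api/tags and /api/generate endpoints, assuming llama3.2 has already been pulled:

# List models over the HTTP API
curl http://localhost:11434/api/tags

# Run a one-off, non-streaming generation against the published port
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain Docker in one sentence",
  "stream": false
}'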

Docker Compose: Ollama + Open WebUI

The most practical Docker Compose setup combines Ollama for inference with Open WebUI for a chat interface. Save this as docker-compose.yml:

version: '3.8'

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - open_webui_data:/app/backend/data
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    depends_on:
      - ollama

volumes:
  ollama_data:
  open_webui_data:

Save the file, then manage the stack from the same directory:

# Start the stack
docker compose up -d

# Pull a model (do this after the stack is up)
docker compose exec ollama ollama pull llama3.2

# View logs
docker compose logs -f

# Stop the stack
docker compose down

Access Open WebUI at http://localhost:3000. The key configuration is the OLLAMA_BASE_URL environment variable — within Docker Compose’s network, services communicate by container name rather than localhost, so it is http://ollama:11434 (the container name) rather than http://localhost:11434.

CPU-Only Compose (No GPU)

If you are running on a server without a GPU, remove the deploy section from the Ollama service:

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    restart: unless-stopped
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    # No deploy/resources section for CPU-only

Persisting Models Across Redeployments

The named volume ollama_data persists model files across container restarts and even when you run docker compose down. Models are only lost if you explicitly remove the volume with docker compose down -v or docker volume rm ollama_data. To back up your models, you can copy the volume contents: docker run --rm -v ollama_data:/data -v $(pwd):/backup alpine tar czf /backup/ollama-backup.tar.gz /data.
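
Note that when the volume is created by Docker Compose, its real name is usually prefixed with the project name (for example myproject_ollama_data), so check docker volume ls first and substitute the actual name. A matching restore command, sketched under the same assumptions as the backup command above:

# Find the actual volume name (Compose prefixes it with the project name)
docker volume ls | grep ollama

# Restore a backup into the volume, mirroring the backup command above
docker run --rm \
  -v ollama_data:/data \
  -v "$(pwd)":/backup \
  alpine tar xzf /backup/ollama-backup.tar.gz -C /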

Environment Variables for Ollama in Docker

Several useful environment variables can be set in the Docker Compose service definition:

  ollama:
    environment:
      - OLLAMA_HOST=0.0.0.0          # Listen on all interfaces (default in Docker)
      - OLLAMA_NUM_PARALLEL=2        # Allow 2 concurrent requests
      - OLLAMA_MAX_LOADED_MODELS=2   # Keep 2 models in memory simultaneously
      - OLLAMA_KEEP_ALIVE=10m        # Keep model in memory 10 minutes after last use

OLLAMA_NUM_PARALLEL allows serving multiple concurrent requests — set this if you have multiple users hitting the same Ollama instance via Open WebUI. OLLAMA_KEEP_ALIVE controls how long a model stays loaded in memory after the last request — the default is 5 minutes; increase it for frequently-used models to avoid the reload latency on every request.
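
To see the effect of these settings, ollama ps inside the container shows which models are loaded and when they will be unloaded, and the keep_alive field on an API request acts as a per-request override of OLLAMA_KEEP_ALIVE. The model name here is illustrative.

# Show loaded models; the UNTIL column reflects the keep-alive window
docker exec ollama ollama ps

# Override keep-alive for a single request via the API
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello",
  "keep_alive": "30m",
  "stream": false
}'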

When to Use Docker vs Native Ollama

Docker makes most sense for server deployments where you want process isolation, easy updates via image pulls, and a consistent environment regardless of the host OS. For a personal machine where you are the only user, native Ollama installation is simpler — no Docker overhead, easier GPU access, and models are immediately usable from the command line. For a home server, NAS, or shared team server, Docker Compose with Ollama and Open WebUI gives you a clean, reproducible stack that can be version-controlled and redeployed in minutes on any Linux machine.

Understanding Docker Networking for Ollama

One of the most common points of confusion when running Ollama in Docker is the networking between containers. When you run Ollama and Open WebUI as separate containers in the same Docker Compose stack, they communicate on Docker’s internal network using container names as hostnames — not localhost. This is why the Open WebUI environment variable is set to http://ollama:11434 rather than http://localhost:11434: within Docker’s network, “ollama” resolves to the Ollama container’s IP address automatically.

If you run Ollama natively on the host (not in Docker) but want to connect a Docker container to it, use http://host.docker.internal:11434 — this is a special hostname that Docker resolves to the host machine’s IP address from inside a container. This is the setup used in the Open WebUI standalone Docker run command, where Open WebUI is containerised but Ollama runs natively. The --add-host=host.docker.internal:host-gateway flag in the run command enables this resolution on Linux, where it is not available by default (unlike macOS and Windows where Docker Desktop handles it automatically).

For more complex setups — multiple applications needing to reach Ollama, Ollama behind a reverse proxy, or Ollama on a different machine — it helps to understand which networking pattern you are using: container-to-container (use container names), container-to-host (use host.docker.internal), or host-to-container (use localhost with the published port). Mixing these up is the root cause of most Ollama connection errors in Docker environments.
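
The three patterns map to three different targets for the same request. A quick sketch, assuming the Compose stack above is running and Ollama's port is published on the host:

# Container-to-container: use the service name on the Compose network
docker compose exec open-webui curl -s http://ollama:11434/api/tags

# Container-to-host: reach a natively installed Ollama from inside a container
docker run --rm --add-host=host.docker.internal:host-gateway curlimages/curl \
  -s http://host.docker.internal:11434/api/tags

# Host-to-container: use localhost and the published port
curl -s http://localhost:11434/api/tags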

Adding a Reverse Proxy with Nginx

For server deployments where you want to expose Ollama or Open WebUI on a domain name with HTTPS, adding Nginx as a reverse proxy to the Compose stack is straightforward. Add this to your docker-compose.yml:

  nginx:
    image: nginx:alpine
    container_name: nginx
    restart: unless-stopped
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
      - ./certs:/etc/nginx/certs:ro  # SSL certificates
    depends_on:
      - open-webui

And a minimal nginx.conf to proxy Open WebUI:

server {
    listen 80;
    server_name your-domain.com;
    return 301 https://$host$request_uri;
}

server {
    listen 443 ssl;
    server_name your-domain.com;

    ssl_certificate     /etc/nginx/certs/fullchain.pem;
    ssl_certificate_key /etc/nginx/certs/privkey.pem;

    location / {
        proxy_pass http://open-webui:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";  # needed for WebSocket
        proxy_read_timeout 300s;  # long timeout for LLM streaming responses
    }
}

The proxy_read_timeout 300s directive is important: LLM streaming responses can take a while for long outputs, and Nginx's default 60-second timeout will cut off connections during slow or lengthy generations. Set it high enough to accommodate your longest expected responses.
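
After editing nginx.conf, you can validate and apply the configuration without recreating the container:

# Check the configuration syntax inside the running container
docker compose exec nginx nginx -t

# Reload Nginx to pick up the new configuration
docker compose exec nginx nginx -s reload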

Health Checks and Auto-Recovery

Adding health checks to the Compose services lets Docker detect containers that have become unresponsive and lets dependent services wait until Ollama is actually ready, which is useful for long-running server deployments:

  ollama:
    image: ollama/ollama
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11434/api/tags"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

The health check polls Ollama's tags endpoint every 30 seconds. If it fails three times in a row, Docker marks the container as unhealthy, which shows up in docker ps and can gate other services through depends_on with condition: service_healthy. Note that outside of Swarm mode Docker does not restart a container merely for being unhealthy: restart: unless-stopped handles crashes and host reboots automatically (covering cases like a process killed by temporary memory exhaustion), while reacting to an unhealthy-but-running container requires an external watcher or an orchestrator. Also, if the curl binary is not present inside the image you are running, a CMD-SHELL test such as ollama list is a workable substitute, since the Ollama CLI is always available in the container.

Updating the Stack

Updating Ollama and Open WebUI in a Compose stack is clean and non-destructive because model data and conversation history live in volumes separate from the container images:

# Pull the latest images
docker compose pull

# Recreate containers with the new images (models and data preserved)
docker compose up -d

# Verify everything is running
docker compose ps
docker compose logs --tail=20

The named volumes (ollama_data and open_webui_data) are untouched by image updates — all your models, conversation history, and user accounts persist automatically. This is one of the key advantages of the Docker Compose setup over running native installers, where update paths can be less predictable and data migration is sometimes required between versions.

Resource Limits

For shared servers where Ollama should not consume all available resources, you can set memory and CPU limits in the Compose service definition:

  ollama:
    image: ollama/ollama
    deploy:
      resources:
        limits:
          memory: 16G    # maximum RAM usage
        reservations:
          memory: 8G     # guaranteed minimum RAM
          devices:
            - driver: nvidia
              count: 1   # use exactly 1 GPU
              capabilities: [gpu]

Setting memory limits prevents a single large model load from consuming all available RAM and starving other services on the same host. If Ollama hits the memory limit while loading a model, the load will fail with an out-of-memory error rather than degrading the whole machine — which is the correct failure mode for a production server where other services need to remain stable.
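
To confirm the limits were actually applied to the running container:

# Live memory usage against the configured limit
docker stats --no-stream ollama

# The memory limit Docker recorded for the container, in bytes (0 means no limit)
docker inspect ollama --format '{{.HostConfig.Memory}}'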

Pre-Pulling Models at Stack Startup

A useful pattern for production deployments is pre-pulling required models automatically when the stack starts, rather than pulling them manually after deployment. Add a one-shot init container to your Compose file that runs ollama pull after Ollama is ready:

  ollama-init:
    image: ollama/ollama
    depends_on:
      ollama:
        condition: service_healthy
    entrypoint: ["/bin/sh", "-c"]
    command: >
      "ollama pull llama3.2 &&
       ollama pull nomic-embed-text &&
       echo 'Models ready'"
    environment:
      - OLLAMA_HOST=http://ollama:11434
    restart: "no"  # run once and exit

This init container waits for Ollama's health check to pass (ensuring the service is ready), pulls the specified models, and exits; note that the depends_on condition only works if the ollama service actually defines the health check shown in the previous section. Because it has restart: "no", it only runs once per docker compose up invocation. This is particularly useful for CI/CD pipelines or infrastructure-as-code setups where you want a fresh deployment to be fully functional with no manual steps after docker compose up -d.
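
After the stack comes up, you can confirm the init container ran and completed:

# Show the init container, including its exit code (it should have exited with 0)
docker compose ps -a ollama-init

# Review what it pulled
docker compose logs ollama-init

# Confirm the models are now available
docker compose exec ollama ollama list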

Security Considerations

A few security points matter for any server deployment of Ollama. By default Ollama’s API has no authentication — anyone who can reach port 11434 can run inference on your hardware. In a Docker Compose stack, Ollama’s port (11434) should generally not be published to the host at all if external access is not needed; only Open WebUI’s port needs to be externally accessible. Remove the ports section from the Ollama service definition if it only needs to be reached by other containers in the same Compose network:

  ollama:
    image: ollama/ollama
    # No ports: section — only reachable within the Compose network
    volumes:
      - ollama_data:/root/.ollama

Open WebUI handles its own user authentication, so even if Ollama is not directly exposed, access to the models is controlled through Open WebUI’s login system. For deployments where Ollama must be directly accessible from outside the Docker network (for example, to serve multiple applications), place it behind a reverse proxy with at minimum basic HTTP authentication to prevent unauthorised use. Never expose a raw, unauthenticated Ollama port to the public internet — it allows anyone to run unlimited inference on your machine at your expense.
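
If Ollama does need to be reachable from outside the Docker network, one way to add the basic authentication mentioned above is an htpasswd file consumed by Nginx's auth_basic directives. The username, password, and file path below are placeholders.

# Generate a bcrypt htpasswd entry without installing Apache tools on the host
docker run --rm httpd:alpine htpasswd -nbB ollama-user 'change-me' > htpasswd

# In the Nginx server block that proxies Ollama, reference the file:
#   auth_basic           "Ollama";
#   auth_basic_user_file /etc/nginx/htpasswd;
# and mount ./htpasswd into the nginx container at that path (read-only).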

Putting It All Together

The Docker Compose setup described in this guide gives you a reproducible, maintainable local LLM server that can be deployed on any Linux machine with Docker installed. The complete stack — Ollama for inference, Open WebUI for the chat interface, named volumes for data persistence, health checks for reliability, and optionally Nginx for HTTPS and resource limits for stability — can be version-controlled as a single docker-compose.yml file and redeployed in minutes. For personal use on a home server this is a clean way to run a private ChatGPT-equivalent with no ongoing costs. For small teams it provides a shared LLM service with multi-user support that IT can manage consistently without specialised AI infrastructure knowledge.

Debugging Common Docker Issues

Three issues come up repeatedly with Ollama in Docker.

First, CUDA not detected: run docker exec ollama nvidia-smi — if this fails, the NVIDIA Container Toolkit is not correctly configured on the host. Re-run sudo nvidia-ctk runtime configure --runtime=docker and restart Docker.

Second, Open WebUI cannot reach Ollama: run docker compose exec open-webui curl http://ollama:11434/api/tags — if this fails, the containers are not on the same Compose network, which usually means they are defined in different Compose files or projects. Use docker compose ps to confirm both containers are up and check that they share the same network with docker network inspect.

Third, model loads succeed but inference is extremely slow: run docker exec ollama ollama ps while a request is in flight — if the PROCESSOR column shows 100% CPU, the model is running on the CPU despite a GPU being available, usually because VRAM is fragmented or another model has claimed the GPU memory. Restart the Ollama container to clear the GPU memory and try again.

These three checks cover the vast majority of Docker-specific Ollama issues — most other problems are the same model loading and VRAM issues that occur with native Ollama, just with an extra layer of Docker networking and container lifecycle to consider when diagnosing them. Keeping docker compose logs -f ollama open in a terminal while reproducing an issue is the fastest way to see exactly what Ollama reports at the moment a problem occurs, which almost always points directly to the root cause. The combination of structured logs, health checks, and named volumes makes a Docker-based Ollama deployment significantly easier to maintain over time than managing the same setup with systemd services and manual process monitoring on a bare Linux host.
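
The same three checks, collected as commands to run in order when something misbehaves:

# 1. CUDA not detected: can the container see the GPU?
docker exec ollama nvidia-smi

# 2. Open WebUI cannot reach Ollama: is the API reachable on the Compose network?
docker compose exec open-webui curl -s http://ollama:11434/api/tags

# 3. Slow inference: is the loaded model actually running on the GPU?
docker exec ollama ollama ps

# Watch Ollama's own logs while reproducing the problem
docker compose logs -f ollama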
