How to Build an AI Stack with Ollama and Docker Compose

Running Ollama alongside your application services in Docker Compose gives you a portable, reproducible AI stack that starts with a single command. This guide covers composing Ollama with a FastAPI backend, Open WebUI, and Postgres with pgvector — a practical starting point for most AI-powered applications.

Full Stack Compose File

# docker-compose.yml
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    environment:
      - OLLAMA_KEEP_ALIVE=30m
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped

  ollama-init:
    image: ollama/ollama:latest
    depends_on:
      - ollama
    volumes:
      - ollama_data:/root/.ollama
    entrypoint: ["/bin/sh", "-c"]
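    # List form hands the whole script to sh -c as a single argument.
    # The until loop waits for the Ollama server instead of relying on a fixed sleep.
    command:
      - "until ollama list >/dev/null 2>&1; do sleep 2; done; ollama pull llama3.2 && ollama pull nomic-embed-text"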
    environment:
      - OLLAMA_HOST=http://ollama:11434

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    depends_on:
      - ollama
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
    volumes:
      - webui_data:/app/backend/data
    restart: unless-stopped

  api:
    build: ./api
    ports:
      - "8000:8000"
    depends_on:
      - ollama
      - db
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - DATABASE_URL=postgresql://user:pass@db:5432/myapp
    restart: unless-stopped

  db:
    image: pgvector/pgvector:pg16
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
      - POSTGRES_DB=myapp
    volumes:
      - pg_data:/var/lib/postgresql/data
    restart: unless-stopped

volumes:
  ollama_data:
  webui_data:
  pg_data:

CPU-Only Compose (No GPU)

# Same ollama service as in the full file, minus the deploy.resources GPU reservation
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ollama_data:/root/.ollama
    restart: unless-stopped

FastAPI Service

# api/Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

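# api/main.py
import os

import ollama
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
# AsyncClient avoids blocking the event loop while Ollama generates a reply
client = ollama.AsyncClient(host=os.getenv('OLLAMA_HOST', 'http://localhost:11434'))

class ChatRequest(BaseModel):
    message: str
    model: str = 'llama3.2'

@app.post('/chat')
async def chat(req: ChatRequest):
    # The request arrives as a JSON body, e.g. {"message": "Hello"}
    r = await client.chat(model=req.model, messages=[{'role': 'user', 'content': req.message}])
    return {'response': r['message']['content']}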
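
The stack also runs pgvector and pulls nomic-embed-text, so the API service will typically need to turn text into a vector and hand it to Postgres. The following is a minimal sketch using only the standard library, assuming an Ollama version that exposes the /api/embed endpoint. The helper names embed and to_pgvector, and the docs table mentioned in the comment, are hypothetical and not part of the original stack.

```python
import json
import os
import urllib.request


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Request one embedding from Ollama's /api/embed endpoint."""
    base = os.getenv("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
    body = json.dumps({"model": model, "input": text}).encode("utf-8")
    req = urllib.request.Request(
        f"{base}/api/embed", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        # The response carries a list of vectors, one per input
        return json.load(resp)["embeddings"][0]


def to_pgvector(vec: list[float]) -> str:
    """Format a vector as the text literal that pgvector accepts."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


# Hypothetical usage with a Postgres driver of your choice:
#   INSERT INTO docs (content, embedding) VALUES (%s, %s::vector)
#   with parameters (text, to_pgvector(embed(text)))
```

Passing the vector as a text literal and casting it with ::vector keeps the sketch independent of any particular driver. A pgvector adapter for your driver, if you use one, would replace to_pgvector.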

Health Checks and Service Dependencies

services:
  ollama:
    healthcheck:
      # The ollama/ollama image may not ship curl, so probe with the bundled CLI instead
      test: ["CMD", "ollama", "list"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s

  api:
    depends_on:
      ollama:
        condition: service_healthy

Useful Compose Commands

# Start everything in background
docker compose up -d

# Watch logs
docker compose logs -f ollama

# Pull a new model into the running container
docker compose exec ollama ollama pull gemma3:4b

# List models currently loaded in memory (use "ollama list" for downloaded models)
docker compose exec ollama ollama ps

# Stop (keep volumes)
docker compose down

# Full reset including model data
docker compose down -v

Why Docker Compose for AI Stacks

Docker Compose solves the same problem for local AI development that it solves for any multi-service application: eliminating the manual startup, configuration, and networking of individual services. Without Compose, running Ollama alongside a web UI, an application API, and a database means four separate terminal windows, manual port configuration, and remembering which services depend on which others. With Compose, docker compose up -d starts everything in dependency order with the correct networking in a single command. Note that depends_on alone controls start order only; waiting for a service to be ready requires a health check. This consistency is particularly valuable for teams: every developer gets the same service definitions, images, and configuration regardless of their operating system or manual configuration history.

The Compose file also serves as living documentation of your AI stack’s architecture. Reading a docker-compose.yml tells you immediately which services exist, how they connect, what environment variables they need, and what data they persist. This documentation value compounds over time as the stack evolves — the Compose file stays in version control alongside your application code, and changes to the AI infrastructure are tracked and reviewable as git diffs.

Networking Between Services

Services in the same Compose file can reach each other via their service name as the hostname — http://ollama:11434 works from any other service in the stack without any additional network configuration. Docker Compose creates a default bridge network for each project and connects all services to it automatically. This means your FastAPI service calls Ollama at http://ollama:11434, Open WebUI calls it at the same address, and your application code never needs to know the actual IP address or port mapping on the host machine. For services that need to be reachable from the host (the Open WebUI browser interface, the FastAPI API), add a ports mapping; for services that only need to be reachable from other containers (like the database), omit the ports section entirely.
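
The service-name addressing described above can be exercised from Python with only the standard library. This is a minimal sketch, assuming the OLLAMA_HOST variable from the Compose file and Ollama's /api/tags endpoint. The helper names ollama_url and list_models are illustrative, not part of any library.

```python
import json
import os
import urllib.request


def ollama_url(path: str, env=os.environ) -> str:
    """Build an Ollama API URL from OLLAMA_HOST (hypothetical helper)."""
    # Inside the Compose network this is http://ollama:11434;
    # outside it falls back to the port published on the host.
    base = env.get("OLLAMA_HOST", "http://localhost:11434").rstrip("/")
    return f"{base}/{path.lstrip('/')}"


def list_models(timeout: float = 5.0) -> list[str]:
    """Return the names of the models the Ollama server has downloaded."""
    with urllib.request.urlopen(ollama_url("/api/tags"), timeout=timeout) as resp:
        payload = json.load(resp)
    return [m["name"] for m in payload.get("models", [])]
```

Calling list_models() from inside the api container should return the models pulled by the init service. The same code run on the host works through the published port, because only the base URL differs.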

Managing Model Storage

The ollama_data volume persists model weights across container restarts and recreations. Without it, models would live in the container's writable layer, so every recreation of the Ollama container (for example after docker compose down or an image update) would force all pulled models to be re-downloaded. Because the volume is named rather than anonymous, it survives docker compose down; only docker compose down -v deletes it, along with the models it holds. For development environments where disk space is constrained, periodically remove unused models with docker compose exec ollama ollama rm model-name. For production environments, consider pre-baking frequently used models into a custom Ollama image rather than pulling them at startup, which avoids the init container delay on first startup.
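
To decide which models to remove, it helps to see what is taking up space in the volume. The sketch below ranks models by size from a payload in the shape returned by Ollama's /api/tags endpoint. The function name models_by_size is hypothetical and the sample sizes are made up for illustration.

```python
def models_by_size(tags_payload: dict) -> list[tuple[str, float]]:
    """Rank models from an /api/tags response by size in GB, largest first."""
    models = tags_payload.get("models", [])
    ranked = sorted(models, key=lambda m: m.get("size", 0), reverse=True)
    # The API reports sizes in bytes; convert to decimal gigabytes
    return [(m["name"], round(m.get("size", 0) / 1e9, 2)) for m in ranked]


# Illustrative payload only; real sizes depend on the model and quantisation.
sample = {
    "models": [
        {"name": "nomic-embed-text:latest", "size": 274_000_000},
        {"name": "llama3.2:latest", "size": 2_000_000_000},
    ]
}

for name, gb in models_by_size(sample):
    print(f"{name}: {gb} GB")
```

The largest entries are the first candidates for docker compose exec ollama ollama rm when disk space runs low.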

Profiles for Optional Services

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    profiles:
      - ui  # Only start when --profile ui is passed

  monitoring:
    image: prom/prometheus
    profiles:
      - monitoring

# Start core stack only
docker compose up -d

# Start with UI
docker compose --profile ui up -d

# Start with monitoring
docker compose --profile monitoring up -d

Compose profiles let you define optional services that are not started by default. This is useful for separating the core AI services (Ollama, your API, the database) from development conveniences (Open WebUI, monitoring dashboards) that you only need sometimes. Teams can define profiles for different use cases — dev, monitoring, ui — and each developer or environment starts only what they need.

Production Considerations

Docker Compose works well for single-host production deployments. For a VPS or dedicated server running an AI application, Compose provides the same reproducibility and service management benefits as in development, with the addition of restart: unless-stopped policies that keep services running after system restarts. For multi-host or high-availability production, Kubernetes (covered in an earlier article in this series) is more appropriate — Compose does not provide pod scheduling, rolling updates, or automatic failover across nodes. The migration path from Compose to Kubernetes is straightforward: the same container images and environment variables transfer directly, and tools like Kompose can convert a docker-compose.yml to Kubernetes manifests as a starting point.

Getting Started

Copy the Compose file from this article, adjust the service names and ports for your use case, and run docker compose up -d. The Ollama init service pulls the models on first run — watch its progress with docker compose logs -f ollama-init. Once the models are pulled, all services are up and reachable at their configured ports. Add your application’s Dockerfile and service definition to the Compose file, configure it to use http://ollama:11434 as the Ollama host, and your entire AI stack is managed as a single deployable unit.

Environment Variables and Secrets

The Compose file in this article hardcodes database credentials for readability, but production deployments should use environment variables or Docker secrets. The simplest approach is a .env file at the project root that Docker Compose automatically reads:

# .env (do not commit this file)
POSTGRES_PASSWORD=your_secure_password
OLLAMA_MODEL=llama3.2
OPEN_WEBUI_SECRET_KEY=your_secret_key

# docker-compose.yml — reference .env variables
services:
  db:
    environment:
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
  open-webui:
    environment:
      - WEBUI_SECRET_KEY=${OPEN_WEBUI_SECRET_KEY}

Add .env to your .gitignore and commit a .env.example with placeholder values so team members know which variables are required. For production deployments on a VPS, consider supplying the values from the host environment or a secrets manager at deploy time, or mounting them as Docker secrets, so that credentials do not sit in a plaintext .env file next to the code.
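
On the application side, a small helper can make the API service indifferent to whether a value arrives as a Docker secret or as an environment variable. This is a sketch under two assumptions: secrets are mounted under /run/secrets, which is Docker's default location, and each secret file is named after the lowercased variable name, which is a convention chosen here. The function name read_setting is hypothetical.

```python
import os
from pathlib import Path


def read_setting(name: str, env=os.environ, secrets_dir: str = "/run/secrets") -> str:
    """Resolve a setting from a Docker secret file first, then the environment."""
    secret_file = Path(secrets_dir) / name.lower()
    if secret_file.is_file():
        # Secret files often end with a newline; strip it
        return secret_file.read_text().strip()
    if name in env:
        return env[name]
    raise KeyError(f"{name} is not set as a secret or environment variable")
```

In development the value comes from .env through Compose; in production the same call picks up the mounted secret without a code change.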

Extending the Stack

The Compose file in this article is a starting point — extend it for your specific use case. Common additions: a Redis service for caching AI responses (image: redis:7-alpine), a Prometheus + Grafana monitoring stack (using the profiles feature to keep it optional), a Celery worker service (pointing at the same Redis broker) for async AI task processing, and an NGINX reverse proxy to terminate TLS and route traffic to the appropriate service. Each addition follows the same pattern: define the service, configure its environment variables, connect it to the appropriate other services via service-name hostnames, and add a volume if it needs persistent storage. The Compose file grows incrementally as your application’s requirements grow, with each addition clearly visible in version control.
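
For the Redis caching addition, the main design question is how to key a cached response. One possible scheme, sketched here with the standard library only, hashes the model name together with the message list. The function name cache_key and the ollama:chat prefix are assumptions made for illustration.

```python
import hashlib
import json


def cache_key(model: str, messages: list[dict], prefix: str = "ollama:chat") -> str:
    """Derive a deterministic cache key for a chat request (hypothetical scheme)."""
    # sort_keys and fixed separators keep the JSON stable regardless of dict ordering
    canonical = json.dumps(
        {"model": model, "messages": messages}, sort_keys=True, separators=(",", ":")
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{prefix}:{digest}"
```

Caching whole responses this way mostly makes sense when identical prompts should return identical answers, for example with a low temperature setting. For open-ended chat it would replay the same reply every time.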

Compose for CI/CD

Docker Compose also simplifies CI/CD pipelines that need to run integration tests against a real Ollama instance. In your CI configuration, start the stack with docker compose up -d, wait for Ollama’s health check to pass, run your tests, then tear down with docker compose down. This gives integration tests a reproducible environment without mocking the AI layer — useful for tests that verify the full round-trip from API endpoint through Ollama to parsed response. The same Compose file used in development runs in CI without modification, eliminating environment-specific test failures caused by configuration differences.
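
The "wait for Ollama" step can be a short polling loop at the start of the test run. This is a sketch assuming the Ollama port is published on localhost:11434 as in the Compose file above. The function names and the retry defaults are illustrative.

```python
import time
import urllib.error
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the Ollama root endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure or timeout: not ready yet
        return False


def wait_until(probe, attempts: int = 30, delay: float = 2.0, sleep=time.sleep) -> bool:
    """Call probe() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

A test session could call wait_until(ollama_is_up) once and fail fast if it returns False. If the Compose file defines health checks, docker compose up -d --wait is an alternative that blocks until services report healthy.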

Updating the Stack

Keeping your Compose stack up to date is simple. Pull new images with docker compose pull, then run docker compose up -d; Compose recreates only the services whose image or configuration has changed. For Ollama itself, the image tag ollama/ollama:latest moves with each new Ollama release, so consider pinning to a specific version (e.g. ollama/ollama:0.5.0) for production environments where unexpected behaviour changes from an Ollama update could affect your application. Update the Ollama version deliberately, test with your models, and promote to production only after verifying compatibility. Model weights stored in the ollama_data volume are unaffected by container image updates, so your pulled models persist regardless of which Ollama version is running.

Docker Compose vs Kubernetes for AI

The right deployment target depends on your scale and operational requirements. Docker Compose is the right choice for single-host deployments — personal servers, small team deployments, development environments, and production applications that do not need multi-host scaling or Kubernetes’s automatic rescheduling and rolling updates. Kubernetes (covered earlier in this series) adds significant operational complexity for features you may not need. The rule of thumb: start with Compose, operate it in production until you have a concrete reason it is insufficient, then migrate. Most AI application deployments never hit a scale or operational requirement that Compose cannot handle on a well-provisioned single host — and the simplicity of Compose means less time managing infrastructure and more time building features.

The Value of a Reproducible AI Stack

The hidden cost of not using Compose for an AI stack is the time spent on “works on my machine” debugging. When one developer has Ollama running at a different port, another has a different model pulled, and a third has an outdated Open WebUI version, integrating their work creates friction that compounds over time. A shared Compose file eliminates this class of problem — every developer, CI environment, and production host runs the same stack, and the only permitted differences are environment variables (which are explicit and documented). For teams adding AI capabilities to an existing application, the Compose approach is the most practical way to ensure the AI infrastructure is as reproducible and maintainable as the rest of the application stack — treated as code, versioned, reviewed, and deployed consistently across all environments.

Troubleshooting Common Issues

The most common problem with the Ollama Compose stack is the init container running ollama pull before the Ollama server is ready to accept connections, which makes the pull fail and the container exit. If you see this, make the init command wait or retry rather than rely on a fixed sleep, for example: until ollama pull llama3.2; do sleep 3; done. If Ollama is not using the GPU despite an NVIDIA card being present, verify that the NVIDIA Container Toolkit is installed and configured for Docker (nvidia-ctk runtime configure --runtime=docker), restart the Docker daemon, and check with docker compose exec ollama nvidia-smi. If Open WebUI cannot connect to Ollama, verify that the OLLAMA_BASE_URL environment variable uses the service name (http://ollama:11434) and not localhost, since inside a container localhost refers to that container itself rather than to the Ollama service. Running docker compose exec open-webui curl http://ollama:11434/ is a quick connectivity diagnostic from inside the Open WebUI container.

The Docker Compose approach to AI infrastructure embodies a broader principle: AI capabilities should be as easy to deploy and maintain as any other application service. Compose makes that true — the Ollama service is just another entry in your docker-compose.yml, configured consistently, version-controlled alongside your application, and deployable with the same commands as the rest of your stack. That operational simplicity is worth more in the long run than any marginal performance or feature benefit from a more complex deployment approach.

Docker Compose is the practical sweet spot for AI infrastructure management: powerful enough to handle real single-host production deployments, simple enough that any developer can understand the full stack by reading a single file. That combination makes it the right tool for most Ollama deployments, and because it requires no specialised DevOps knowledge, it stays accessible to developers at all experience levels as your team and infrastructure grow.