How to Deploy Ollama with Ansible

Ansible is one of the most widely used tools for automating server configuration, and it is a natural fit for deploying Ollama across multiple machines. Whether you are setting up a single GPU workstation, a fleet of developer machines, or a homelab cluster, an Ansible playbook lets you install Ollama, configure it as a service, pull models, and enforce consistent settings, all without logging into each machine individually. This guide walks through a complete, production-ready Ansible setup for Ollama on Ubuntu and Debian hosts.

The playbook patterns here are idempotent: running them multiple times produces the same result, and they are safe to run on machines where Ollama is already installed. They also handle the most common real-world requirements: running Ollama as a systemd service, binding to a specific network interface, capping parallel requests to protect GPU memory, and automating model pulls on first deploy.

Prerequisites

You need Ansible installed on your control machine, the machine you will run the playbook from. Install it with pip install ansible or via your system package manager. The target hosts need SSH access and a user with sudo privileges. This guide targets Ubuntu 20.04 and later and Debian 11 and later; check Ollama’s documentation for the current list of supported platforms. Almost every module used here ships with ansible-core. The exception is ufw, which belongs to the community.general collection. The full ansible package from pip already includes that collection, but an ansible-core-only install needs ansible-galaxy collection install community.general.
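If the control machine has only ansible-core rather than the full ansible package, the ufw module used later in this guide is missing, because it lives in the community.general collection. A minimal sketch of a requirements file, assuming you keep it next to the playbook under the conventional name requirements.yml:

```yaml
# requirements.yml
# community.general provides the ufw module used by the firewall task.
collections:
  - name: community.general
```

Install it with ansible-galaxy collection install -r requirements.yml before the first playbook run.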

Inventory Setup

Define your target hosts in an inventory file. A simple INI-format inventory works for most setups:

[ollama_servers]
gpu-workstation ansible_host=192.168.1.50 ansible_user=ubuntu
dev-server ansible_host=192.168.1.51 ansible_user=ubuntu

[ollama_servers:vars]
ansible_python_interpreter=/usr/bin/python3
ollama_model=llama3.2
ollama_host=0.0.0.0
ollama_port=11434

The group variables ollama_model, ollama_host, and ollama_port are referenced throughout the playbook. Setting ollama_host=0.0.0.0 makes Ollama listen on all interfaces, which is necessary if other machines on your network need to reach it. If you only need localhost access, set ollama_host=127.0.0.1 instead. The variable cannot simply be omitted, because the templates and tasks below reference it without a default.
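The same inventory can also be written in Ansible's YAML inventory format, which some teams find easier to review in version control. A sketch equivalent to the INI file above; the file name inventory.yml is chosen here only for illustration:

```yaml
# inventory.yml
all:
  children:
    ollama_servers:
      hosts:
        gpu-workstation:
          ansible_host: 192.168.1.50
          ansible_user: ubuntu
        dev-server:
          ansible_host: 192.168.1.51
          ansible_user: ubuntu
      vars:
        ansible_python_interpreter: /usr/bin/python3
        ollama_model: llama3.2
        # Quoted so YAML always treats the address as a string
        ollama_host: "0.0.0.0"
        ollama_port: 11434
```

Pass it to the playbook with -i inventory.yml exactly as you would the INI file.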

Installing Ollama

Ollama’s official install script handles download, binary placement, and initial service setup. In Ansible, you run it with the shell module but guard it with a check so it only runs if Ollama is not already installed:

---
- name: Deploy Ollama
  hosts: ollama_servers
  become: true

  tasks:
    - name: Check if Ollama is already installed
      stat:
        path: /usr/local/bin/ollama
      register: ollama_binary

    - name: Install Ollama
      shell: curl -fsSL https://ollama.com/install.sh | sh
      when: not ollama_binary.stat.exists
      args:
        executable: /bin/bash

The stat module checks whether the Ollama binary exists at its default install location and stores the result in ollama_binary. The when condition on the install task means the shell command only runs on hosts where Ollama is not yet installed. On subsequent playbook runs, the install task is skipped entirely, making the playbook safe to run repeatedly without re-downloading the installer.
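The same guard can be expressed in a single task with the shell module's creates argument, which skips the command when the named path already exists. A sketch of that variant, assuming the binary lands at /usr/local/bin/ollama as above; the set -o pipefail prefix is an addition so that a failed download is not masked by the pipe:

```yaml
    - name: Install Ollama (skipped when the binary already exists)
      shell:
        cmd: set -o pipefail && curl -fsSL https://ollama.com/install.sh | sh
        creates: /usr/local/bin/ollama
        executable: /bin/bash
```

The separate stat task is still useful if other tasks need to branch on whether Ollama was present before the run.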

Configuring the systemd Service

Ollama’s installer creates a systemd service, but the default configuration binds to localhost only and uses no custom environment variables. Use Ansible to deploy a systemd override file that sets OLLAMA_HOST, which carries both the bind address and the port, along with any other environment variables your setup needs:

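    - name: Create systemd override directory
      file:
        path: /etc/systemd/system/ollama.service.d
        state: directory
        mode: '0755'

    # The service runs as the ollama user, so the custom directory named in
    # OLLAMA_MODELS below must exist and be writable by that user.
    - name: Create model storage directory
      file:
        path: /var/lib/ollama/models
        state: directory
        owner: ollama
        group: ollama
        mode: '0755'

    - name: Deploy systemd override
      copy:
        dest: /etc/systemd/system/ollama.service.d/override.conf
        content: |
          [Service]
          Environment="OLLAMA_HOST={{ ollama_host }}:{{ ollama_port }}"
          Environment="OLLAMA_MODELS=/var/lib/ollama/models"
          Environment="OLLAMA_NUM_PARALLEL=1"
        mode: '0644'
      notify: Restart Ollama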

    - name: Reload systemd and enable Ollama
      systemd:
        name: ollama
        enabled: true
        daemon_reload: true
        state: started

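The notify: Restart Ollama directive triggers a handler that restarts the service whenever the override file changes. By default, handlers run at the end of the play, after all tasks have completed, and only if they were notified. This means Ollama is restarted at most once per playbook run, even if multiple tasks notify the same handler. When later tasks depend on the new configuration, as the model pull does here, a meta: flush_handlers task forces pending handlers to run at that point instead of at the end.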

The OLLAMA_NUM_PARALLEL variable controls how many requests each loaded model processes simultaneously. A value of 1 is a safe setting for most single-GPU setups, and pinning it explicitly avoids depending on the built-in default, which has varied between Ollama releases. Increasing it on machines with enough VRAM can improve throughput when multiple users share the same server, but requires careful testing: each additional parallel slot enlarges the context memory the model allocates, which can exhaust GPU memory and cause out-of-memory crashes.
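To tune parallelism per host without editing the playbook, the value can come from a variable. A sketch of the override task, assuming a hypothetical variable named ollama_num_parallel that you would define in the inventory or host_vars; it is not part of the playbook above:

```yaml
    # ollama_num_parallel is a hypothetical variable; hosts that do not
    # define it fall back to 1.
    - name: Deploy systemd override with per-host parallelism
      copy:
        dest: /etc/systemd/system/ollama.service.d/override.conf
        content: |
          [Service]
          Environment="OLLAMA_HOST={{ ollama_host }}:{{ ollama_port }}"
          Environment="OLLAMA_MODELS=/var/lib/ollama/models"
          Environment="OLLAMA_NUM_PARALLEL={{ ollama_num_parallel | default(1) }}"
        mode: '0644'
      notify: Restart Ollama
```

A GPU workstation could then set ollama_num_parallel=2 in its inventory line while every other host keeps the conservative default.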

Defining the Handler

Add the handler at the bottom of the playbook file, at the same indentation level as tasks:

  handlers:
    - name: Restart Ollama
      systemd:
        name: ollama
        state: restarted
        daemon_reload: true

Using a handler rather than a dedicated restart task ensures Ollama is only restarted when the configuration actually changes. If you run the playbook on a host where nothing has changed, the handler is never notified and the service continues running without interruption. This is the correct Ansible pattern for any task that requires a service restart — it avoids unnecessary downtime on idempotent runs.
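Because handlers normally fire only at the end of the play, tasks that run after the override is deployed would otherwise talk to a service still using its old configuration. A minimal sketch of forcing the pending restart at a chosen point, placed after the task that enables and starts the service:

```yaml
    # Runs every handler that has been notified so far, then continues
    # with the remaining tasks in the play.
    - name: Apply pending configuration changes
      meta: flush_handlers
```

If no task has notified a handler, this step does nothing, so it does not cost a restart on unchanged hosts.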

Pulling Models Automatically

After Ollama is running, use the command module to pull models. Guard the pull with a check so it only runs when the model is not already present:

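    - name: Check if model is already pulled
      command: ollama list
      register: ollama_list
      changed_when: false
      become_user: ollama
      # Point the client at the configured address, as the pull task does
      environment:
        OLLAMA_HOST: "http://{{ ollama_host }}:{{ ollama_port }}"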

    - name: Pull Ollama model
      command: ollama pull {{ ollama_model }}
      become_user: ollama
      when: ollama_model not in ollama_list.stdout
      environment:
        OLLAMA_HOST: "http://{{ ollama_host }}:{{ ollama_port }}"

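The changed_when: false on the list command tells Ansible that running ollama list never changes the system state, so it will not count as a change in the playbook output. The become_user: ollama directive runs the command as the ollama system user that the installer creates. The ollama CLI is only a client of the running service, and it is the service that writes to the model storage directory, so this setting avoids running the client as root rather than granting storage permissions. Note that the when condition is a plain substring test against the ollama list output, so llama3.2 also matches any installed model whose name contains that text.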
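When the substring test is too loose, the installed model names can be read from the /api/tags endpoint and compared exactly. A sketch under two assumptions: the endpoint returns a models list whose entries have a name field, and untagged names are stored as name:latest. The variable names installed_models and wanted are illustrative:

```yaml
    - name: List installed models through the API
      uri:
        url: "http://{{ ollama_host }}:{{ ollama_port }}/api/tags"
      register: installed_models

    - name: Pull the model only when its exact name is missing
      command: ollama pull {{ ollama_model }}
      environment:
        OLLAMA_HOST: "http://{{ ollama_host }}:{{ ollama_port }}"
      vars:
        # Add the implicit :latest tag when the name has no tag of its own
        wanted: "{{ ollama_model if ':' in ollama_model else ollama_model ~ ':latest' }}"
      when: wanted not in (installed_models.json.models | default([], true) | map(attribute='name') | list)
```

This replaces both the ollama list task and the substring condition, at the cost of one extra HTTP request per run.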

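Model pulls can take several minutes for large models. Ansible applies no task timeout by default, but a synchronous task holds its SSH connection open for the whole download and fails if that connection drops. Add async: 600 and poll: 10 to the pull task to run it asynchronously and check progress every 10 seconds, with a maximum wait of 10 minutes. The task fails if the pull is still running when the async limit expires, so raise the value for larger models or slower links.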
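For very large models, an alternative is to start the pull in the background and poll it with the async_status module. A sketch of that pattern, assuming the ollama_list result registered by the check task above; the one-hour limit and the names pull_job and pull_status are illustrative. Both tasks must run as the same user, so become_user is left out here:

```yaml
    - name: Start the model pull in the background
      command: ollama pull {{ ollama_model }}
      environment:
        OLLAMA_HOST: "http://{{ ollama_host }}:{{ ollama_port }}"
      when: ollama_model not in ollama_list.stdout
      async: 3600   # upper bound for the job, in seconds
      poll: 0       # do not wait here; return immediately
      register: pull_job

    - name: Wait for the background pull to finish
      async_status:
        jid: "{{ pull_job.ansible_job_id }}"
      when: pull_job is not skipped
      register: pull_status
      until: pull_status.finished
      retries: 120  # 120 checks, 30 seconds apart, cover the full hour
      delay: 30
```

Other tasks can be placed between the two steps, so the download overlaps with the rest of the configuration work.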

Firewall Configuration

If your hosts run UFW, open port 11434 for the hosts or networks that need access to Ollama:

    - name: Allow Ollama port through UFW
      ufw:
        rule: allow
        port: "{{ ollama_port }}"
        proto: tcp
        src: 192.168.1.0/24
      when: ollama_host != "127.0.0.1"

The when condition skips the firewall rule if Ollama is bound to localhost only — there is no point opening a port for traffic that will never reach it from the network. Scoping the rule to your local subnet (192.168.1.0/24) rather than any source is a sensible default for homelab and internal deployments, keeping Ollama off the public internet while making it available to other machines on your LAN.
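If more than one network needs access, the rule can loop over a list of sources instead of hard-coding a single subnet. A sketch assuming a hypothetical list variable named ollama_allowed_sources, which falls back to the subnet used above when it is not defined:

```yaml
    - name: Allow Ollama port from each trusted network
      ufw:
        rule: allow
        port: "{{ ollama_port }}"
        proto: tcp
        src: "{{ item }}"
      loop: "{{ ollama_allowed_sources | default(['192.168.1.0/24']) }}"
      when: ollama_host != "127.0.0.1"
```

Each entry produces its own UFW rule, so removing a network later means deleting that rule explicitly rather than just shortening the list.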

The Complete Playbook

Here is the full playbook combining everything above:

---
- name: Deploy and configure Ollama
  hosts: ollama_servers
  become: true

  tasks:
    - name: Check if Ollama is installed
      stat:
        path: /usr/local/bin/ollama
      register: ollama_binary

    - name: Install Ollama
      shell: curl -fsSL https://ollama.com/install.sh | sh
      when: not ollama_binary.stat.exists
      args:
        executable: /bin/bash

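    - name: Create systemd override directory
      file:
        path: /etc/systemd/system/ollama.service.d
        state: directory
        mode: '0755'

    # The service runs as the ollama user, so the custom directory named in
    # OLLAMA_MODELS below must exist and be writable by that user.
    - name: Create model storage directory
      file:
        path: /var/lib/ollama/models
        state: directory
        owner: ollama
        group: ollama
        mode: '0755'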

    - name: Deploy systemd override
      copy:
        dest: /etc/systemd/system/ollama.service.d/override.conf
        content: |
          [Service]
          Environment="OLLAMA_HOST={{ ollama_host }}:{{ ollama_port }}"
          Environment="OLLAMA_MODELS=/var/lib/ollama/models"
          Environment="OLLAMA_NUM_PARALLEL=1"
        mode: '0644'
      notify: Restart Ollama

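    - name: Enable and start Ollama
      systemd:
        name: ollama
        enabled: true
        daemon_reload: true
        state: started

    # Run the notified restart now, so the tasks below talk to a service
    # that already uses the new override instead of the installer defaults.
    - name: Apply pending configuration changes
      meta: flush_handlers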

    - name: Allow Ollama port through UFW
      ufw:
        rule: allow
        port: "{{ ollama_port }}"
        proto: tcp
        src: 192.168.1.0/24
      when: ollama_host != "127.0.0.1"

    - name: Wait for Ollama to be ready
      uri:
        url: "http://{{ ollama_host }}:{{ ollama_port }}/api/tags"
        status_code: 200
      register: result
      until: result.status == 200
      retries: 12
      delay: 5

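    - name: Check installed models
      command: ollama list
      register: ollama_list
      changed_when: false
      become_user: ollama
      # Point the client at the configured address, as the pull task does
      environment:
        OLLAMA_HOST: "http://{{ ollama_host }}:{{ ollama_port }}"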

    - name: Pull default model
      command: ollama pull {{ ollama_model }}
      become_user: ollama
      when: ollama_model not in ollama_list.stdout
      async: 600
      poll: 10
      environment:
        OLLAMA_HOST: "http://{{ ollama_host }}:{{ ollama_port }}"

  handlers:
    - name: Restart Ollama
      systemd:
        name: ollama
        state: restarted
        daemon_reload: true

The uri wait task polls Ollama’s /api/tags endpoint, which lists installed models and doubles here as a readiness probe, until it returns a 200 status, retrying up to 12 times with 5-second delays. This ensures the model pull only starts after Ollama is fully up and accepting requests. Without this wait, the pull command can fail if Ollama takes a few seconds to start after a restart.

Managing Multiple Models

To pull multiple models on deploy, change ollama_model to a list variable and loop over it:

# In inventory or group_vars:
ollama_models:
  - llama3.2
  - nomic-embed-text
  - qwen2.5-coder:7b

# In the playbook task:
- name: Pull Ollama models
  command: ollama pull {{ item }}
  become_user: ollama
  loop: "{{ ollama_models }}"
  when: item not in ollama_list.stdout
  async: 600
  poll: 10
  environment:
    OLLAMA_HOST: "http://{{ ollama_host }}:{{ ollama_port }}"

Using a loop with async on each iteration means the pulls happen sequentially rather than in parallel. This is intentional — pulling multiple large models simultaneously would saturate network bandwidth and potentially overwhelm disk I/O. Sequential pulls are slower overall but much more predictable and less likely to fail partway through on constrained connections.

Keeping Ollama Updated

Ollama updates are distributed the same way as the initial install: by running the install script again. Add a separate update playbook or a tagged task that re-runs the script only when you explicitly request it. The version task below records the installed version for use in later conditions or debug output, but it does not by itself compare against the latest release:

    - name: Get installed Ollama version
      command: ollama --version
      register: ollama_version
      changed_when: false

    - name: Update Ollama
      shell: curl -fsSL https://ollama.com/install.sh | sh
      when: ollama_update | default(false) | bool
      args:
        executable: /bin/bash
      notify: Restart Ollama
      tags: update

Run the update task selectively with ansible-playbook deploy.yml --tags update -e ollama_update=true. The -e ollama_update=true extra variable overrides the default of false, triggering the update only when you explicitly pass it. Without the extra variable, the update task is skipped even when the update tag is specified, giving you fine-grained control over when updates are applied.
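To update only when a newer release actually exists, the installed version can be compared with the latest release tag. A sketch under several assumptions that you should verify for your environment: ollama --version prints a dotted version number, the project publishes releases on GitHub under ollama/ollama with tags of the form vX.Y.Z, and the control machine can reach the GitHub API (unauthenticated requests are rate-limited). The names latest_release, installed, and latest are illustrative:

```yaml
    - name: Get installed Ollama version
      command: ollama --version
      register: ollama_version
      changed_when: false
      tags: update

    - name: Look up the latest published release
      uri:
        url: https://api.github.com/repos/ollama/ollama/releases/latest
      register: latest_release
      delegate_to: localhost
      become: false
      run_once: true
      tags: update

    - name: Update Ollama only when the installed version is older
      shell: curl -fsSL https://ollama.com/install.sh | sh
      args:
        executable: /bin/bash
      vars:
        installed: "{{ ollama_version.stdout | regex_search('[0-9]+[.][0-9]+[.][0-9]+') }}"
        latest: "{{ latest_release.json.tag_name | regex_replace('^v', '') }}"
      when: installed is version(latest, '<')
      notify: Restart Ollama
      tags: update
```

All three tasks carry the update tag, so ansible-playbook deploy.yml --tags update runs the comparison without needing the extra variable.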

Verifying the Deployment

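Add a verification task at the end of the playbook to confirm Ollama is responding and the expected models are available. The uri module can make a real chat request and validate the response. The example assumes the ollama_models list from the previous section; with the single ollama_model variable, reference that instead: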

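    - name: Verify Ollama chat endpoint
      uri:
        url: "http://{{ ollama_host }}:{{ ollama_port }}/api/chat"
        method: POST
        body_format: json
        body:
          model: "{{ ollama_models[0] }}"
          messages:
            - role: user
              content: "Say OK"
          stream: false
        status_code: 200
        # The first request loads the model into memory, which can take
        # longer than the uri module's default 30-second timeout.
        timeout: 120
      register: verify_result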

    - name: Print verification result
      debug:
        msg: "Ollama responded: {{ verify_result.json.message.content }}"

This end-to-end verification confirms not just that the HTTP server is running, but that the model is loaded and can produce a response. Including this in your playbook means every deployment is automatically smoke-tested, and any issue with the model loading or the service configuration is caught immediately rather than discovered later when a user tries to use the system.
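The debug task only prints the reply. To make the play fail when the model returns nothing, an assert task can check the registered result. A minimal sketch, assuming the verify_result variable registered above:

```yaml
    - name: Fail the play if the model returned an empty reply
      assert:
        that:
          - verify_result.json.message.content | length > 0
        fail_msg: "Ollama answered the request but returned no content"
```

A failed assertion marks the host as failed in the play recap, which is easier to spot in CI output than a debug message.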

Using Variables and Vault for Secrets

If your Ollama setup requires any secrets — an API key for a reverse proxy, a basic auth password for an nginx gateway in front of Ollama, or a webhook token for notifications — store them in Ansible Vault rather than plaintext inventory files. Encrypt a variables file with ansible-vault create group_vars/ollama_servers/vault.yml, store your secrets inside it, and reference them in your playbook like any other variable. Commit the encrypted vault file to version control safely — without the vault password, the ciphertext is unreadable.
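A common layout keeps the encrypted values in one file and references them from a plaintext file, so reviewers can see which variables exist without seeing the secrets. A sketch of that convention; the variable names are hypothetical, and the two YAML documents below stand for two separate files:

```yaml
# group_vars/ollama_servers/vault.yml (encrypted with ansible-vault)
vault_ollama_proxy_password: "replace-with-a-real-secret"
---
# group_vars/ollama_servers/vars.yml (plaintext, safe to read in reviews)
ollama_proxy_password: "{{ vault_ollama_proxy_password }}"
```

Run the playbook with ansible-playbook deploy.yml --ask-vault-pass so Ansible can decrypt the vault file at runtime.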

For teams managing multiple Ollama deployments across different environments — development, staging, production — use Ansible’s group_vars directory structure to keep environment-specific variables separate. Create group_vars/ollama_servers/ for shared settings, and override individual variables per host with host_vars/hostname/ files. This structure scales cleanly from a single server to dozens of hosts without the playbook itself needing to change.

Deploying Across Multiple Environments

One of the strongest reasons to use Ansible for Ollama deployments is managing multiple environments consistently. A development machine might run a small fast model for quick iteration, a staging server might mirror production settings exactly, and a production cluster might run larger models with stricter resource limits. By defining these differences in group variables and environment-specific inventory files, you can run the same playbook against any environment and get the correct configuration automatically.

For example, create separate inventory files — inventory/dev.ini, inventory/staging.ini, inventory/prod.ini — each pointing at the appropriate hosts. Store shared defaults in group_vars/ollama_servers/main.yml and environment overrides in host-specific variable files. Running ansible-playbook deploy.yml -i inventory/prod.ini then deploys the production configuration without touching any other environment. This separation of concerns is what makes Ansible playbooks genuinely reusable across a team rather than one-off automation scripts.
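As an illustration of that split, shared defaults and a per-host override might look like the following. The values are examples only, and the two YAML documents stand for two separate files:

```yaml
# group_vars/ollama_servers/main.yml: defaults shared by every host
ollama_host: "0.0.0.0"
ollama_port: 11434
ollama_model: llama3.2
---
# host_vars/gpu-workstation/main.yml: override for one host
ollama_model: qwen2.5-coder:7b
```

Host variables take precedence over group variables, so gpu-workstation pulls the larger model while every other host in the group keeps the default.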

Monitoring and Health Checks

Add a dedicated health check task to your playbook that verifies the deployment end-to-end after every run. Beyond the uri module check already shown, you can use Ansible to wire the hosts into your monitoring stack. If your team uses Prometheus, the playbook can install and configure the Prometheus node exporter alongside Ollama and add a scrape job for it to your Prometheus configuration. Node exporter covers host-level metrics such as CPU, memory, and disk. Ollama itself does not appear to expose a Prometheus /metrics endpoint at the time of writing, so GPU and request latency data need additional exporters, for example a GPU exporter and the metrics of a reverse proxy in front of Ollama. With those in place you get monitoring data from day one of the deployment without any manual setup steps.
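On Ubuntu and Debian the node exporter is available as a distribution package, so installing it takes one task. A sketch assuming the package name prometheus-node-exporter from the distribution repositories:

```yaml
    - name: Install the Prometheus node exporter
      apt:
        name: prometheus-node-exporter
        state: present
        update_cache: true
```

The exporter listens on port 9100 by default, which is the target to add to the scrape job on your Prometheus server.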

For simpler setups, a cron job that runs curl -sf http://localhost:11434/api/tags every minute and alerts on failure provides basic uptime monitoring with no additional tooling. Add the cron job with Ansible’s cron module to ensure it is consistent across all hosts and documented in version control alongside the rest of your infrastructure configuration.
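A sketch of that cron job using the cron module. The alert action is a placeholder that writes to syslog with logger; swap in whatever notification command your environment uses:

```yaml
    # With no schedule fields set, the cron module defaults every field
    # to "*", which runs the job once a minute.
    - name: Schedule a basic Ollama uptime check
      cron:
        name: ollama-health-check
        user: root
        job: >-
          curl -sf http://localhost:{{ ollama_port }}/api/tags > /dev/null
          || logger -t ollama-health "Ollama API check failed"
```

The name field lets Ansible find and update the same crontab entry on later runs instead of adding duplicates.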

Why Ansible for Ollama

Compared to running the install script by hand on each machine, an Ansible playbook gives you a complete record of exactly how every host was configured, stored in version control alongside your application code. When something goes wrong — a systemd service fails after a kernel update, a model pull corrupts partway through, a firewall rule gets out of sync — you can re-run the playbook to restore the expected state in minutes rather than debugging manually. For teams running Ollama across more than one or two machines, this reproducibility is the single most valuable thing Ansible brings to the workflow. The initial investment of writing a solid playbook pays back immediately the first time you need to provision a new GPU workstation or recover a misconfigured server.

The playbook in this guide is a solid foundation — extend it with role-based structure as your infrastructure grows, splitting installation, configuration, and model management into separate Ansible roles that can be composed and reused across different projects.