How to Use Ollama with Elixir and Phoenix

Ollama’s HTTP API is easy to call from Elixir with HTTP client libraries such as Req or HTTPoison. This guide covers integrating Ollama into Elixir and Phoenix applications, including streaming responses to the browser with Phoenix LiveView and a clean module abstraction for AI calls.

Setup

# mix.exs
defp deps do
  [
    {:req, "~> 0.4"},        # HTTP client
    {:jason, "~> 1.4"}       # JSON
  ]
end

Basic Ollama Client Module

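defmodule MyApp.Ollama do
  # Read at runtime (not compile_env) so config/runtime.exs can set the host in releases
  defp base_url, do: Application.get_env(:my_app, :ollama_host, "http://localhost:11434")

  # Extra Req options, e.g. [plug: {Req.Test, MyApp.Ollama}] in config/test.exs
  defp req_options, do: Application.get_env(:my_app, :ollama_req_options, [])

  def chat(messages, model \\ "llama3.2", opts \\ []) do
    body = %{
      model: model,
      messages: messages,
      stream: false,
      options: %{temperature: Keyword.get(opts, :temperature, 0.7)}
    }

    # Req's default receive_timeout is 15s, often too short for a full generation
    req_opts = [json: body, receive_timeout: Keyword.get(opts, :timeout, 120_000)] ++ req_options()

    case Req.post("#{base_url()}/api/chat", req_opts) do
      {:ok, %{status: 200, body: %{"message" => %{"content" => content}}}} ->
        {:ok, content}
      {:ok, %{status: status, body: body}} ->
        {:error, "HTTP #{status}: #{inspect(body)}"}
      {:error, reason} ->
        {:error, reason}
    end
  end

  def embed(text, model \\ "nomic-embed-text") do
    req_opts = [json: %{model: model, prompt: text}] ++ req_options()

    case Req.post("#{base_url()}/api/embeddings", req_opts) do
      {:ok, %{status: 200, body: %{"embedding" => vector}}} -> {:ok, vector}
      # Without this clause a non-200 reply (e.g. model not pulled) raises CaseClauseError
      {:ok, %{status: status, body: body}} -> {:error, "HTTP #{status}: #{inspect(body)}"}
      {:error, reason} -> {:error, reason}
    end
  end
end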
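The module above forwards only temperature to Ollama’s options map. Ollama also accepts other model options there, such as num_ctx, top_p, top_k, seed, and num_predict. A minimal sketch of turning a keyword list into that map; MyApp.OllamaOptions is a hypothetical helper, and the allow-list is an assumption you would extend for your own needs:

```elixir
defmodule MyApp.OllamaOptions do
  # Hypothetical helper; these keys are a subset of Ollama's model options
  @allowed [:temperature, :top_p, :top_k, :num_ctx, :seed, :num_predict]

  @doc "Builds the `options` map for /api/chat, dropping unknown keys and nil values."
  def build(opts) when is_list(opts) do
    opts
    |> Keyword.take(@allowed)
    |> Enum.reject(fn {_key, value} -> is_nil(value) end)
    |> Map.new()
  end
end
```

Inside chat/3 this would replace the hand-built map, e.g. options: MyApp.OllamaOptions.build(opts), so client-only keys such as :timeout never reach Ollama.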

Using the Client

# Single turn
{:ok, reply} = MyApp.Ollama.chat([
  %{role: "user", content: "What is pattern matching in Elixir?"}
])
IO.puts(reply)

# With a system prompt; for multi-turn chats, append earlier user/assistant messages in order
messages = [
  %{role: "system", content: "You are a helpful Elixir tutor."},
  %{role: "user", content: "Explain GenServer in one paragraph."}
]
{:ok, reply} = MyApp.Ollama.chat(messages, "llama3.2")

Phoenix Controller

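defmodule MyAppWeb.AiController do
  use MyAppWeb, :controller
  require Logger
  alias MyApp.Ollama

  def ask(conn, %{"message" => message}) do
    case Ollama.chat([%{role: "user", content: message}]) do
      {:ok, reply} ->
        json(conn, %{reply: reply})

      {:error, reason} ->
        # reason may be an exception struct (e.g. a transport error) that Jason
        # cannot encode, so log the detail and return a generic message
        Logger.warning("Ollama request failed: #{inspect(reason)}")
        conn |> put_status(503) |> json(%{error: "AI service unavailable"})
    end
  end
end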

Streaming with Phoenix LiveView

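defmodule MyAppWeb.ChatLive do
  use MyAppWeb, :live_view

  def mount(_params, _session, socket) do
    {:ok, assign(socket, response: "", streaming: false)}
  end

  def handle_event("send", %{"message" => message}, socket) do
    # Stream in a background task that sends tokens back to this LiveView
    pid = self()
    host = Application.get_env(:my_app, :ollama_host, "http://localhost:11434")

    Task.start(fn ->
      Req.post("#{host}/api/chat",
        json: %{model: "llama3.2", messages: [%{role: "user", content: message}], stream: true},
        receive_timeout: 120_000,
        into: fn {:data, chunk}, {req, resp} ->
          # Ollama streams NDJSON; a chunk can hold several lines or end mid-line,
          # so the unfinished tail is kept in the response's private map
          buffer = Req.Response.get_private(resp, :ndjson_buffer, "") <> chunk
          {lines, [rest]} = buffer |> String.split("\n") |> Enum.split(-1)

          for line <- lines do
            case Jason.decode(line) do
              {:ok, %{"message" => %{"content" => token}}} -> send(pid, {:token, token})
              _ -> :ok
            end
          end

          {:cont, {req, Req.Response.put_private(resp, :ndjson_buffer, rest)}}
        end
      )

      # Sent even if the request failed, so the UI leaves the streaming state
      send(pid, :done)
    end)

    {:noreply, assign(socket, response: "", streaming: true)}
  end

  def handle_info({:token, token}, socket) do
    {:noreply, assign(socket, response: socket.assigns.response <> token)}
  end

  def handle_info(:done, socket) do
    {:noreply, assign(socket, streaming: false)}
  end
end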
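Ollama’s streaming /api/chat responses are newline-delimited JSON (NDJSON): one JSON object per line, and a single HTTP chunk can carry several lines or stop partway through one. Keeping the split-and-carry step in a pure function makes it testable without a running server. A minimal sketch; MyApp.NDJSON is a hypothetical module, and decoding each complete line is left to Jason as in the example above:

```elixir
defmodule MyApp.NDJSON do
  @doc """
  Appends `chunk` to the carried-over `buffer` and splits out complete lines.

  Returns `{complete_lines, rest}`, where `rest` is the unterminated tail to
  pass back in with the next chunk. Empty lines are dropped.
  """
  def split_lines(buffer, chunk) do
    # String.split always returns at least one element, so [rest] always matches
    {lines, [rest]} =
      (buffer <> chunk)
      |> String.split("\n")
      |> Enum.split(-1)

    {Enum.reject(lines, &(&1 == "")), rest}
  end
end
```

Inside the into: callback, the rest value is what you store between chunks; only the returned complete lines are safe to pass to Jason.decode/1.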

Background Jobs with Oban

defmodule MyApp.Workers.SummariseWorker do
  use Oban.Worker, queue: :ai, max_attempts: 3
  alias MyApp.{Ollama, Documents}

  @impl Oban.Worker
  def perform(%Oban.Job{args: %{"document_id" => id}}) do
    document = Documents.get!(id)
    case Ollama.chat([
      %{role: "user", content: "Summarise in 3 bullets:\n\n#{document.content}"}
    ], "llama3.2", temperature: 0.3) do
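      {:ok, summary} ->
        # Return the save result too, so a failed update is retried like a failed call
        # (assumes Documents.update/2 returns {:ok, _} or {:error, _})
        with {:ok, _document} <- Documents.update(document, %{summary: summary}), do: :ok
      {:error, reason} ->
        {:error, reason}  # Oban retries the job, up to max_attempts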
    end
  end
end

# Enqueue
%{document_id: doc.id} |> MyApp.Workers.SummariseWorker.new() |> Oban.insert()

Why Elixir for AI Integration

Elixir’s concurrency model (lightweight processes, message passing, and OTP supervision trees) is well suited to AI workloads. Streaming LLM responses map naturally onto message passing: a spawned Task sends token messages to a LiveView process, which updates the UI without blocking. Background AI jobs fit into Oban’s database-backed job queues, where a job that fails or raises, for example on an Ollama timeout or connection error, is retried automatically up to max_attempts. Supervision adds a further safety net: a crashed process is isolated from the rest of the system, and supervised processes are restarted without manual intervention.
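The Task-to-LiveView hand-off is ordinary message passing between two processes. A stdlib-only sketch of the same shape, using a hypothetical TokenRelay module and a fixed token list standing in for the Ollama stream; the receive loop plays the role of the LiveView’s handle_info/2 clauses:

```elixir
defmodule TokenRelay do
  @doc "Stand-in for the streaming Task: sends each token to `pid`, then :done."
  def produce(pid, tokens) do
    Task.start(fn ->
      Enum.each(tokens, &send(pid, {:token, &1}))
      send(pid, :done)
    end)
  end

  @doc "Accumulates {:token, t} messages until :done, like the LiveView's assigns."
  def collect(acc \\ "", timeout \\ 5_000) do
    receive do
      {:token, token} -> collect(acc <> token, timeout)
      :done -> {:ok, acc}
    after
      timeout -> {:error, :timeout}
    end
  end
end
```

Messages sent from one process to another arrive in the order they were sent, which is why tokens can simply be appended as they come in.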

The functional, pattern-matching style of Elixir also makes AI response handling clean. The {:ok, content} and {:error, reason} tuple convention provides clear error propagation throughout the call chain, and pattern matching on response bodies is more readable than nested conditionals in imperative languages. For teams already working in Elixir, Ollama integration requires no new paradigm — it is just another HTTP call with the same patterns you use everywhere else in the codebase.
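To see the tuple convention in isolation, here is the response-matching step as a pure function over already-decoded bodies, so it can be exercised without HTTP. MyApp.OllamaResponse.parse/1 is a hypothetical name; the shapes mirror the non-streaming /api/chat response used earlier, plus the {"error": "..."} body Ollama returns on failures:

```elixir
defmodule MyApp.OllamaResponse do
  @doc "Maps a {status, decoded_body} pair to {:ok, content} or {:error, reason}."
  def parse({200, %{"message" => %{"content" => content}}}), do: {:ok, content}
  # Ollama reports failures such as a missing model as %{"error" => message}
  def parse({status, %{"error" => message}}), do: {:error, {status, message}}
  # Anything else is unexpected; keep a printable copy for logging
  def parse({status, body}), do: {:error, {status, inspect(body)}}
end
```

Each clause handles one case, and the first match wins, so the success path reads top to bottom without nested conditionals.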

Configuration

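# config/config.exs
config :my_app,
  ollama_host: "http://localhost:11434",
  ollama_model: "llama3.2"

# config/runtime.exs: evaluated when the app or release boots, unlike prod.exs,
# which is read at build time. Read this key with Application.get_env/3, not
# Application.compile_env/3, so the runtime value is the one actually used.
if config_env() == :prod do
  config :my_app, ollama_host: System.get_env("OLLAMA_HOST", "http://localhost:11434")
end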

Testing Ollama Integration in Elixir

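# test/my_app/ollama_test.exs
# Assumes MyApp.Ollama merges Application.get_env(:my_app, :ollama_req_options, [])
# into its Req calls, and config/test.exs contains:
#   config :my_app, ollama_req_options: [plug: {Req.Test, MyApp.Ollama}]
defmodule MyApp.OllamaTest do
  use ExUnit.Case, async: false

  # Req.Test stub: requests are handled by this plug instead of a real Ollama
  setup do
    Req.Test.stub(MyApp.Ollama, fn conn ->
      # The JSON content type lets Req decode the body as it would Ollama's reply
      conn
      |> Plug.Conn.put_resp_content_type("application/json")
      |> Plug.Conn.send_resp(200, Jason.encode!(%{
        message: %{role: "assistant", content: "Test response"}
      }))
    end)
    :ok
  end

  test "chat returns content on success" do
    assert {:ok, "Test response"} =
      MyApp.Ollama.chat([%{role: "user", content: "Hello"}])
  end
end

# Integration tests against a real Ollama. Exclude them by default with
# ExUnit.configure(exclude: [:integration]) in test_helper.exs and run them
# with: mix test --only integration
defmodule MyApp.OllamaIntegrationTest do
  use ExUnit.Case, async: false
  @moduletag :integration

  setup do
    # Bypass the Req.Test plug so requests reach the real server
    previous = Application.get_env(:my_app, :ollama_req_options, [])
    Application.put_env(:my_app, :ollama_req_options, [])
    on_exit(fn -> Application.put_env(:my_app, :ollama_req_options, previous) end)
  end

  test "chat with real llama3.2" do
    assert {:ok, response} = MyApp.Ollama.chat([%{role: "user", content: "Say OK"}])
    assert String.length(response) > 0
  end
end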

Elixir vs Other Language Integrations

Compared to Python, Elixir’s Ollama tooling is lower-level. Elixir does have AI frameworks, including an Elixir LangChain library that can talk to Ollama, but the ecosystem is smaller and younger than Python’s, with fewer ready-made RAG, agent, and vector store integrations. In practice you implement more yourself, but what you implement is idiomatic Elixir that fits your OTP application structure rather than a framework that imposes its own patterns. For Elixir teams this is often preferable: process supervision, message passing, and fault tolerance are already well understood and suit production Elixir applications better than patterns carried over from a Python framework. Hex packages such as pgvector (vector similarity search through Ecto) and Req (HTTP streaming) provide the building blocks for RAG and streaming without a dedicated AI framework.
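For a small document set, retrieval can start in plain Elixir before moving to pgvector. A minimal sketch of cosine-similarity ranking over embedding vectors (lists of floats, such as those returned by embed/2); MyApp.VectorSearch is a hypothetical module, and because it scans every document on each query it is only suited to small collections:

```elixir
defmodule MyApp.VectorSearch do
  @doc "Cosine similarity of two equal-length numeric lists (0.0 if either is all zeros)."
  def cosine(a, b) do
    dot = a |> Enum.zip(b) |> Enum.reduce(0.0, fn {x, y}, acc -> acc + x * y end)
    norm = fn v -> :math.sqrt(Enum.reduce(v, 0.0, fn x, acc -> acc + x * x end)) end
    denom = norm.(a) * norm.(b)
    if denom == 0.0, do: 0.0, else: dot / denom
  end

  @doc "Returns the `k` {id, score} pairs from `docs` ({id, vector}) most similar to `query`."
  def top_k(query, docs, k) do
    docs
    |> Enum.map(fn {id, vector} -> {id, cosine(query, vector)} end)
    |> Enum.sort_by(fn {_id, score} -> score end, :desc)
    |> Enum.take(k)
  end
end
```

The ids returned by top_k/3 would then select the text chunks to include in the prompt; pgvector performs the same ranking inside PostgreSQL once the collection outgrows memory.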

Getting Started

Add Req and Jason to your deps, copy the Ollama module from this article, configure the host in config.exs, and call MyApp.Ollama.chat/1 from a controller or LiveView. The pattern-matched response handling ensures errors surface clearly. Add the LiveView streaming pattern when you want real-time token display, and the Oban worker for background AI jobs that should not block user requests. The idiomatic Elixir patterns in this article fit naturally into any Phoenix application and require no framework knowledge beyond standard OTP — just Elixir and the libraries you likely already use.

Elixir and Phoenix are well-suited to AI-powered applications due to their strengths in concurrency, fault tolerance, and real-time features. Ollama’s HTTP API integrates naturally with Elixir’s Req or HTTPoison HTTP clients, and Phoenix LiveView streams AI responses to the browser without custom JavaScript. This guide covers calling Ollama from Elixir and building a streaming chat interface with Phoenix LiveView.

HTTP Client Setup

# mix.exs dependencies
{:req, "~> 0.4"}  # Modern HTTP client
# or {:httpoison, "~> 2.0"}

Basic Chat Call

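defmodule MyApp.AI do
  @model "llama3.2"

  # Resolved at runtime so config/runtime.exs (and tests) can change them
  defp base_url, do: Application.get_env(:my_app, :ollama_host, "http://localhost:11434")
  defp http_client, do: Application.get_env(:my_app, :http_client, Req)

  def chat(message), do: chat_with_history([%{role: "user", content: message}])

  # Accepts a full message list, e.g. from the ConversationServer below
  def chat_with_history(messages) do
    body = %{model: @model, messages: messages, stream: false}

    # Req's default receive_timeout is 15s, often too short for a full generation
    case http_client().post("#{base_url()}/api/chat", json: body, receive_timeout: 120_000) do
      {:ok, %{status: 200, body: %{"message" => %{"content" => content}}}} ->
        {:ok, content}
      {:ok, %{status: status, body: body}} ->
        {:error, "HTTP #{status}: #{inspect(body)}"}
      {:error, reason} ->
        {:error, inspect(reason)}
    end
  end

  def embed(text) do
    case http_client().post("#{base_url()}/api/embeddings",
           json: %{model: "nomic-embed-text", prompt: text}) do
      {:ok, %{status: 200, body: %{"embedding" => vector}}} -> {:ok, vector}
      error -> {:error, inspect(error)}
    end
  end
end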

Phoenix LiveView Streaming Chat

# lib/my_app_web/live/chat_live.ex
defmodule MyAppWeb.ChatLive do
  use MyAppWeb, :live_view
  require Logger

  def mount(_params, _session, socket) do
    {:ok, assign(socket, messages: [], current_response: "", streaming: false)}
  end

  def handle_event("send", %{"message" => message}, socket) do
    msgs = socket.assigns.messages ++ [%{role: "user", content: message}]
    # Stream in a background task; send the whole history so the model sees earlier turns
    lv_pid = self()
    Task.start(fn -> stream_response(msgs, lv_pid) end)
    {:noreply, assign(socket, messages: msgs, current_response: "", streaming: true)}
  end

  def handle_info({:token, token}, socket) do
    {:noreply, assign(socket, current_response: socket.assigns.current_response <> token)}
  end

  def handle_info(:done, socket) do
    final = socket.assigns.current_response
    msgs = socket.assigns.messages ++ [%{role: "assistant", content: final}]
    {:noreply, assign(socket, messages: msgs, current_response: "", streaming: false)}
  end

  defp stream_response(messages, pid) do
    base_url = Application.get_env(:my_app, :ollama_host, "http://localhost:11434")

    result =
      Req.post("#{base_url}/api/chat",
        json: %{model: "llama3.2", messages: messages, stream: true},
        receive_timeout: 120_000,
        into: fn {:data, data}, {req, resp} ->
          # NDJSON: one JSON object per line; a chunk can end mid-line,
          # so the unfinished tail is carried over in the response's private map
          buffer = Req.Response.get_private(resp, :ndjson_buffer, "") <> data
          {lines, [rest]} = buffer |> String.split("\n") |> Enum.split(-1)

          Enum.each(lines, fn line ->
            case Jason.decode(line) do
              {:ok, %{"message" => %{"content" => token}}} when token != "" ->
                send(pid, {:token, token})
              _ -> :ok
            end
          end)

          {:cont, {req, Req.Response.put_private(resp, :ndjson_buffer, rest)}}
        end
      )

    with {:error, reason} <- result, do: Logger.warning("Ollama stream failed: #{inspect(reason)}")
    # Always finish, so the UI does not stay in the streaming state after an error
    send(pid, :done)
  end
end

Why Elixir + Ollama Works Well

Elixir’s actor model (via OTP processes) maps naturally to asynchronous AI requests: each user’s streaming response runs in its own lightweight process, isolated from other requests, so a failure in one stream does not take down the others. Phoenix LiveView pushes updates over its WebSocket connection and integrates directly with Elixir’s message passing: the streaming task sends {:token, token} messages to the LiveView process, which updates the UI incrementally without any custom JavaScript streaming logic. This is simpler than many streaming implementations in other frameworks, where you bridge HTTP streaming, WebSockets, and application state by hand.
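Task.start/1, used in the streaming examples, starts an unsupervised process, so if the stream crashes the LiveView never hears about it. One way to make crashes visible is to start the task under a Task.Supervisor with Task.Supervisor.async_nolink/2, which leaves the caller unlinked but monitoring the task. A stdlib-only sketch with a hypothetical MyApp.StreamRunner module; in an application the supervisor would be a named child of your supervision tree, and recent LiveView versions also offer start_async/3 for this pattern:

```elixir
defmodule MyApp.StreamRunner do
  @doc "Starts `fun` under the Task.Supervisor `sup`, unlinked but monitored by the caller."
  def start(sup, fun), do: Task.Supervisor.async_nolink(sup, fun)

  @doc """
  Waits for the task's outcome the way a LiveView would in handle_info/2:
  {ref, result} on success, {:DOWN, ref, ...} if the task crashed.
  """
  def await_outcome(%Task{ref: ref}, timeout \\ 5_000) do
    receive do
      {^ref, result} ->
        # Success: drop the monitor and flush its pending :DOWN message
        Process.demonitor(ref, [:flush])
        {:ok, result}

      {:DOWN, ^ref, :process, _pid, reason} ->
        {:error, reason}
    after
      timeout -> {:error, :timeout}
    end
  end
end
```

Because the caller is not linked, a crashing stream reaches it as a :DOWN message it can turn into an error state in the UI, instead of killing the LiveView.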

Elixir’s built-in concurrency handles many simultaneous streaming connections efficiently: each LiveView connection is a lightweight Elixir process, and the BEAM’s preemptive schedulers spread them across all CPU cores, so one slow or busy connection does not stall the others. The limiting factor is usually inference rather than the web tier, since overall throughput depends on GPU capacity and the number of Ollama instances. For real-time AI chat serving many simultaneous users, the Elixir and Phoenix side of the stack is rarely the bottleneck.
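A single Ollama server generates only a limited number of responses at once (its OLLAMA_NUM_PARALLEL setting) and queues the rest, so batch work that fans out to it benefits from an explicit cap on the client side. A sketch using Task.async_stream/3 with max_concurrency; MyApp.BoundedBatch is a hypothetical module, and fun would wrap a call such as MyApp.Ollama.chat/3:

```elixir
defmodule MyApp.BoundedBatch do
  @doc """
  Runs `fun` over `inputs` with at most `limit` calls in flight, preserving input
  order. Calls exceeding `timeout` ms are killed and reported as {:exit, :timeout}.
  `fun` should return {:ok, _} / {:error, _} rather than raise, because
  async_stream links its tasks to the caller.
  """
  def run(inputs, fun, limit, timeout \\ 120_000) do
    inputs
    |> Task.async_stream(fun,
      max_concurrency: limit,
      timeout: timeout,
      on_timeout: :kill_task,
      ordered: true
    )
    |> Enum.map(fn
      {:ok, result} -> result
      {:exit, reason} -> {:exit, reason}
    end)
  end
end
```

Matching limit to the server’s parallelism keeps requests from waiting in Ollama’s queue long enough to hit client timeouts.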

GenServer for Conversation State

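defmodule MyApp.ConversationServer do
  # Requires {Registry, keys: :unique, name: MyApp.Registry} in the supervision tree
  use GenServer

  def start_link(session_id) do
    GenServer.start_link(__MODULE__, [], name: via_tuple(session_id))
  end

  def chat(session_id, message) do
    # Should exceed the HTTP client's receive_timeout so the call does not exit first
    GenServer.call(via_tuple(session_id), {:chat, message}, 150_000)
  end

  def clear(session_id) do
    GenServer.cast(via_tuple(session_id), :clear)
  end

  @impl true
  def init(_), do: {:ok, []}

  @impl true
  def handle_call({:chat, message}, _from, history) do
    updated = history ++ [%{role: "user", content: message}]

    case MyApp.AI.chat_with_history(updated) do
      {:ok, reply} ->
        new_history = updated ++ [%{role: "assistant", content: reply}]
        # Keep only the 20 most recent messages to bound the prompt size
        {:reply, {:ok, reply}, Enum.take(new_history, -20)}

      {:error, reason} ->
        # Keep the existing history instead of crashing and losing it
        {:reply, {:error, reason}, history}
    end
  end

  @impl true
  def handle_cast(:clear, _), do: {:noreply, []}

  defp via_tuple(id), do: {:via, Registry, {MyApp.Registry, id}}
end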
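If the history starts with a system prompt, Enum.take(new_history, -20) eventually drops it. A sketch of a trim that pins system messages and keeps the most recent other turns; MyApp.History.trim/2 is hypothetical, and counting messages rather than tokens is a simplification, since model context limits are measured in tokens:

```elixir
defmodule MyApp.History do
  @doc """
  Returns all "system" messages (in their original order) followed by the last
  `max_turns` non-system messages (also in order).
  """
  def trim(history, max_turns) do
    {system, rest} = Enum.split_with(history, &(&1.role == "system"))
    system ++ Enum.take(rest, -max_turns)
  end
end
```

In handle_call/3 this would take the place of Enum.take(new_history, -20), e.g. MyApp.History.trim(new_history, 20).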

Getting Started

Add Req to your mix.exs dependencies, pull llama3.2 in Ollama, and call MyApp.AI.chat/1 from an IEx session to verify the connection. Add the LiveView streaming implementation for a real-time chat interface that updates the browser as tokens arrive. The GenServer pattern keeps conversation history in memory per user session without a database, which suits development and small deployments; that history is lost if the process or node restarts. For production, persist conversation history to PostgreSQL with Ecto and load it by session ID when the GenServer starts.

Elixir vs Other Languages for AI Integration

Most languages integrate with Ollama through simple HTTP calls, the same basic pattern across Python, Ruby, Java, and Elixir. What distinguishes Elixir is what happens around those calls. The BEAM VM’s process model means each user’s AI session can be an isolated, supervised OTP process with its own conversation history, restarted automatically on crash without affecting other users. Phoenix LiveView’s server-rendered reactive UI delivers streaming AI responses to the browser with very little client-side code: no custom JavaScript streaming logic or hand-written WebSocket management, just Elixir messages and HEEx templates. And Elixir’s pipe operator and pattern matching produce readable AI integration code that handles error paths explicitly rather than through try/catch chains.

For teams already in the Elixir ecosystem, Ollama integration requires no architectural compromises — the patterns fit naturally into OTP supervisors, GenServers, and Phoenix contexts. For teams evaluating Elixir, AI-powered real-time applications are one of the most compelling use cases for the language’s distinctive strengths: the concurrency model, the fault tolerance primitives, and the LiveView streaming capability combine to handle AI workloads more elegantly than most alternatives.

Testing Elixir AI Code

# test/my_app/ai_test.exs
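defmodule MyApp.AITest do
  use ExUnit.Case, async: true

  import Mox

  # Assumes test/test_helper.exs defines the mock:
  #   Mox.defmock(MyApp.MockHTTP, for: MyApp.HTTPBehaviour)
  # and config/test.exs points MyApp.AI at it (MyApp.AI must look up its
  # HTTP client at runtime for the mock to be used):
  #   config :my_app, http_client: MyApp.MockHTTP

  # Fail the test if a declared expectation is never called
  setup :verify_on_exit!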

  test "chat returns content on success" do
    MyApp.MockHTTP
    |> expect(:post, fn _url, _opts ->
      {:ok, %{status: 200, body: %{"message" => %{"content" => "Hello there!"}}}}
    end)

    assert {:ok, "Hello there!"} = MyApp.AI.chat("Hi")
  end

  test "chat returns error on connection failure" do
    MyApp.MockHTTP
    |> expect(:post, fn _url, _opts -> {:error, :econnrefused} end)

    assert {:error, _} = MyApp.AI.chat("Hi")
  end
end
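For that mock to be used, MyApp.AI has to call its HTTP client through a module it looks up at runtime instead of calling Req directly. A stdlib-only sketch of that seam: the behaviour, a lookup helper, and a hand-written fake for cases where Mox is not wanted. The module names and the :http_client config key are assumptions; Req can serve as the default because Req.post/2 takes a URL and a keyword list:

```elixir
defmodule MyApp.HTTPBehaviour do
  # The single call MyApp.AI needs; Req.post/2 has this shape
  @callback post(url :: String.t(), opts :: keyword()) :: {:ok, map()} | {:error, term()}
end

defmodule MyApp.HTTPClient do
  @doc "Resolves the configured client module, defaulting to Req."
  def impl, do: Application.get_env(:my_app, :http_client, Req)
end

defmodule MyApp.FakeHTTP do
  @behaviour MyApp.HTTPBehaviour

  # Hypothetical fake: echoes the last message back as the assistant reply
  @impl true
  def post(_url, opts) do
    %{messages: messages} = Keyword.fetch!(opts, :json)
    %{content: last} = List.last(messages)
    {:ok, %{status: 200, body: %{"message" => %{"role" => "assistant", "content" => "echo: " <> last}}}}
  end
end
```

Mox generates a module like MyApp.FakeHTTP from the behaviour, but with per-test expectations; the hand-written fake is a fixed alternative that needs no extra dependency.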

Deployment Considerations

Elixir applications are typically deployed as OTP releases: compiled, self-contained packages that include the BEAM runtime. In production, set the Ollama host from an environment variable in config/runtime.exs, which is evaluated when the release boots, using config :my_app, ollama_host: System.get_env("OLLAMA_HOST", "http://localhost:11434"). The application should read the value with Application.get_env/3 at runtime; a value captured with Application.compile_env/3 is fixed at build time, and the release refuses to boot if runtime configuration changes it. For Kubernetes deployments, the same Kubernetes Ollama manifests from this article series work alongside an Elixir application deployment: the application’s pods call the Ollama service via its cluster DNS name. Any number of Elixir nodes can share a single Ollama service endpoint, and Elixir’s built-in distribution lets clustered nodes coordinate through tools such as Phoenix.PubSub, which suits high-availability Phoenix deployments.

Elixir in the AI Ecosystem

Elixir is not the first language teams reach for when starting an AI project, but it is increasingly compelling for teams that need high-concurrency AI features in production. The combination of Phoenix LiveView for real-time UI, OTP for fault-tolerant session management, and the BEAM’s efficient process scheduling for concurrent inference requests creates a stack that handles AI workloads gracefully at scale. Projects like Nx (numerical computing for Elixir) and Bumblebee (Hugging Face model serving in Elixir) show that the ecosystem is building native AI capabilities beyond just calling external APIs. For teams committed to Elixir as a platform, Ollama integration is the most practical path to adding LLM capabilities today, and the patterns established will integrate naturally with native Elixir AI capabilities as they mature. The investment in learning OTP-based AI integration patterns pays dividends across both API-based and native approaches.

Nx and Bumblebee: Native Elixir AI

While this article focuses on calling Ollama via HTTP, it is worth understanding the broader Elixir AI ecosystem. Nx provides tensor operations and GPU acceleration for Elixir, roughly NumPy integrated with the BEAM. Bumblebee builds on Nx to run pretrained Hugging Face models (BERT, Stable Diffusion, Whisper, and others) natively in Elixir, loading their checkpoints from the Hugging Face Hub and serving them through Nx.Serving processes in your supervision tree. For teams that want native Elixir AI without an external Ollama dependency, Bumblebee is the alternative, but it requires more infrastructure setup (CUDA drivers and the EXLA compiler backend for GPU acceleration) than Ollama’s simple download-and-run model. The practical recommendation for most teams: start with Ollama for its simplicity and broad model support, and evaluate Bumblebee for specific use cases where native integration with Elixir’s concurrency primitives is worth the added setup complexity. High-throughput embedding generation and real-time speech transcription are two cases where avoiding the HTTP round-trip to Ollama can pay off.

Choosing Elixir for AI-Powered Applications

Elixir is not the default choice for AI integration, but it excels in specific scenarios where its strengths match AI application requirements. Real-time collaborative AI features, such as shared documents with AI suggestions, live coding assistants, and multiplayer brainstorming tools, benefit from Phoenix PubSub and LiveView in ways that are difficult to replicate cleanly in other frameworks. High-concurrency inference routing, where thousands of simultaneous users each need a managed conversation session, maps naturally to OTP’s supervised process tree. And fault-tolerant pipeline processing, where a crashed model call should not affect other users, follows from process isolation and OTP restart strategies rather than application-level try/catch logic. For teams whose use cases match these strengths, Elixir + Ollama is a strong fit, not just a workable one. Start with the simple HTTP client and LiveView streaming examples from this article, evaluate them on your actual use case, and reach for OTP’s deeper features (GenServer conversation management, supervised task trees, distributed Erlang clustering) as your requirements grow.

Getting Started

Add Req to your mix.exs, pull llama3.2 in Ollama, and call MyApp.AI.chat("Hello") from IEx to verify the connection. Add the Phoenix LiveView streaming module for real-time token display; it typically needs less code than equivalent JavaScript-heavy implementations. From there, add the GenServer conversation manager for multi-turn sessions, configure the Ollama host through the application environment for deployment flexibility, and write Mox-based tests for the AI service layer. The full stack (Elixir, Phoenix, OTP, Ollama) handles real-time AI workloads with fewer moving parts than most alternative stacks, which suits teams that value both developer ergonomics and operational reliability, and that advantage grows as your application and its operational requirements mature.

The investment in Elixir’s patterns — OTP supervision, GenServer state, LiveView reactivity — pays off most clearly in AI applications that need to be both responsive and reliable. Where other stacks add complexity to achieve the same guarantees, Elixir provides them as the default operating model.

Elixir’s combination of fault tolerance, concurrency, and Phoenix LiveView streaming makes it one of the most architecturally elegant options for AI-powered real-time web applications. Teams willing to invest in the Elixir learning curve will find it pays back quickly for this class of application.
