How to Use Ollama with Ruby and Rails

Ruby and Rails applications can integrate Ollama using the ruby-ollama gem or direct HTTP calls via Faraday. This guide covers both approaches with practical examples for Rails APIs and background jobs.

Option 1: ruby-ollama gem

# Gemfile
gem 'ruby-ollama'

# Then install it from the shell:
#   bundle install

require 'ruby_ollama'

client = RubyOllama::Client.new(host: 'http://localhost:11434')

# Chat
response = client.chat(
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Why is Ruby popular?' }]
)
puts response.message.content

# Generate
response = client.generate(
  model: 'llama3.2',
  prompt: 'Write a haiku about Ruby',
  stream: false
)
puts response.response

# Embeddings
response = client.embeddings(
  model: 'nomic-embed-text',
  prompt: 'The quick brown fox'
)
puts response.embedding.length  # 768

Option 2: Direct HTTP with Faraday

require 'faraday'
require 'json'

class OllamaClient
  BASE_URL = 'http://localhost:11434'

  # Sends a single user message to /api/chat and returns the reply text.
  def chat(model:, message:, temperature: 0.7)
    response = connection.post('/api/chat') do |req|
      req.body = {
        model: model,
        messages: [{ role: 'user', content: message }],
        stream: false,
        options: { temperature: temperature }
      }
    end
    response.body.dig('message', 'content')
  end

  # Returns the embedding vector (an array of floats) for the given text.
  def embed(model: 'nomic-embed-text', text:)
    response = connection.post('/api/embeddings') { |r| r.body = { model: model, prompt: text } }
    response.body['embedding']
  end

  private

  # Shared connection: encodes request bodies as JSON and parses JSON responses.
  def connection
    @connection ||= Faraday.new(url: BASE_URL) do |f|
      f.request :json
      f.response :json
    end
  end
end

client = OllamaClient.new
puts client.chat(model: 'llama3.2', message: 'Hello from Ruby!')

Rails Service Object

# app/services/ai_service.rb
class AiService
  def initialize(model: 'llama3.2')
    @model = model
    @client = OllamaClient.new
  end

  def summarise(text, bullet_points: 3)
    prompt = "Summarise in #{bullet_points} bullet points:\n\n#{text}"
    @client.chat(model: @model, message: prompt, temperature: 0.3)
  end

  def classify(text, categories:)
    prompt = "Classify this text into one of: #{categories.join(', ')}.\nReply with only the category name.\n\n#{text}"
    @client.chat(model: @model, message: prompt, temperature: 0.0).strip
  end
end

# In a controller:
def create
  service = AiService.new
  summary = service.summarise(params[:document_text])
  render json: { summary: summary }
end

Sidekiq Background Jobs

# app/jobs/document_summary_job.rb
class DocumentSummaryJob
  include Sidekiq::Job
  sidekiq_options retry: 3

  def perform(document_id)
    document = Document.find(document_id)
    service = AiService.new
    summary = service.summarise(document.content)
    document.update!(summary: summary, processed_at: Time.current)
  rescue => e
    Rails.logger.error "Summary failed for #{document_id}: #{e.message}"
    raise
  end
end

# Enqueue from controller:
DocumentSummaryJob.perform_async(document.id)

Why Ruby for Ollama Integration

Ruby’s expressive syntax and rich ecosystem of web frameworks make it a natural fit for AI-powered web applications. Rails applications can add Ollama-powered features — document summarisation, content classification, intelligent search, response drafting — as service objects and background jobs that integrate naturally with ActiveRecord, ActiveJob, and the Rails request lifecycle. The patterns in this article follow Ruby idioms and Rails conventions, making them easy to adopt for teams already working in the ecosystem.

The ruby-ollama gem provides a clean, idiomatic Ruby API that feels natural alongside other Ruby HTTP clients. For teams that prefer not to add a gem dependency, the Faraday-based approach demonstrates how straightforward the Ollama REST API is to call from any HTTP library — the request and response shapes are simple enough that a custom client is well within reach. Both approaches produce identical results and the choice comes down to team preference for gems versus custom code in the application’s service layer.

Streaming in Rails with ActionController::Live

class AiController < ApplicationController
  include ActionController::Live

  def stream_response
    response.headers['Content-Type'] = 'text/plain; charset=utf-8'
    response.headers['Cache-Control'] = 'no-cache'

    conn = Faraday.new(url: 'http://localhost:11434')
    body = {
      model: 'llama3.2',
      messages: [{ role: 'user', content: params[:message] }],
      stream: true
    }.to_json

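    # Ollama streams newline-delimited JSON; a network chunk may contain a
    # partial line or several lines, so buffer and parse complete lines only.
    buffer = String.new
    conn.post('/api/chat', body, 'Content-Type' => 'application/json') do |req|
      req.options.on_data = proc do |chunk, _received_bytes|
        buffer << chunk
        while (line = buffer.slice!(/\A[^\n]*\n/))
          next if line.strip.empty?
          content = JSON.parse(line).dig('message', 'content')
          response.stream.write(content) if content
        end
      end
    end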
  ensure
    response.stream.close
  end
end

Structured Output in Ruby

require 'faraday'
require 'json'

# client: an optional pre-built Faraday connection; one is created when omitted.
def extract_contact(text, client: nil)
  schema = {
    type: 'object',
    properties: {
      name: { type: 'string' },
      email: { type: 'string' },
      phone: { type: 'string' }
    },
    required: %w[name email]
  }

  conn = client || Faraday.new(url: 'http://localhost:11434') do |f|
    f.request :json
    f.response :json
  end

  response = conn.post('/api/chat') do |req|
    req.body = {
      model: 'llama3.2',
      messages: [{ role: 'user', content: "Extract contact from: #{text}" }],
      format: schema,
      stream: false,
      options: { temperature: 0 }
    }
  end
  JSON.parse(response.body.dig('message', 'content'))
end

contact = extract_contact('Call Alice at alice@example.com or 555-1234', client: nil)
puts contact['name']  # 'Alice'
puts contact['email'] # 'alice@example.com'

Testing Ruby Ollama Code

# spec/services/ai_service_spec.rb
RSpec.describe AiService do
  let(:service) { described_class.new }

  describe '#summarise' do
    it 'returns a summary string' do
      # Use VCR or WebMock to stub Ollama HTTP calls in tests
      stub_request(:post, 'http://localhost:11434/api/chat')
        .to_return(
          status: 200,
          body: { message: { content: 'Summary: Key points here.' }, done: true }.to_json,
          headers: { 'Content-Type' => 'application/json' }
        )
      result = service.summarise('Long document text...')
      expect(result).to include('Summary')
    end
  end
end

Getting Started

Add the Faraday gem (often already installed as a dependency of other gems) or ruby-ollama to your Gemfile, create an OllamaClient service class, and use it in a controller action or ActiveJob. The background job pattern with Sidekiq is the most practical approach for document processing and batch operations: it follows the same conventions as any other Sidekiq job in a Rails application and benefits from Sidekiq’s retry logic, monitoring UI, and concurrency management. Start with the synchronous service object for interactive features and add the Sidekiq job when you need async processing for longer-running inference tasks.

Ollama’s OpenAI-compatible REST API works with any Ruby HTTP client. The ruby-openai gem (pointed at Ollama’s /v1 endpoint) is the most convenient option, and the community ruby-ollama gem provides Ollama-native API access. This guide covers both approaches for integrating Ollama into Ruby and Rails applications.

Option 1: ruby-openai Gem (Recommended)

# Install from the shell: gem install ruby-openai
# or in Gemfile: gem 'ruby-openai'

require 'openai'

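client = OpenAI::Client.new(
  access_token: 'ollama',  # any non-empty string
  # Some ruby-openai versions append /v1 themselves; if requests fail with a
  # doubled /v1/v1 path, pass the bare host (http://localhost:11434) instead.
  uri_base: 'http://localhost:11434/v1'
)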

# Chat
response = client.chat(
  parameters: {
    model: 'llama3.2',
    messages: [{ role: 'user', content: 'Why is Ruby elegant?' }]
  }
)
puts response.dig('choices', 0, 'message', 'content')

# Streaming
client.chat(
  parameters: {
    model: 'llama3.2',
    messages: [{ role: 'user', content: 'Count to 5.' }],
    stream: proc do |chunk, _bytesize|
      print chunk.dig('choices', 0, 'delta', 'content')
    end
  }
)

Embeddings

response = client.embeddings(
  parameters: {
    model: 'nomic-embed-text',
    input: 'The quick brown fox'
  }
)
vector = response['data'][0]['embedding']
puts "Dimensions: #{vector.length}"

Rails Service Object

# app/services/ai_service.rb
class AiService
  def initialize
    @client = OpenAI::Client.new(
      access_token: 'ollama',
      uri_base: ENV.fetch('OLLAMA_HOST', 'http://localhost:11434') + '/v1'
    )
  end

  def summarise(text, model: 'llama3.2')
    response = @client.chat(
      parameters: {
        model: model,
        messages: [{
          role: 'user',
          content: "Summarise in 3 bullet points:\n\n#{text}"
        }],
        temperature: 0.3
      }
    )
    response.dig('choices', 0, 'message', 'content')
  end

  def classify(text)
    response = @client.chat(
      parameters: {
        model: 'llama3.2',
        messages: [{
          role: 'user',
          content: "Classify as: invoice, contract, email, or other. Reply with JSON only, e.g. {\"category\": \"invoice\"}.\n\n#{text[0..1000]}"
        }],
        temperature: 0
      }
    )
    JSON.parse(response.dig('choices', 0, 'message', 'content'))
  end
end

# Usage in a controller
class DocumentsController < ApplicationController
  def analyse
    ai = AiService.new
    @summary = ai.summarise(params[:text])
    @classification = ai.classify(params[:text])
    render json: { summary: @summary, classification: @classification }
  end
end

Background Jobs with ActiveJob

# app/jobs/summarise_document_job.rb
class SummariseDocumentJob < ApplicationJob
  queue_as :default

  def perform(document_id)
    document = Document.find(document_id)
    ai = AiService.new
    summary = ai.summarise(document.content)
    document.update!(summary: summary, summarised_at: Time.current)
  end
end

# Enqueue from controller
SummariseDocumentJob.perform_later(document.id)

Rails Credentials for Configuration

# config/credentials.yml.enc (edit with: rails credentials:edit)
# ollama:
#   host: http://localhost:11434

# Access in code:
ollama_host = Rails.application.credentials.dig(:ollama, :host)

Why Ruby for AI Integration

Ruby's expressive syntax and the Rails ecosystem's conventions make AI integration feel natural — service objects, background jobs, and controller concerns are the established patterns for encapsulating external service calls, and Ollama fits cleanly into all three. The ruby-openai gem's OpenAI-compatible interface means any team already using OpenAI in a Rails application can switch to Ollama by changing two lines: the access_token and uri_base initialisation arguments. This drop-in replacement story is one of the most compelling aspects of Ollama's OpenAI-compatible API for Ruby teams — no new gem, no new API patterns, just updated configuration.

Streaming in Rails with ActionController::Live

class AiController < ApplicationController
  include ActionController::Live

  def stream
    response.headers['Content-Type'] = 'text/event-stream'
    response.headers['Cache-Control'] = 'no-cache'

    client = OpenAI::Client.new(access_token: 'ollama',
                                uri_base: 'http://localhost:11434/v1')
    client.chat(
      parameters: {
        model: 'llama3.2',
        messages: [{ role: 'user', content: params[:message] }],
        stream: proc do |chunk, _bytesize|
          token = chunk.dig('choices', 0, 'delta', 'content')
          response.stream.write("data: #{token.to_json}\n\n") if token
        end
      }
    )
  ensure
    response.stream.close
  end
end

Caching AI Responses

class AiService
  def summarise_with_cache(text, model: 'llama3.2')
    cache_key = "ai/summary/#{Digest::SHA256.hexdigest(text + model)}"
    Rails.cache.fetch(cache_key, expires_in: 1.day) do
      summarise(text, model: model)  # Only calls Ollama on cache miss
    end
  end
end

Testing Rails AI Services

# spec/services/ai_service_spec.rb
RSpec.describe AiService do
  describe '#summarise' do
    it 'returns a summary string' do
      # Stub the OpenAI client to avoid real Ollama calls in tests
      stub_response = {
        'choices' => [{ 'message' => { 'content' => "• Point 1\n• Point 2\n• Point 3" } }]
      }
      allow_any_instance_of(OpenAI::Client).to receive(:chat).and_return(stub_response)

      service = AiService.new
      result = service.summarise('Some document text')
      expect(result).to include('Point 1')
    end
  end
end

Getting Started

Add gem 'ruby-openai' to your Gemfile, configure the client to point at http://localhost:11434/v1 with access_token: 'ollama', and call client.chat with your model and messages. The AiService service object pattern from this article provides a clean encapsulation layer that makes swapping models, adding caching, and writing tests straightforward. For Rails applications already using OpenAI, the migration to Ollama is a configuration change — the rest of your application code, controllers, jobs, and tests remain unchanged. Pull your chosen model with ollama pull llama3.2, update the uri_base, and your AI features are running locally with no API costs and no data leaving your infrastructure.

The ruby-openai Gem vs Ruby-Native Ollama Gems

Two main options exist for Ollama integration in Ruby: the ruby-openai gem (pointing at Ollama's OpenAI-compatible /v1 endpoint) and purpose-built gems like ruby-ollama or ollama-ai that wrap the native Ollama API. The ruby-openai approach is recommended for most Rails teams for one key reason: it is the same gem you would use if you switched to OpenAI, Anthropic's API, or another OpenAI-compatible provider. The interface is identical regardless of backend — switching from Ollama to OpenAI is a two-line configuration change, and vice versa. Purpose-built Ollama gems have access to Ollama-native features (model management, Modelfile creation, streaming with the native format) but at the cost of a non-portable interface that requires changes if you ever switch backends. For typical Rails AI features — chat, summarisation, classification, embeddings — the OpenAI-compatible interface is fully sufficient.
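
As a small illustration of the Ollama-native side, the sketch below lists locally installed models through the native /api/tags endpoint, which has no counterpart in the chat-focused examples above. It uses only Ruby's standard library; the helper names model_names and local_models are hypothetical, and the default host is an assumption about a local install.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Extracts model names from the JSON body returned by GET /api/tags.
# (Hypothetical helper name; the response is expected to contain a "models" array.)
def model_names(tags_json)
  JSON.parse(tags_json).fetch('models', []).map { |model| model['name'] }
end

# Fetches the names of the models installed on an Ollama server.
# Assumption: the server is reachable at the default local address.
def local_models(host: 'http://localhost:11434')
  model_names(Net::HTTP.get(URI.join(host, '/api/tags')))
end

# Example (requires a running Ollama server):
#   puts local_models
```

Keeping the parsing separate from the HTTP call makes the helper easy to test without a server, which is the same reason the service objects in this article wrap the client.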

Handling Long-Running Requests with Rack Timeout

Rails applications that use the rack-timeout gem abort requests that exceed a configured duration (commonly 15–30 seconds). LLM inference can easily exceed this limit for long responses. Either raise the service timeout, or use the background job pattern (ActiveJob) to move inference out of the request-response cycle entirely:

# Gemfile: load rack-timeout without its automatic Rails setup
gem 'rack-timeout', require: 'rack/timeout/base'

# config/initializers/rack_timeout.rb
# Insert the middleware manually with a longer timeout for AI-heavy applications
Rails.application.config.middleware.insert_before(
  Rack::Runtime, Rack::Timeout, service_timeout: 120  # 2 minutes
)

# With the default automatic setup, set RACK_TIMEOUT_SERVICE_TIMEOUT=120 instead.
# rack-timeout's settings are global rather than per-path, so for a few slow
# /ai endpoints a background job is usually safer than raising the limit everywhere.

Environment-Based Configuration

# config/initializers/ai.rb
AI_CLIENT = OpenAI::Client.new(
  access_token: ENV.fetch('OLLAMA_API_KEY', 'ollama'),
  uri_base: ENV.fetch('OLLAMA_HOST', 'http://localhost:11434') + '/v1',
  request_timeout: ENV.fetch('OLLAMA_TIMEOUT', '120').to_i
)

# config/environments/production.rb — point at team server
# OLLAMA_HOST=http://ai-server.internal:11434

# config/environments/test.rb — stub in tests
# (no real Ollama calls in test environment)

Ruby AI Beyond Rails

The ruby-openai gem works in any Ruby context: Sinatra, Rack apps, plain Ruby scripts, and Jekyll plugins. For non-Rails Ruby projects, the same client initialisation and chat/embedding API calls work identically. A Ruby script for batch document processing is a common simple use case: read files, call the chat endpoint for each, and write the results. No framework is needed, just the gem and a few dozen lines of Ruby. The simplicity of the HTTP API combined with Ruby's clean syntax makes this kind of one-off AI processing script quick to write.
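
A minimal sketch of such a batch script follows. To stay dependency-free it calls Ollama's native /api/chat endpoint with net/http rather than the gem; the function names, the .txt and .summary.txt naming, and the directory layout are all assumptions made for illustration.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Assumption: Ollama is listening on its default local port.
OLLAMA_CHAT_URI = URI('http://localhost:11434/api/chat')

# Sends one prompt to Ollama's native chat endpoint and returns the reply text.
def ollama_chat(prompt, model: 'llama3.2')
  payload = { model: model, messages: [{ role: 'user', content: prompt }], stream: false }
  response = Net::HTTP.post(OLLAMA_CHAT_URI, payload.to_json, 'Content-Type' => 'application/json')
  JSON.parse(response.body).dig('message', 'content')
end

# Summarises every .txt file in input_dir and writes <name>.summary.txt to output_dir.
# `summariser` is any callable that takes a prompt string, so a dry run is possible
# without a running Ollama server. Returns the list of files written.
def summarise_directory(input_dir, output_dir, summariser: method(:ollama_chat))
  Dir.mkdir(output_dir) unless Dir.exist?(output_dir)
  Dir.glob(File.join(input_dir, '*.txt')).sort.map do |path|
    summary = summariser.call("Summarise in 3 bullet points:\n\n#{File.read(path)}")
    out_path = File.join(output_dir, "#{File.basename(path, '.txt')}.summary.txt")
    File.write(out_path, summary)
    out_path
  end
end

# Example (requires a running Ollama server):
#   summarise_directory('docs', 'summaries')
```

Net::HTTP's default read timeout is 60 seconds, so long documents or large models may need an explicit Net::HTTP.start call with a higher read_timeout.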

Performance Considerations for Ruby

MRI Ruby's Global VM Lock (GVL, often called the GIL) prevents threads from running Ruby code in parallel, but this rarely limits Ollama integration: the bottleneck is the inference step, and MRI releases the lock while a thread waits on network I/O. From Ruby's point of view Ollama calls are I/O-bound, so they overlap well under Puma's threaded mode even on MRI. For concurrent AI requests in Rails, size Puma's workers and threads as you would for any slow upstream service, and make sure any shared client object is safe to use across threads (or give each thread its own). For high-throughput batch processing outside Rails, use threads, the Parallel gem, or plain Ruby processes (via Process.fork), with each worker making its own sequential Ollama calls. JRuby, which has no GVL, additionally lets CPU-bound Ruby code run in parallel threads, though that matters little when the time is spent waiting on Ollama.
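
The sketch below shows one way to overlap I/O-bound calls with plain threads on MRI. The helper name concurrent_map is hypothetical, and the block passed to it stands in for whatever Ollama call you make; whether the server actually processes requests in parallel depends on how Ollama is configured.

```ruby
# Runs the given block over `items` on a small pool of threads and returns
# the results in input order. Suited to I/O-bound work such as HTTP calls.
def concurrent_map(items, concurrency: 4, &callable)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # pop returns nil once the closed queue is drained

  results = Array.new(items.length)
  workers = Array.new([concurrency, items.length].min) do
    Thread.new do
      while (pair = queue.pop)
        item, index = pair
        results[index] = callable.call(item)
      end
    end
  end
  workers.each(&:join) # join re-raises any exception from a worker thread
  results
end

# Example (AiService as defined earlier in this article):
#   summaries = concurrent_map(texts, concurrency: 2) { |t| AiService.new.summarise(t) }
```

Each worker writes to a distinct index of the results array, so no extra locking is needed for this pattern.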

The Ruby AI Ecosystem in 2026

Ruby's AI integration ecosystem has matured considerably. The ruby-openai gem is actively maintained and covers the full OpenAI API including assistants, fine-tuning, and moderation in addition to chat and embeddings. The community gem langchainrb offers LangChain-style orchestration in Ruby, with chains, agents, and vector store integrations following patterns familiar from the Python ecosystem. For Rails teams building AI features, the combination of ruby-openai for model calls, ActiveJob for async processing, Rails.cache for response caching, and langchainrb for complex orchestration covers the vast majority of AI application requirements without leaving the Ruby ecosystem.

Ollama's OpenAI-compatible API makes Ruby the language of least resistance for teams already invested in Rails. No new patterns to learn, no separate service to deploy, no context-switching to Python for AI features — just a new gem pointed at a local endpoint. For Rails shops that have been waiting for the right moment to add AI capabilities to their applications, the combination of Ollama's local inference and the mature Ruby AI gem ecosystem makes 2026 that moment.

Migrating from OpenAI to Ollama in Rails

For Rails applications already calling the OpenAI API via ruby-openai, migrating to Ollama is straightforward in most cases. Update the client initialisation to point at your Ollama endpoint, change the model names (from gpt-4o to llama3.2 or your preferred model), and test. Chat and embedding calls use identical method signatures — the response structure is the same OpenAI-compatible JSON. The areas that need attention: Ollama's model names do not map one-to-one with OpenAI model names, so prompts optimised for GPT-4o may need adjustment for a different model family; tool calling (function calling) support varies by model, so verify your chosen model supports it if you use tools; and image inputs require a vision-capable model like LLaVA or Gemma 3 rather than any model. Beyond these three areas, the migration is typically a configuration change, not a code change — which is the central practical value of Ollama's OpenAI-compatible API for Ruby teams.
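
One way to keep the migration reversible is to select the backend from configuration. The sketch below is only an illustration: the helper name ai_backend_settings and the AI_BACKEND and AI_MODEL environment variables are hypothetical, and the OpenAI branch relies on the gem's default base URL rather than setting one.

```ruby
# Hypothetical helper: returns client options and a default model name for the
# backend named in AI_BACKEND ("ollama" or "openai"). `env` can be ENV or a Hash.
def ai_backend_settings(env = ENV)
  case env.fetch('AI_BACKEND', 'ollama')
  when 'ollama'
    {
      client: {
        access_token: 'ollama', # any non-empty string
        uri_base: env.fetch('OLLAMA_HOST', 'http://localhost:11434') + '/v1'
      },
      model: env.fetch('AI_MODEL', 'llama3.2')
    }
  when 'openai'
    {
      client: { access_token: env.fetch('OPENAI_API_KEY') },
      model: env.fetch('AI_MODEL', 'gpt-4o')
    }
  else
    raise ArgumentError, "Unknown AI_BACKEND: #{env['AI_BACKEND']}"
  end
end

# Example (with ruby-openai loaded):
#   settings = ai_backend_settings
#   client = OpenAI::Client.new(**settings[:client])
#   client.chat(parameters: { model: settings[:model], messages: [...] })
```

Because the model name travels with the client options, prompts tuned for one model family can be revisited in one place when the backend changes.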

Practical Starting Point

Add the gem, configure the client pointing at http://localhost:11434/v1, write the AiService class from this article, and call it from one controller action or background job. Evaluate the output quality on your actual use case with the model you have chosen — run 10–20 real examples through it and compare to your quality bar. If the quality is acceptable, expand the integration; if not, try a larger or different model before spending more time on the integration. The cost of evaluating local Ollama for your Rails application is under an hour, and the potential savings in API costs and privacy improvements make that hour one of the best-spent in any AI-integration project.
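
A quality check of this kind can be a few lines of Ruby. The sketch below assumes a classification task with known expected labels; the evaluate helper and the example hash shape (:text and :expected keys) are hypothetical, and the classifier is any callable so the same harness can wrap a local or a cloud model.

```ruby
# Runs each example through `classifier` (any callable that returns a label)
# and reports accuracy together with the examples that were misclassified.
def evaluate(examples, classifier)
  failures = examples.reject do |example|
    classifier.call(example[:text]).to_s.strip.downcase == example[:expected].downcase
  end
  correct = examples.length - failures.length
  {
    total: examples.length,
    correct: correct,
    accuracy: examples.empty? ? 0.0 : correct.fdiv(examples.length),
    failures: failures
  }
end

# Example, with a hypothetical classify_with_ollama method that returns a label:
#   examples = [{ text: 'Invoice #4411 ...', expected: 'invoice' }]
#   report = evaluate(examples, ->(text) { classify_with_ollama(text) })
#   puts "Accuracy: #{(report[:accuracy] * 100).round}%"
```

Reading through the failures list is usually more informative than the accuracy figure alone, since it shows whether the errors are prompt problems or model limits.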

A Note on Model Quality in Ruby Applications

The model you choose matters more than the integration library. A well-prompted llama3.2 (3B by default) handles most common Rails AI tasks, such as document classification, summarisation, and simple Q&A, with quality that is good enough for production use in internal tools and non-critical customer features. For customer-facing features where quality is paramount, evaluate larger models (13B, 32B) or cloud models alongside the local option before committing to a production architecture. The ruby-openai gem makes this evaluation easy: switching between Ollama's llama3.2, a locally hosted Qwen 32B, and OpenAI's GPT-4o requires only changing the model name, the uri_base, and (for OpenAI) the access_token; your service object code is identical across all three. Run the same 20–30 representative examples through each option and let your quality requirements drive the choice rather than defaulting to either local or cloud. The evaluation costs about an hour, and results on your own data are far more informative than any benchmark or general comparison article. That hour is usually repaid quickly through cost savings, privacy improvements, or both, so it is worth spending before committing to any architecture decision about your AI backend.

The model-agnostic interface that ruby-openai provides is ultimately what makes local AI integration in Ruby so low-friction: you evaluate on Ollama, and if you ever need to switch to a cloud model for quality reasons, you change the client configuration rather than your application code. That advantage compounds every time you add a new feature or apply a new model to a new use case.
