Ruby and Rails applications can integrate Ollama using the ruby-ollama gem or direct HTTP calls via Faraday. This guide covers both approaches with practical examples for Rails APIs and background jobs.
Option 1: ruby-ollama gem
# Gemfile
gem 'ruby-ollama'
bundle install
require 'ruby_ollama'
client = RubyOllama::Client.new(host: 'http://localhost:11434')
# Chat
response = client.chat(
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Why is Ruby popular?' }]
)
puts response.message.content
# Generate
response = client.generate(
  model: 'llama3.2',
  prompt: 'Write a haiku about Ruby',
  stream: false
)
puts response.response
# Embeddings
response = client.embeddings(
  model: 'nomic-embed-text',
  prompt: 'The quick brown fox'
)
puts response.embedding.length # 768
Option 2: Direct HTTP with Faraday
require 'faraday'
require 'json'
class OllamaClient
  BASE_URL = 'http://localhost:11434'

  def chat(model:, message:, temperature: 0.7)
    response = connection.post('/api/chat') do |req|
      req.body = {
        model: model,
        messages: [{ role: 'user', content: message }],
        stream: false,
        options: { temperature: temperature }
      }
    end
    response.body.dig('message', 'content')
  end

  def embed(text:, model: 'nomic-embed-text')
    response = connection.post('/api/embeddings') { |r| r.body = { model: model, prompt: text } }
    response.body['embedding']
  end

  private

  # Build the Faraday connection once and reuse it for every call
  def connection
    @connection ||= Faraday.new(url: BASE_URL) do |f|
      f.request :json
      f.response :json
    end
  end
end
client = OllamaClient.new
puts client.chat(model: 'llama3.2', message: 'Hello from Ruby!')
Rails Service Object
# app/services/ai_service.rb
class AiService
  def initialize(model: 'llama3.2')
    @model = model
    @client = OllamaClient.new
  end

  def summarise(text, bullet_points: 3)
    prompt = "Summarise in #{bullet_points} bullet points:\n\n#{text}"
    @client.chat(model: @model, message: prompt, temperature: 0.3)
  end

  def classify(text, categories:)
    prompt = "Classify this text into one of: #{categories.join(', ')}.\nReply with only the category name.\n\n#{text}"
    @client.chat(model: @model, message: prompt, temperature: 0.0).strip
  end
end

# In a controller:
def create
  service = AiService.new
  summary = service.summarise(params[:document_text])
  render json: { summary: summary }
end
Sidekiq Background Jobs
# app/jobs/document_summary_job.rb
class DocumentSummaryJob
  include Sidekiq::Job
  sidekiq_options retry: 3

  def perform(document_id)
    document = Document.find(document_id)
    service = AiService.new
    summary = service.summarise(document.content)
    document.update!(summary: summary, processed_at: Time.current)
  rescue => e
    # Log, then re-raise so Sidekiq's retry logic still runs
    Rails.logger.error "Summary failed for #{document_id}: #{e.message}"
    raise
  end
end
# Enqueue from controller:
DocumentSummaryJob.perform_async(document.id)
Why Ruby for Ollama Integration
Ruby’s expressive syntax and rich ecosystem of web frameworks make it a natural fit for AI-powered web applications. Rails applications can add Ollama-powered features — document summarisation, content classification, intelligent search, response drafting — as service objects and background jobs that integrate naturally with ActiveRecord, ActiveJob, and the Rails request lifecycle. The patterns in this article follow Ruby idioms and Rails conventions, making them easy to adopt for teams already working in the ecosystem.
The ruby-ollama gem provides a clean, idiomatic Ruby API that feels natural alongside other Ruby HTTP clients. For teams that prefer not to add a gem dependency, the Faraday-based approach demonstrates how straightforward the Ollama REST API is to call from any HTTP library — the request and response shapes are simple enough that a custom client is well within reach. Both approaches produce identical results and the choice comes down to team preference for gems versus custom code in the application’s service layer.
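To illustrate how small the native request and response shapes are, here is a minimal sketch that uses only Ruby's standard library (Net::HTTP and JSON) rather than Faraday. The helper names build_chat_payload, extract_chat_content, and ollama_chat are hypothetical; the endpoint and fields are the same ones used in the Faraday example above, and the sketch assumes Ollama is listening on localhost:11434.

```ruby
require 'net/http'
require 'json'
require 'uri'

# Hypothetical helper: builds the body for a non-streaming /api/chat call.
def build_chat_payload(model:, message:, temperature: 0.7)
  {
    model: model,
    messages: [{ role: 'user', content: message }],
    stream: false,
    options: { temperature: temperature }
  }
end

# Hypothetical helper: pulls the assistant text out of a parsed response.
def extract_chat_content(parsed)
  parsed.dig('message', 'content')
end

# Sends the request with Net::HTTP (assumes Ollama is reachable at base_url).
def ollama_chat(model:, message:, base_url: 'http://localhost:11434')
  uri = URI.join(base_url, '/api/chat')
  payload = build_chat_payload(model: model, message: message)
  response = Net::HTTP.post(uri, payload.to_json, 'Content-Type' => 'application/json')
  raise "Ollama returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

  extract_chat_content(JSON.parse(response.body))
end
```

With a local Ollama running, ollama_chat(model: 'llama3.2', message: 'Hello from Ruby!') would return the reply text. Net::HTTP.post uses default timeouts, so long generations may need an explicit Net::HTTP instance with a larger read_timeout.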
Streaming in Rails with ActionController::Live
class AiController < ApplicationController
  include ActionController::Live

  def stream_response
    response.headers['Content-Type'] = 'text/plain; charset=utf-8'
    response.headers['Cache-Control'] = 'no-cache'
    conn = Faraday.new(url: 'http://localhost:11434')
    body = {
      model: 'llama3.2',
      messages: [{ role: 'user', content: params[:message] }],
      stream: true
    }.to_json
    buffer = String.new
    conn.post('/api/chat', body, 'Content-Type' => 'application/json') do |req|
      req.options.on_data = proc do |chunk, _received_bytes|
        # Ollama streams newline-delimited JSON and a chunk can end mid-line,
        # so buffer the data and parse only complete lines
        buffer << chunk.b
        while (line = buffer.slice!(/\A.*\n/))
          data = JSON.parse(line) rescue next
          content = data.dig('message', 'content')
          response.stream.write(content) if content
        end
      end
    end
  ensure
    response.stream.close
  end
end
Structured Output in Ruby
require 'faraday'
require 'json'

def extract_contact(text)
  schema = {
    type: 'object',
    properties: {
      name: { type: 'string' },
      email: { type: 'string' },
      phone: { type: 'string' }
    },
    required: %w[name email]
  }
  conn = Faraday.new(url: 'http://localhost:11434') do |f|
    f.request :json
    f.response :json
  end
  response = conn.post('/api/chat') do |req|
    req.body = {
      model: 'llama3.2',
      messages: [{ role: 'user', content: "Extract contact from: #{text}" }],
      format: schema,
      stream: false,
      options: { temperature: 0 }
    }
  end
  # The reply content is itself a JSON string that follows the schema
  JSON.parse(response.body.dig('message', 'content'))
end

contact = extract_contact('Call Alice at alice@example.com or 555-1234')
puts contact['name'] # e.g. 'Alice'
puts contact['email'] # e.g. 'alice@example.com'
Testing Ruby Ollama Code
# spec/services/ai_service_spec.rb
RSpec.describe AiService do
  let(:service) { described_class.new }

  describe '#summarise' do
    it 'returns a summary string' do
      # Use VCR or WebMock to stub Ollama HTTP calls in tests
      stub_request(:post, 'http://localhost:11434/api/chat')
        .to_return(
          status: 200,
          body: { message: { content: 'Summary: Key points here.' }, done: true }.to_json,
          headers: { 'Content-Type' => 'application/json' }
        )
      result = service.summarise('Long document text...')
      expect(result).to include('Summary')
    end
  end
end
Getting Started
Add the Faraday gem (already in most Rails apps) or ruby-ollama to your Gemfile, create an OllamaClient service class, and use it in a controller action or ActiveJob. The background job pattern with Sidekiq is the most practical approach for document processing and batch operations — it follows the same conventions as any other Sidekiq job in a Rails application and benefits from Sidekiq’s retry logic, monitoring UI, and concurrency management. Start with the synchronous service object for interactive features and add the Sidekiq job when you need async processing for longer-running inference tasks.
Ollama’s OpenAI-compatible REST API works with any Ruby HTTP client. The ruby-openai gem (pointed at Ollama’s /v1 endpoint) is the most convenient option, and the community ruby-ollama gem provides Ollama-native API access. This guide covers both approaches for integrating Ollama into Ruby and Rails applications.
Option 1: ruby-openai Gem (Recommended)
gem install ruby-openai
# or in Gemfile: gem 'ruby-openai'
require 'openai'
client = OpenAI::Client.new(
  access_token: 'ollama', # any non-empty string
  uri_base: 'http://localhost:11434/v1'
)
# Chat
response = client.chat(
  parameters: {
    model: 'llama3.2',
    messages: [{ role: 'user', content: 'Why is Ruby elegant?' }]
  }
)
puts response.dig('choices', 0, 'message', 'content')
# Streaming
client.chat(
  parameters: {
    model: 'llama3.2',
    messages: [{ role: 'user', content: 'Count to 5.' }],
    stream: proc do |chunk, _bytesize|
      print chunk.dig('choices', 0, 'delta', 'content')
    end
  }
)
Embeddings
response = client.embeddings(
  parameters: {
    model: 'nomic-embed-text',
    input: 'The quick brown fox'
  }
)
vector = response['data'][0]['embedding']
puts "Dimensions: #{vector.length}"
Rails Service Object
# app/services/ai_service.rb
class AiService
  def initialize
    @client = OpenAI::Client.new(
      access_token: 'ollama',
      uri_base: ENV.fetch('OLLAMA_HOST', 'http://localhost:11434') + '/v1'
    )
  end

  def summarise(text, model: 'llama3.2')
    response = @client.chat(
      parameters: {
        model: model,
        messages: [{
          role: 'user',
          content: "Summarise in 3 bullet points:\n\n#{text}"
        }],
        temperature: 0.3
      }
    )
    response.dig('choices', 0, 'message', 'content')
  end

  def classify(text)
    response = @client.chat(
      parameters: {
        model: 'llama3.2',
        messages: [{
          role: 'user',
          content: "Classify as: invoice, contract, email, or other. Reply with JSON only.\n\n#{text[0..1000]}"
        }],
        temperature: 0
      }
    )
    JSON.parse(response.dig('choices', 0, 'message', 'content'))
  end
end

# Usage in a controller
class DocumentsController < ApplicationController
  def analyse
    ai = AiService.new
    @summary = ai.summarise(params[:text])
    @classification = ai.classify(params[:text])
    render json: { summary: @summary, classification: @classification }
  end
end
Background Jobs with ActiveJob
# app/jobs/summarise_document_job.rb
class SummariseDocumentJob < ApplicationJob
  queue_as :default

  def perform(document_id)
    document = Document.find(document_id)
    ai = AiService.new
    summary = ai.summarise(document.content)
    document.update!(summary: summary, summarised_at: Time.current)
  end
end
# Enqueue from controller
SummariseDocumentJob.perform_later(document.id)
Rails Credentials for Configuration
# config/credentials.yml.enc (edit with: rails credentials:edit)
# ollama:
# host: http://localhost:11434
# Access in code:
ollama_host = Rails.application.credentials.dig(:ollama, :host)
Why Ruby for AI Integration
Ruby's expressive syntax and the Rails ecosystem's conventions make AI integration feel natural — service objects, background jobs, and controller concerns are the established patterns for encapsulating external service calls, and Ollama fits cleanly into all three. The ruby-openai gem's OpenAI-compatible interface means any team already using OpenAI in a Rails application can switch to Ollama by changing two lines: the access_token and uri_base initialisation arguments. This drop-in replacement story is one of the most compelling aspects of Ollama's OpenAI-compatible API for Ruby teams — no new gem, no new API patterns, just updated configuration.
Streaming in Rails with ActionController::Live
class AiController < ApplicationController
  include ActionController::Live

  def stream
    response.headers['Content-Type'] = 'text/event-stream'
    response.headers['Cache-Control'] = 'no-cache'
    client = OpenAI::Client.new(access_token: 'ollama',
                                uri_base: 'http://localhost:11434/v1')
    client.chat(
      parameters: {
        model: 'llama3.2',
        messages: [{ role: 'user', content: params[:message] }],
        stream: proc do |chunk, _bytesize|
          token = chunk.dig('choices', 0, 'delta', 'content')
          response.stream.write("data: #{token.to_json}\n\n") if token
        end
      }
    )
  ensure
    response.stream.close
  end
end
Caching AI Responses
class AiService
  def summarise_with_cache(text, model: 'llama3.2')
    cache_key = "ai/summary/#{Digest::SHA256.hexdigest(text + model)}"
    Rails.cache.fetch(cache_key, expires_in: 1.day) do
      summarise(text, model: model) # Only calls Ollama on cache miss
    end
  end
end
Testing Rails AI Services
# spec/services/ai_service_spec.rb
RSpec.describe AiService do
  describe '#summarise' do
    it 'returns a summary string' do
      # Stub the OpenAI client to avoid real Ollama calls in tests
      # (double quotes so that \n is a real newline, as in a model reply)
      stub_response = {
        'choices' => [{ 'message' => { 'content' => "• Point 1\n• Point 2\n• Point 3" } }]
      }
      allow_any_instance_of(OpenAI::Client).to receive(:chat).and_return(stub_response)
      service = AiService.new
      result = service.summarise('Some document text')
      expect(result).to include('Point 1')
    end
  end
end
Getting Started
Add gem 'ruby-openai' to your Gemfile, configure the client to point at http://localhost:11434/v1 with access_token: 'ollama', and call client.chat with your model and messages. The AiService service object pattern from this article provides a clean encapsulation layer that makes swapping models, adding caching, and writing tests straightforward. For Rails applications already using OpenAI, the migration to Ollama is a configuration change — the rest of your application code, controllers, jobs, and tests remain unchanged. Pull your chosen model with ollama pull llama3.2, update the uri_base, and your AI features are running locally with no API costs and no data leaving your infrastructure.
The ruby-openai Gem vs Ruby-Native Ollama Gems
Two main options exist for Ollama integration in Ruby: the ruby-openai gem (pointing at Ollama's OpenAI-compatible /v1 endpoint) and purpose-built gems like ruby-ollama or ollama-ai that wrap the native Ollama API. The ruby-openai approach is recommended for most Rails teams for one key reason: it is the same gem you would use if you switched to OpenAI, Anthropic's API, or another OpenAI-compatible provider. The interface is identical regardless of backend — switching from Ollama to OpenAI is a two-line configuration change, and vice versa. Purpose-built Ollama gems have access to Ollama-native features (model management, Modelfile creation, streaming with the native format) but at the cost of a non-portable interface that requires changes if you ever switch backends. For typical Rails AI features — chat, summarisation, classification, embeddings — the OpenAI-compatible interface is fully sufficient.
Handling Long-Running Requests with Rack Timeout
Many Rails applications use the rack-timeout gem (Rack::Timeout) to abort requests that exceed a certain duration; its default service timeout is 15 seconds. LLM inference can easily exceed this limit for long responses. Either raise the service timeout, or use the background job pattern (ActiveJob) to move inference out of the request-response cycle entirely:
# Gemfile: load the gem without its automatic Rails integration
# gem 'rack-timeout', require: 'rack/timeout/base'
# config/initializers/rack_timeout.rb
# Insert the middleware manually with a longer service timeout (2 minutes)
Rails.application.config.middleware.insert_before Rack::Runtime, Rack::Timeout, service_timeout: 120
# Alternatively, keep the default integration and set the environment variable:
# RACK_TIMEOUT_SERVICE_TIMEOUT=120
# rack-timeout applies one timeout to every request and has no documented
# per-path setting, so slow AI endpoints are better served by background jobs.
Environment-Based Configuration
# config/initializers/ai.rb
AI_CLIENT = OpenAI::Client.new(
  access_token: ENV.fetch('OLLAMA_API_KEY', 'ollama'),
  uri_base: ENV.fetch('OLLAMA_HOST', 'http://localhost:11434') + '/v1',
  request_timeout: ENV.fetch('OLLAMA_TIMEOUT', '120').to_i
)
# config/environments/production.rb — point at team server
# OLLAMA_HOST=http://ai-server.internal:11434
# config/environments/test.rb — stub in tests
# (no real Ollama calls in test environment)
Ruby AI Beyond Rails
The ruby-openai gem works in any Ruby context: Sinatra, Rack apps, plain Ruby scripts, and Jekyll plugins. For non-Rails Ruby projects, the same client initialisation and chat/embedding API calls work identically. A Ruby script for batch document processing is a common simple use case: read files, call client.chat for each, and write the results. No framework is needed, just the gem and a few dozen lines of Ruby. The simplicity of the HTTP API combined with Ruby's clean syntax makes this kind of one-off AI processing script quick to write.
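As a rough sketch of such a batch script, the function below reads every .txt file in a directory, passes the text to a summariser, and writes one result file per input. The function name summarise_directory, the .summary.txt naming, and the summariser callable interface are all assumptions made for illustration; the callable keeps the file handling separate from whichever client you use.

```ruby
require 'fileutils'

# Summarises every .txt file in input_dir and writes <name>.summary.txt
# into output_dir. `summariser` is any callable that takes the document
# text and returns a string (a hypothetical interface for this sketch).
def summarise_directory(input_dir, output_dir, summariser:)
  FileUtils.mkdir_p(output_dir)

  Dir.glob(File.join(input_dir, '*.txt')).sort.map do |path|
    summary = summariser.call(File.read(path))
    out_path = File.join(output_dir, "#{File.basename(path, '.txt')}.summary.txt")
    File.write(out_path, summary)
    out_path # return the list of files written
  end
end

# Example wiring with the ruby-openai client from this article (not run here):
# summariser = lambda do |text|
#   client.chat(parameters: {
#     model: 'llama3.2',
#     messages: [{ role: 'user', content: "Summarise in 3 bullet points:\n\n#{text}" }]
#   }).dig('choices', 0, 'message', 'content')
# end
# summarise_directory('docs', 'summaries', summariser: summariser)
```

Passing the summariser in as a callable also makes the script easy to dry-run with a stub before pointing it at a real model.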
Performance Considerations for Ruby
MRI Ruby's Global VM Lock (GVL, often called the GIL) prevents threads from executing Ruby code in parallel, but this does not significantly affect Ollama integration: the bottleneck is the inference step, not the Ruby code, and MRI releases the lock while a thread waits on network I/O. Ollama calls are I/O-bound from Ruby's point of view, so Puma's threaded mode can serve concurrent AI requests in MRI; add Puma workers (processes) when you also need parallelism for CPU-bound Ruby work. If you share one client instance across threads, confirm that it is thread-safe, or create one per thread or per request. For high-throughput batch processing outside Rails, use threads, the Parallel gem, or plain Ruby processes (via Process.fork), with each thread or process making its own sequential Ollama calls. JRuby, which has no GVL, adds true parallelism for CPU-bound Ruby code, but that matters little here because the time is spent waiting on Ollama. Throughput is ultimately bounded by how many requests the Ollama server processes at once.
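Because the Ollama calls are I/O-bound, a small thread pool is often enough for batch work. The sketch below uses only Ruby's built-in Thread and Queue; the method name map_concurrently and the default of four threads are illustrative assumptions, and the block stands in for whatever call you make to Ollama.

```ruby
# Runs the given block over `items` using a fixed number of threads.
# Results are returned in the same order as the input items.
def map_concurrently(items, threads: 4, &worker)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # once closed and drained, pop returns nil and workers stop

  results = Array.new(items.length)
  pool = Array.new([threads, items.length].min) do
    Thread.new do
      while (job = queue.pop)
        item, index = job
        results[index] = worker.call(item)
      end
    end
  end

  pool.each(&:join) # join re-raises any exception raised in a worker
  results
end

# Example wiring (not run here; `ai` is an AiService instance):
# summaries = map_concurrently(documents, threads: 4) { |doc| ai.summarise(doc) }
```

Each thread writes to its own slot of a pre-sized array, which is a simplification that assumes MRI; a mutex around the write would be the more conservative choice on other runtimes.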
The Ruby AI Ecosystem in 2026
Ruby's AI integration ecosystem has matured considerably. The ruby-openai gem is actively maintained and covers the full OpenAI API including assistants, fine-tuning, and moderation in addition to chat and embeddings. Community gems for LangChain-style orchestration in Ruby (langchainrb) provide chains, agents, and vector store integrations following patterns familiar from the Python ecosystem. For Rails teams building AI features, the combination of ruby-openai for model calls, ActiveJob for async processing, Rails.cache for response caching, and LangChainrb for complex orchestration covers the vast majority of AI application requirements without leaving the Ruby ecosystem.
Ollama's OpenAI-compatible API makes Ruby the language of least resistance for teams already invested in Rails. No new patterns to learn, no separate service to deploy, no context-switching to Python for AI features — just a new gem pointed at a local endpoint. For Rails shops that have been waiting for the right moment to add AI capabilities to their applications, the combination of Ollama's local inference and the mature Ruby AI gem ecosystem makes 2026 that moment.
Migrating from OpenAI to Ollama in Rails
For Rails applications already calling the OpenAI API via ruby-openai, migrating to Ollama is straightforward in most cases. Update the client initialisation to point at your Ollama endpoint, change the model names (from gpt-4o to llama3.2 or your preferred model), and test. Chat and embedding calls use identical method signatures — the response structure is the same OpenAI-compatible JSON. The areas that need attention: Ollama's model names do not map one-to-one with OpenAI model names, so prompts optimised for GPT-4o may need adjustment for a different model family; tool calling (function calling) support varies by model, so verify your chosen model supports it if you use tools; and image inputs require a vision-capable model like LLaVA or Gemma 3 rather than any model. Beyond these three areas, the migration is typically a configuration change, not a code change — which is the central practical value of Ollama's OpenAI-compatible API for Ruby teams.
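One way to keep the migration a pure configuration change is to gather the backend-specific settings in a single place. The sketch below is an assumption about how you might organise this, not part of ruby-openai: the helper name client_settings and the AI_MODEL variable are hypothetical, and OPENAI_API_KEY is simply the conventional name for the key.

```ruby
# Hypothetical helper: returns the settings for the chosen backend.
# :model is kept alongside the client arguments so both change together.
def client_settings(backend, env: ENV)
  case backend
  when 'ollama'
    {
      access_token: 'ollama', # any non-empty string
      uri_base: env.fetch('OLLAMA_HOST', 'http://localhost:11434') + '/v1',
      model: env.fetch('AI_MODEL', 'llama3.2')
    }
  when 'openai'
    {
      access_token: env.fetch('OPENAI_API_KEY'),
      model: env.fetch('AI_MODEL', 'gpt-4o') # no uri_base: use the gem default
    }
  else
    raise ArgumentError, "unknown backend: #{backend}"
  end
end

# Example wiring (not run here; AI_BACKEND is a hypothetical variable):
# settings = client_settings(ENV.fetch('AI_BACKEND', 'ollama'))
# model = settings.delete(:model)
# client = OpenAI::Client.new(**settings)
```

The service object then receives the client and model name and does not need to know which backend is behind them.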
Practical Starting Point
Add the gem, configure the client pointing at http://localhost:11434/v1, write the AiService class from this article, and call it from one controller action or background job. Evaluate the output quality on your actual use case with the model you have chosen — run 10–20 real examples through it and compare to your quality bar. If the quality is acceptable, expand the integration; if not, try a larger or different model before spending more time on the integration. The cost of evaluating local Ollama for your Rails application is under an hour, and the potential savings in API costs and privacy improvements make that hour one of the best-spent in any AI-integration project.
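The evaluation step can be as simple as a loop over labelled examples. The sketch below suits classification-style tasks where an exact label can be checked; summaries still need a human read. The Example struct, the evaluate method, and the classifier callable are hypothetical names introduced for illustration.

```ruby
# Each example pairs an input with the label you expect back.
Example = Struct.new(:input, :expected, keyword_init: true)

# Runs every example through `classifier` (any callable returning a label)
# and reports accuracy plus the failing cases for manual review.
def evaluate(examples, classifier:)
  failures = examples.reject do |example|
    classifier.call(example.input).to_s.strip.downcase == example.expected.downcase
  end
  correct = examples.length - failures.length

  {
    total: examples.length,
    correct: correct,
    accuracy: examples.empty? ? 0.0 : correct.fdiv(examples.length),
    failures: failures
  }
end

# Example wiring (not run here; `ai` is the Faraday-based AiService):
# classifier = ->(text) { ai.classify(text, categories: %w[invoice contract email other]) }
# report = evaluate(examples, classifier: classifier)
# puts "Accuracy: #{(report[:accuracy] * 100).round}%"
```

Reading through report[:failures] is usually more informative than the accuracy figure alone, because it shows whether the misses are prompt problems or model limitations.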
A Note on Model Quality in Ruby Applications
The model you choose matters more than the integration library. A well-prompted llama3.2 (3B) handles many common Rails AI tasks, such as document classification, summarisation, and simple Q&A, with quality that is good enough for internal tools and non-critical customer features. For customer-facing features where quality is paramount, evaluate larger models (13B, 32B) or cloud models alongside the local option before committing to a production architecture. The ruby-openai gem makes this evaluation easy: switching between Ollama's llama3.2, a locally hosted Qwen 32B, and OpenAI's GPT-4o requires only changing the model name and uri_base (plus the access_token for OpenAI); your service object code is identical across all three. Run the same 20–30 representative examples through each option and let your quality requirements drive the choice rather than defaulting to either local or cloud. The evaluation takes about an hour, and results from your own examples are far more informative for your use case than any benchmark or general comparison article. That investment is usually repaid quickly through cost savings, privacy improvements, or both.
The model-agnostic interface that ruby-openai provides is ultimately what makes local AI integration in Ruby so low-friction: you evaluate on Ollama, and if you later need a cloud model for quality reasons, you change the client configuration and model name rather than your application code. That benefit compounds with every new feature you build on the same service layer.