How to Use Ollama with JavaScript and Node.js

The official Ollama npm package provides a clean TypeScript-first API for calling local models from Node.js, Deno, and Bun. This guide covers installation, the core API methods, streaming, embeddings, model management, and building a simple chatbot — everything you need to integrate Ollama into a JavaScript project.

Installation

npm install ollama
# or
yarn add ollama
# or
bun add ollama

Basic Usage

import ollama from 'ollama';

// Simple generate
const response = await ollama.generate({
  model: 'llama3.2',
  prompt: 'Why is Node.js single-threaded?',
  stream: false
});
console.log(response.response);

// Chat
const chat = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Hello!' }]
});
console.log(chat.message.content);

Streaming Responses

import ollama from 'ollama';

// Stream to console
const stream = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Tell me about async/await in JavaScript.' }],
  stream: true
});

for await (const chunk of stream) {
  process.stdout.write(chunk.message.content);
}
console.log(); // newline

Multi-Turn Conversation

import ollama from 'ollama';
import * as readline from 'readline/promises';

const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
const messages = [];

while (true) {
  const input = await rl.question('You: ');
  if (input.toLowerCase() === 'exit') break;

  messages.push({ role: 'user', content: input });

  const stream = await ollama.chat({ model: 'llama3.2', messages, stream: true });
  process.stdout.write('Assistant: ');
  let reply = '';
  for await (const chunk of stream) {
    process.stdout.write(chunk.message.content);
    reply += chunk.message.content;
  }
  console.log();
  messages.push({ role: 'assistant', content: reply });
}
rl.close();

Embeddings

import ollama from 'ollama';

const result = await ollama.embeddings({
  model: 'nomic-embed-text',
  prompt: 'The quick brown fox jumps over the lazy dog'
});
console.log(`Embedding dimensions: ${result.embedding.length}`);
console.log(`First 5 values: ${result.embedding.slice(0, 5)}`);

// Cosine similarity helper
function cosineSimilarity(a, b) {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const magA = Math.sqrt(a.reduce((s, v) => s + v * v, 0));
  const magB = Math.sqrt(b.reduce((s, v) => s + v * v, 0));
  return dot / (magA * magB);
}

const [e1, e2] = await Promise.all([
  ollama.embeddings({ model: 'nomic-embed-text', prompt: 'dog' }),
  ollama.embeddings({ model: 'nomic-embed-text', prompt: 'puppy' })
]);
console.log('dog/puppy similarity:', cosineSimilarity(e1.embedding, e2.embedding).toFixed(3));

Model Management

import ollama from 'ollama';

// List available models
const { models } = await ollama.list();
models.forEach(m => console.log(`${m.name.padEnd(40)} ${(m.size/1e9).toFixed(1)}GB`));

// Pull a model with progress
const stream = await ollama.pull({ model: 'llama3.2', stream: true });
for await (const progress of stream) {
  if (progress.total) {
    const pct = Math.round(progress.completed / progress.total * 100);
    process.stdout.write(`\rDownloading: ${pct}%`);
  }
}
console.log('\nDone!');

// Delete a model
await ollama.delete({ model: 'llama3.2' });

// Show running models
const { models: running } = await ollama.ps();
console.log('Running:', running.map(m => m.name));

Custom Client (Remote Ollama)

import { Ollama } from 'ollama';

// Connect to a remote Ollama server
const client = new Ollama({ host: 'http://192.168.1.100:11434' });

const response = await client.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Hello from remote!' }]
});
console.log(response.message.content);

Structured Output with Zod

import ollama from 'ollama';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const ProductSchema = z.object({
  name: z.string(),
  price: z.number(),
  inStock: z.boolean(),
  tags: z.array(z.string())
});

const response = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Extract: Blue widget, $29.99, available, tags: sale, new' }],
  format: zodToJsonSchema(ProductSchema),  // format expects a JSON Schema object, not a Zod schema
  options: { temperature: 0 }
});

const product = ProductSchema.parse(JSON.parse(response.message.content));
console.log(product.name, product.price, product.inStock);
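
The format parameter takes a plain JSON Schema object, which is why the example converts the Zod schema with the zod-to-json-schema helper. Install it alongside zod:

npm install zod zod-to-json-schema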

Vision / Image Input

import ollama from 'ollama';
import fs from 'fs';

const imageData = fs.readFileSync('photo.jpg');
const base64 = imageData.toString('base64');

const response = await ollama.chat({
  model: 'llava',
  messages: [{
    role: 'user',
    content: 'What is in this image?',
    images: [base64]
  }]
});
console.log(response.message.content);

Why Use the npm Package Instead of the REST API Directly?

The Ollama npm package wraps the REST API with TypeScript types, automatic streaming iteration, and a consistent async/await interface — saving the boilerplate of parsing newline-delimited JSON streams manually. The for await...of loop over streaming responses is significantly cleaner than manually reading from a ReadableStream with a getReader() loop. TypeScript users get full type inference for all request options and response fields without any additional type definitions. The package is maintained by the Ollama team and stays in sync with new API features as they are added. For most JavaScript projects, the package is the right abstraction — use the raw fetch API only when you have a specific reason (a constrained runtime, no package manager, debugging).
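
For comparison, here is a rough sketch of the manual work the package replaces: calling the /api/chat REST endpoint with raw fetch and parsing the newline-delimited JSON stream yourself. It assumes Node.js 18+ and omits error handling for brevity.

const res = await fetch('http://localhost:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true
  })
});

// The body is newline-delimited JSON: one object per line
const decoder = new TextDecoder();
let buffer = '';
for await (const chunk of res.body) {
  buffer += decoder.decode(chunk, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any partial trailing line for the next chunk
  for (const line of lines) {
    if (!line.trim()) continue;
    const data = JSON.parse(line);
    if (data.message?.content) process.stdout.write(data.message.content);
  }
}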

Error Handling

import ollama from 'ollama';

async function safeChat(prompt) {
  try {
    const response = await ollama.chat({
      model: 'llama3.2',
      messages: [{ role: 'user', content: prompt }]
    });
    return response.message.content;
  } catch (err) {
    if (err.cause?.code === 'ECONNREFUSED') {
      throw new Error('Ollama not running. Start with: ollama serve');
    }
    if (err.message?.includes('model')) {
      throw new Error(`Model not found. Pull with: ollama pull llama3.2`);
    }
    throw err;
  }
}
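
Usage:

console.log(await safeChat('What is Node.js?'));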

Using with Express

import express from 'express';
import ollama from 'ollama';

const app = express();
app.use(express.json());

app.post('/chat', async (req, res) => {
  const { message, model = 'llama3.2' } = req.body;
  res.setHeader('Content-Type', 'text/plain; charset=utf-8');

  const stream = await ollama.chat({
    model,
    messages: [{ role: 'user', content: message }],
    stream: true
  });

  for await (const chunk of stream) {
    res.write(chunk.message.content);
  }
  res.end();
});

app.listen(3000, () => console.log('Server on :3000'));
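
On the client side, the streamed response can be consumed incrementally with the standard fetch reader loop. A minimal sketch, assuming the server above is listening on localhost:3000:

const res = await fetch('http://localhost:3000/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: 'What is Express?' })
});

// Print each chunk of plain text as it arrives
const reader = res.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  process.stdout.write(decoder.decode(value, { stream: true }));
}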

Building a Simple RAG Pipeline

The combination of Ollama embeddings and a local vector search gives you a basic retrieval-augmented generation pipeline without any external services. The implementation below uses an in-memory store — for production use, replace it with a persistent vector database like Chroma or pgvector:

import ollama from 'ollama';

// Simple in-memory vector store
const documents = [
  'Node.js uses an event-driven, non-blocking I/O model.',
  'npm is the package manager for Node.js projects.',
  'Express is a minimal web framework for Node.js.',
];

async function buildIndex(docs) {
  return Promise.all(docs.map(async (doc, i) => ({
    id: i, text: doc,
    embedding: (await ollama.embeddings({ model: 'nomic-embed-text', prompt: doc })).embedding
  })));
}

function cosineSim(a, b) {
  const dot = a.reduce((s, v, i) => s + v * b[i], 0);
  return dot / (Math.sqrt(a.reduce((s, v) => s + v * v, 0)) * Math.sqrt(b.reduce((s, v) => s + v * v, 0)));
}

async function query(index, question) {
  const qEmbed = (await ollama.embeddings({ model: 'nomic-embed-text', prompt: question })).embedding;
  const ranked = index.map(d => ({ ...d, score: cosineSim(d.embedding, qEmbed) }))
    .sort((a, b) => b.score - a.score);
  const context = ranked.slice(0, 2).map(d => d.text).join('\n');
  const r = await ollama.chat({
    model: 'llama3.2',
    messages: [{ role: 'user', content: `Answer based on:\n${context}\n\nQuestion: ${question}` }]
  });
  return r.message.content;
}

const index = await buildIndex(documents);
console.log(await query(index, 'What does npm do?'));

TypeScript Configuration

// tsconfig.json — recommended settings for Ollama projects
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "bundler",
    "strict": true,
    "esModuleInterop": true
  }
}

// Full TypeScript example with types
import ollama, { Message, ChatResponse } from 'ollama';

interface ConversationTurn {
  role: 'user' | 'assistant';
  content: string;
}

async function conversation(turns: ConversationTurn[]): Promise<ChatResponse> {
  const messages: Message[] = turns.map(t => ({ role: t.role, content: t.content }));
  return ollama.chat({ model: 'llama3.2', messages });
}

Performance Considerations

For Node.js applications serving multiple users, the most important performance consideration is request queuing. Ollama processes one request at a time per model by default. If two users send chat requests simultaneously, the second waits for the first to complete. For applications with concurrent users, implement a queue in your application layer (using a simple Promise queue or a library like p-queue) that limits concurrent Ollama requests to the number configured in OLLAMA_NUM_PARALLEL on your server. Display a “queued” indicator to users when a request is waiting rather than letting them wonder if the application is hanging. The Ollama npm package itself handles individual request lifecycle cleanly — the concurrency management is your application’s responsibility.
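
A minimal sketch of that queue pattern using p-queue, assuming the server is configured with OLLAMA_NUM_PARALLEL=2 (match the concurrency value to your own setting):

import PQueue from 'p-queue';
import ollama from 'ollama';

// Allow at most two in-flight requests, matching OLLAMA_NUM_PARALLEL=2
const queue = new PQueue({ concurrency: 2 });

function queuedChat(prompt) {
  // Callers can show a "queued" indicator while this promise is pending
  return queue.add(() => ollama.chat({
    model: 'llama3.2',
    messages: [{ role: 'user', content: prompt }]
  }));
}

const replies = await Promise.all([
  queuedChat('First question'),
  queuedChat('Second question'),
  queuedChat('Third question')  // waits until one of the first two finishes
]);
console.log(replies.map(r => r.message.content));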

Getting Started

The minimum viable Node.js Ollama integration is a few lines: import the package, call ollama.chat with a model and a message, and print the response. From there, add streaming for better UX, conversation history for context, and embeddings for retrieval-augmented generation as your application needs them. The npm package’s API surface is small and well-documented — the TypeScript types serve as inline documentation for every option and response field. Install the package, ensure Ollama is running locally, and you can have a working integration in minutes.

When to Use the Ollama npm Package

The npm package is the right choice for any JavaScript or TypeScript project targeting Node.js, Deno, or Bun. It handles streaming iteration, TypeScript types, and the model management API in a clean interface that would otherwise require 50–100 lines of manual implementation. The main reason to skip it and use the raw REST API instead is a runtime constraint — a Lambda function or edge worker where adding npm dependencies is restricted, a shell script without a Node.js runtime, or a language without an equivalent SDK. For all standard Node.js application development, the package is the recommended integration path and the fastest way to get from zero to a working Ollama integration in a JavaScript project.

TypeScript in Node.js Ollama Projects

The ollama package ships with TypeScript declarations — all methods, parameters, and response types are fully typed. In a TypeScript project, you get autocomplete for every option in the chat, generate, embeddings, and model management calls, and type errors if you pass incorrect parameter types or try to access non-existent response fields. This makes the integration significantly more maintainable than working with untyped REST API calls, particularly on larger projects where the Ollama integration is one part of a larger codebase. The types are accurate and kept in sync with the package API, so you can trust the type signatures to reflect what the library actually accepts and returns.

For new projects, the recommended setup is TypeScript with ESM modules — "type": "module" in package.json and "module": "ESNext" in tsconfig.json. This matches the package’s native module format and avoids the CommonJS/ESM interop issues that can arise when mixing module systems. The top-level await syntax used in most Ollama examples works natively in ESM Node.js without any additional configuration.
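
A minimal package.json for that setup might look like the following; the tsx runner and the version numbers are illustrative choices, not requirements:

{
  "type": "module",
  "scripts": {
    "start": "tsx src/index.ts"
  },
  "dependencies": {
    "ollama": "^0.5.0"
  },
  "devDependencies": {
    "tsx": "^4.0.0",
    "typescript": "^5.0.0"
  }
}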

Using Ollama in Serverless and Edge Environments

Ollama runs locally, which creates an inherent tension with serverless functions (AWS Lambda, Vercel Functions) that execute in cloud environments without access to your local machine. The practical solution is to run Ollama on a machine your serverless function can reach — a VPS, a home server with port forwarding, or a cloud VM — and point the custom Ollama client at that host. The npm package’s custom client accepts any HTTP URL, so new Ollama({ host: 'http://your-server.example.com:11434' }) works from any environment that can make HTTP requests. This pattern makes Ollama available as a shared team inference server without requiring every developer to run their own local instance, while still keeping the model and data on infrastructure you control rather than a third-party API.
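
A common pattern is to read the host from an environment variable so the same code runs against a local instance in development and the shared server in production. A sketch, using OLLAMA_HOST as the assumed variable name:

import { Ollama } from 'ollama';

// Fall back to the local default when OLLAMA_HOST is not set
const client = new Ollama({
  host: process.env.OLLAMA_HOST ?? 'http://127.0.0.1:11434'
});

const res = await client.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'ping' }]
});
console.log(res.message.content);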

The npm Package as the Foundation for Larger Projects

The Ollama npm package is the foundation that higher-level JavaScript AI frameworks build on. LangChain.js supports Ollama as a language model backend through its ChatOllama class. Vercel’s AI SDK supports Ollama via the createOpenAI adapter pointed at the local endpoint. Both use the same underlying HTTP API that the npm package wraps, giving you the option to use the high-level framework abstractions for complex chains and agents or drop to the package directly when you need lower-level control. Understanding the npm package’s API makes it much easier to debug issues in framework-based integrations, since you can always reproduce a problem with a direct package call and isolate whether the issue is in the framework layer or the Ollama integration itself.
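
For illustration, a minimal LangChain.js call through the same backend, assuming the @langchain/ollama package (check its documentation for the current API):

import { ChatOllama } from '@langchain/ollama';

const model = new ChatOllama({ model: 'llama3.2' });

// invoke() accepts a plain string and returns a message object
const reply = await model.invoke('Why is the sky blue?');
console.log(reply.content);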

The JavaScript AI Ecosystem in 2026

The Ollama npm package is part of a broader shift toward JavaScript-native AI development. LangChain.js, the Vercel AI SDK, and other JavaScript AI frameworks have matured significantly, and all support Ollama as a local backend. The result is that JavaScript developers can now build sophisticated AI applications — RAG pipelines, agents, multi-step chains, structured extraction — using the same language and tooling they already know, backed by a local model rather than a cloud API. The Ollama npm package is the foundation of this ecosystem: a clean, stable, well-typed interface to local inference that any JavaScript framework or application can build on. Its consistent API across Node.js, Deno, and Bun makes it a genuinely portable building block for local AI in the JavaScript world.

Next Steps

After a working basic integration, the natural progression is: add streaming for any user-facing feature where response time is visible, add conversation history management for multi-turn use cases, add embeddings and semantic search for document retrieval, and explore the model management API for applications that need to pull or switch models programmatically. The Ollama npm package’s API is small enough to learn completely in an afternoon, and the TypeScript types make the full surface area discoverable directly in your IDE. Each piece builds on the same consistent pattern, so every new feature feels familiar rather than requiring a new mental model: the package handles the low-level details, leaving you free to focus on the application logic that matters for your use case.

The shift toward JavaScript-native AI tooling in 2026 reflects a broader recognition that the largest developer community in the world should not need to learn Python to build AI-powered applications. The Ollama npm package is a key part of this shift: a well-maintained, TypeScript-first library that makes local AI as accessible in JavaScript as any other npm dependency, without the virtual environment setup, dependency conflicts, or language-boundary overhead that characterised AI development in JavaScript just two years ago. That accessibility is what will drive broader adoption of local AI across the JavaScript ecosystem, and every project built on the package today becomes easier to scale and maintain as that ecosystem matures.
