How to Use Ollama with Next.js

Introduction

Next.js is the most widely used React framework for production web applications, offering server-side rendering, API routes, and a file-based routing system that makes it straightforward to build both the frontend and backend of an application in a single codebase. When you combine Next.js with Ollama, you can build AI-powered web applications that run entirely on your own infrastructure — no OpenAI API key required, no per-token costs, and no data leaving your machine.

In this guide you will learn how to set up Ollama, create a Next.js API route that proxies requests to Ollama, build a streaming chat UI with React, and structure your project for easy extension. Everything runs locally, so you can iterate quickly without worrying about rate limits or latency from external services.

Prerequisites

Before you start, make sure you have the following ready:

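Node.js 18.17+ — Next.js 14 requires Node.js 18.17 or later, and newer Next.js releases raise the minimum further, so check the requirement for the version that create-next-app installs. Check your version with node -v.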
Ollama — Download from ollama.com and pull a model: ollama pull llama3.2.
A terminal — You will use it to scaffold the project and run the dev server.

Confirm Ollama is running by visiting http://localhost:11434 in your browser. You should see a plain text response. If not, start Ollama with ollama serve.
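The same check can be scripted if you would rather not open a browser. The sketch below is one way to do it, not an official API: isOllamaRunning and FetchLike are hypothetical names, and it assumes the root endpoint answers with the plain-text body "Ollama is running". The fetch function is passed in as an argument so the helper can be tried without a live server.

```typescript
// Minimal shape of the fetch function this helper needs (hypothetical type).
type FetchLike = (url: string) => Promise<{ ok: boolean; text(): Promise<string> }>

// Resolves to true when the Ollama root endpoint answers with its status text.
// Assumption: the response body contains "Ollama is running".
async function isOllamaRunning(
  fetchImpl: FetchLike,
  baseUrl = 'http://localhost:11434',
): Promise<boolean> {
  try {
    const res = await fetchImpl(baseUrl)
    if (!res.ok) return false
    const text = await res.text()
    return text.includes('Ollama is running')
  } catch {
    // A refused connection or DNS failure means the server is not reachable.
    return false
  }
}
```

On Node.js 18 or later you can pass the built-in fetch, as in isOllamaRunning(fetch), and log the result before starting the dev server.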

Creating the Next.js Project

Scaffold a new Next.js app using the official CLI:

npx create-next-app@latest ollama-nextjs
cd ollama-nextjs

When prompted, select the App Router (the default in Next.js 14+), enable TypeScript if you prefer it, and accept the defaults for the rest. Once the project is created, start the dev server:

npm run dev

Open http://localhost:3000 to confirm the app is running. The project structure uses the app/ directory for pages and layouts, and the app/api/ directory for API route handlers.

Creating the Ollama API Route

Next.js API routes are the right place to put your Ollama logic. They run on the server, so the Ollama URL never leaks to the client, and you can add authentication or rate limiting later without touching the frontend. Create a file at app/api/chat/route.ts:

import { NextRequest } from 'next/server'

export async function POST(req: NextRequest) {
  const body = await req.json()

  const ollamaRes = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ...body, stream: true }),
  })

  if (!ollamaRes.ok) {
    return new Response(JSON.stringify({ error: 'Ollama error' }), { status: 500 })
  }

  // Ollama streams newline-delimited JSON (one object per line), not server-sent events
  return new Response(ollamaRes.body, {
    headers: {
      'Content-Type': 'application/x-ndjson',
      'Cache-Control': 'no-cache',
    },
  })
}

This route handler receives a POST request from the browser, forwards it to Ollama with streaming enabled, and pipes the streamed response body straight back to the client. Next.js 14’s App Router supports streaming responses natively, so no extra configuration is needed.
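Because the handler spreads the request body straight into the call to Ollama, it can be worth validating that body first. The following is a minimal sketch of such a check. parseChatRequest and its types are hypothetical names introduced for illustration, and the list of allowed roles is an assumption that fits this chat UI rather than a complete list of what Ollama accepts.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }
type ChatRequest = { model: string; messages: ChatMessage[] }

// Roles this example accepts (an assumption for this chat UI).
const ROLES = ['system', 'user', 'assistant']

// Returns a cleaned ChatRequest, or null when the body is not usable.
function parseChatRequest(body: unknown): ChatRequest | null {
  if (typeof body !== 'object' || body === null) return null
  const { model, messages } = body as { model?: unknown; messages?: unknown }
  if (typeof model !== 'string' || model.trim() === '') return null
  if (!Array.isArray(messages) || messages.length === 0) return null

  const cleaned: ChatMessage[] = []
  for (const m of messages) {
    if (typeof m !== 'object' || m === null) return null
    const { role, content } = m as { role?: unknown; content?: unknown }
    if (typeof role !== 'string' || !ROLES.includes(role)) return null
    if (typeof content !== 'string') return null
    cleaned.push({ role: role as ChatMessage['role'], content })
  }
  // Only model and messages are kept; any other client-supplied field is dropped.
  return { model, messages: cleaned }
}
```

In the route you could call parseChatRequest(await req.json()), return a 400 response when it yields null, and forward only the cleaned object to Ollama.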

Building the Chat UI

Now create the main chat page at app/page.tsx. This React component manages conversation state and streams responses from the API route:

'use client'

import { useState } from 'react'

type Message = { role: 'user' | 'assistant'; content: string }

export default function Home() {
  const [model, setModel] = useState('llama3.2')
  const [prompt, setPrompt] = useState('')
  const [messages, setMessages] = useState<Message[]>([])
  const [loading, setLoading] = useState(false)

  async function send() {
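    // Ignore empty input and block a second send (for example via Enter) while a reply is streaming
    if (!prompt.trim() || loading) return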
    const userMsg: Message = { role: 'user', content: prompt.trim() }
    const updated = [...messages, userMsg]
    setMessages(updated)
    setPrompt('')
    setLoading(true)

    const assistantMsg: Message = { role: 'assistant', content: '' }
    setMessages([...updated, assistantMsg])

    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, messages: updated }),
    })

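    const reader = res.body!.getReader()
    const decoder = new TextDecoder()
    let accumulated = ''
    let buffer = ''

    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      // A network chunk can end mid-line or mid-character, so decode in
      // streaming mode and keep the unfinished tail for the next read
      buffer += decoder.decode(value, { stream: true })
      const lines = buffer.split('\n')
      buffer = lines.pop() ?? ''
      for (const line of lines) {
        if (!line.trim()) continue
        try {
          const data = JSON.parse(line)
          if (data.message?.content) {
            accumulated += data.message.content
            setMessages(prev => {
              const copy = [...prev]
              copy[copy.length - 1] = { role: 'assistant', content: accumulated }
              return copy
            })
          }
        } catch {
          // Skip lines that are not valid JSON
        }
      }
    }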
    setLoading(false)
  }

  return (
    <main style={{ maxWidth: 720, margin: '2rem auto', fontFamily: 'sans-serif', padding: '0 1rem' }}>
      <h1>Ollama + Next.js</h1>
      <select value={model} onChange={e => setModel(e.target.value)} style={{ marginBottom: '1rem' }}>
        <option value="llama3.2">llama3.2</option>
        <option value="mistral">mistral</option>
        <option value="gemma3">gemma3</option>
      </select>
      <div style={{ border: '1px solid #ddd', borderRadius: 8, padding: '1rem', minHeight: 300, maxHeight: 480, overflowY: 'auto', background: '#fafafa', marginBottom: '1rem' }}>
        {messages.map((m, i) => (
          <div key={i} style={{ marginBottom: '0.75rem', color: m.role === 'user' ? '#1a56db' : '#111' }}>
            <strong>{m.role}:</strong> {m.content}
          </div>
        ))}
      </div>
      <div style={{ display: 'flex', gap: '0.5rem' }}>
        <input
          value={prompt}
          onChange={e => setPrompt(e.target.value)}
          onKeyDown={e => e.key === 'Enter' && send()}
          placeholder="Type a message..."
          style={{ flex: 1, padding: '0.5rem', border: '1px solid #ccc', borderRadius: 4 }}
        />
        <button onClick={send} disabled={loading} style={{ padding: '0.5rem 1rem', background: loading ? '#aaa' : '#0070f3', color: 'white', border: 'none', borderRadius: 4, cursor: loading ? 'default' : 'pointer' }}>
          {loading ? 'Thinking...' : 'Send'}
        </button>
      </div>
    </main>
  )
}

The key pattern here is using a functional update inside setMessages to append tokens as they arrive. Because the streaming loop runs asynchronously and React batches state updates, using prev => [...] ensures you always update from the latest state rather than a stale closure capture.
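The updater passed to setMessages can also be written as a standalone pure function, which makes the pattern easier to see and to test outside React. This is a small sketch: withAssistantContent is a hypothetical helper name, and the Message type mirrors the one in the page component.

```typescript
type Message = { role: 'user' | 'assistant'; content: string }

// Pure updater: returns a new array whose last entry holds the latest
// assistant text, leaving the previous array untouched.
function withAssistantContent(prev: Message[], content: string): Message[] {
  if (prev.length === 0) return [{ role: 'assistant', content }]
  const copy = prev.slice()
  copy[copy.length - 1] = { role: 'assistant', content }
  return copy
}
```

Inside the streaming loop this would be used as setMessages(prev => withAssistantContent(prev, accumulated)). Returning a fresh array rather than mutating prev is what lets React detect the change and re-render.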

Loading Models Dynamically

To populate the model selector from Ollama’s actual installed models rather than a hardcoded list, add another API route at app/api/models/route.ts:

export async function GET() {
  try {
    const res = await fetch('http://localhost:11434/api/tags')
    const data = await res.json()
    return Response.json(data.models.map((m: { name: string }) => m.name))
  } catch {
    return Response.json(['llama3.2'])
  }
}

Then in your page component, import useEffect alongside useState and fetch the model list on mount:

const [models, setModels] = useState<string[]>(['llama3.2'])

useEffect(() => {
  fetch('/api/models')
    .then(r => r.json())
    .then(setModels)
    .catch(() => {}) // keep the default list if the request fails
}, [])

Replace the hardcoded <option> elements with a models.map() call and you have a fully dynamic model picker that always reflects what is installed on your machine.
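The mapping step in the models route can be pulled into a small function that tolerates unexpected payloads and an empty model list. This is a hedged sketch: toModelNames is a hypothetical helper, and it assumes the /api/tags response has the shape { models: [{ name: string, ... }] } that the route above already relies on.

```typescript
// Extracts model names from an /api/tags payload.
// Falls back to a default list when the shape is unexpected or no model is installed.
function toModelNames(data: unknown, fallback: string[] = ['llama3.2']): string[] {
  if (typeof data !== 'object' || data === null) return fallback
  const models = (data as { models?: unknown }).models
  if (!Array.isArray(models)) return fallback
  const names = models
    .map(m => (typeof m === 'object' && m !== null ? (m as { name?: unknown }).name : undefined))
    .filter((n): n is string => typeof n === 'string' && n.length > 0)
  return names.length > 0 ? names : fallback
}
```

In the route handler, Response.json(toModelNames(await res.json())) would replace the inline map call.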

Using the Vercel AI SDK for Simpler Streaming

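If you prefer a higher-level abstraction for streaming, the Vercel AI SDK integrates with Next.js and can talk to Ollama through its OpenAI-compatible provider. The code in this section is written against the AI SDK 4.x API; newer major versions rename toDataStreamResponse and move useChat to the @ai-sdk/react package, so pin the versions when installing:

npm install ai@4 @ai-sdk/openai@1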

Ollama is OpenAI-API-compatible, so you can use the OpenAI provider pointed at your local Ollama instance:

import { createOpenAI } from '@ai-sdk/openai'
import { streamText } from 'ai'

const ollama = createOpenAI({
  baseURL: 'http://localhost:11434/v1',
  apiKey: 'ollama',
})

export async function POST(req: Request) {
  const { messages } = await req.json()
  const result = streamText({
    model: ollama('llama3.2'),
    messages,
  })
  return result.toDataStreamResponse()
}

On the client side, use the useChat hook from the AI SDK, which handles all the streaming state management for you:

'use client'
import { useChat } from 'ai/react'

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat()
  return (
    <div>
      {messages.map(m => (
        <div key={m.id}><strong>{m.role}:</strong> {m.content}</div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} placeholder="Say something..." />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  )
}

The AI SDK approach reduces the streaming boilerplate significantly, but requires Ollama to expose its OpenAI-compatible endpoint, which it does at /v1 by default. Both approaches work well — the manual fetch approach gives you more control, while the AI SDK gives you a cleaner component API and easy access to features like tool calling and structured outputs.

Adding a System Prompt

System prompts let you customise the model’s behaviour without changing the user-facing interface. In the manual fetch approach, prepend a system message to the messages array before sending to the API route:

body: JSON.stringify({
  model,
  messages: [
    { role: 'system', content: 'You are a concise technical assistant. Answer in plain language.' },
    ...updated
  ]
})

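In the AI SDK approach, the simplest option is to set the system option in streamText inside the route handler (useChat also accepts an initialMessages option, but messages passed that way are part of the client-side message list):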

const result = streamText({
  model: ollama('llama3.2'),
  system: 'You are a concise technical assistant.',
  messages,
})

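Either way, the system prompt never appears in the chat window but shapes every response the model produces. In the manual approach as written, the prompt is added in browser code, so anyone can read it in the network tab; prepend it in the route handler instead if it should stay private. This is the foundation for building specialised tools, such as a code reviewer, a writing assistant, or a domain-specific Q&A bot, on top of a general-purpose model.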
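If you want the manual approach to keep the system prompt on the server, the route handler can add it itself. The sketch below shows one way to do that. withSystemPrompt and SYSTEM_PROMPT are hypothetical names, and dropping client-supplied system messages is a design choice of this example rather than something Ollama requires.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string }

// Hypothetical server-side constant; it never ships to the browser.
const SYSTEM_PROMPT = 'You are a concise technical assistant. Answer in plain language.'

// Puts the server's system prompt first and discards any system
// messages the client tried to send.
function withSystemPrompt(
  messages: ChatMessage[],
  system: string = SYSTEM_PROMPT,
): ChatMessage[] {
  const withoutSystem = messages.filter(m => m.role !== 'system')
  return [{ role: 'system', content: system }, ...withoutSystem]
}
```

In the route this would wrap the incoming messages before they are forwarded, for example messages: withSystemPrompt(body.messages), and the client would send only user and assistant turns.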

Deploying for Local Production Use

When you are done developing, build the Next.js app for production:

npm run build
npm start

The app will run on port 3000 by default. Since all Ollama calls are server-side in the API routes, the Ollama instance only needs to be reachable from the machine running the Next.js server. To use the chat UI from other devices on your local network, open it via your machine’s local IP address. The production server normally listens on all interfaces already; to set the bind address explicitly, pass the hostname flag with npm start -- -H 0.0.0.0 (a HOST environment variable is not read by next start). The Ollama API itself can remain bound to localhost.
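If Ollama might later run on a different machine than the Next.js server, it helps to stop hardcoding its address in each route. The sketch below is one possible approach: ollamaUrl is a hypothetical helper, and OLLAMA_BASE_URL is an environment variable name invented for this app (it is not a setting that Ollama or Next.js defines).

```typescript
type Env = Record<string, string | undefined>

// Builds a full Ollama URL from a path, using OLLAMA_BASE_URL when set
// (hypothetical variable name) and the local default otherwise.
function ollamaUrl(path: string, env: Env): string {
  const base = (env.OLLAMA_BASE_URL || 'http://localhost:11434').replace(/\/+$/, '')
  return `${base}/${path.replace(/^\/+/, '')}`
}
```

A route handler would then call fetch(ollamaUrl('/api/chat', process.env), ...) and you could point the app at another host by setting the variable before running npm start.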

Conclusion

Next.js is an excellent host for Ollama-powered applications. Its API route system gives you a clean server-side proxy for Ollama, the App Router supports streaming responses natively, and the React component model makes it straightforward to build a responsive chat UI. Whether you use the manual fetch approach for full control or the Vercel AI SDK for a higher-level abstraction, the core pattern is the same: keep Ollama server-side, stream tokens to the client, and update React state incrementally as they arrive. From here you can add persistence with a database, file upload for RAG, or authentication to restrict access — all within the familiar Next.js project structure.

Handling Errors and Timeouts

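Local LLMs can be slow on modest hardware, and long-running generations occasionally stall. It is worth adding a timeout to your API route so the server does not hang indefinitely. The signal applies to the whole request, including the streamed body, so pick a limit longer than your slowest expected generation: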

const ollamaRes = await fetch('http://localhost:11434/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ ...body, stream: true }),
  signal: AbortSignal.timeout(120_000),
})
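When the timeout fires before Ollama has started responding, the fetch call rejects, so the route needs to catch that and return something meaningful. The sketch below maps common failures to HTTP status codes. classifyFetchError and ProxyError are hypothetical names, and the chosen status codes and messages are suggestions. It relies on AbortSignal.timeout rejecting with an error named TimeoutError and on fetch reporting network failures as a TypeError.

```typescript
type ProxyError = { status: number; error: string }

// Maps an error thrown by fetch to a status code and message for the client.
function classifyFetchError(err: unknown): ProxyError {
  const name =
    typeof err === 'object' && err !== null && 'name' in err
      ? String((err as { name: unknown }).name)
      : ''
  if (name === 'TimeoutError') {
    // Raised when AbortSignal.timeout() expires.
    return { status: 504, error: 'Ollama did not respond in time' }
  }
  if (name === 'TypeError') {
    // fetch reports network failures, such as a refused connection, as TypeError.
    return { status: 502, error: 'Could not reach Ollama' }
  }
  return { status: 500, error: 'Unexpected error while calling Ollama' }
}
```

In the route you would wrap the fetch call in try/catch and return Response.json({ error }, { status }) built from the result. This only covers failures that happen before the response starts; once streaming has begun, an abort ends the stream and the client sees a truncated reply.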

On the client, wrap the fetch call in a try-catch and display a friendly error when the request fails or times out:

try {
  const res = await fetch('/api/chat', { ... })
  if (!res.ok) throw new Error('Request failed')
  // ... stream handling
} catch (err) {
  setMessages(prev => {
    const copy = [...prev]
    copy[copy.length - 1] = { role: 'assistant', content: 'Something went wrong. Is Ollama running?' }
    return copy
  })
} finally {
  setLoading(false)
}

These small additions make the difference between a toy prototype and a tool that is genuinely pleasant to use day to day. Users get clear feedback when something goes wrong rather than a spinning button that never resolves, and the server stays responsive even when a model takes longer than expected to load or generate.