How to Use Ollama with SvelteKit

Introduction

SvelteKit is a full-stack framework built on Svelte that provides file-based routing, server-side rendering, and a clean server/client split — all with Svelte’s famously minimal boilerplate. When paired with Ollama, SvelteKit becomes an excellent foundation for building local AI applications. You get the performance benefits of Svelte’s compile-time reactivity on the frontend, combined with SvelteKit’s server routes to safely proxy requests to your local Ollama instance.

This guide walks you through scaffolding a SvelteKit project, connecting it to Ollama, streaming LLM responses into a Svelte component, and structuring the app so it is easy to extend. By the end you will have a working chat interface that talks to a locally-running model with no external API dependencies.

Prerequisites

You will need the following before starting:

Node.js 18+ — SvelteKit requires a modern Node version. Run node -v to check.
Ollama — Install from ollama.com and pull a model: ollama pull llama3.2.
npm or pnpm — Either package manager works fine for this guide.

Make sure Ollama is running. You can start it with ollama serve if it is not running as a background service. The Ollama API should be available at http://localhost:11434.
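Before scaffolding anything, it can help to confirm from Node that the API actually answers at that address. The following is a minimal sketch, assuming Node 18+ (for the built-in fetch) and the default URL above. The name checkOllama is hypothetical, and the optional fetchImpl parameter exists only so the function can be exercised without a live server.

```javascript
// Hypothetical helper: reports whether an Ollama server answers at baseUrl.
// GET /api/tags lists the locally installed models, so a 2xx reply means the API is up.
async function checkOllama(baseUrl = 'http://localhost:11434', fetchImpl = fetch) {
  try {
    const res = await fetchImpl(`${baseUrl}/api/tags`)
    if (!res.ok) return { ok: false, reason: `HTTP ${res.status}` }
    const data = await res.json()
    // data.models is an array of installed models; each entry carries a name field
    const models = Array.isArray(data.models) ? data.models.map(m => m.name) : []
    return { ok: true, models }
  } catch (err) {
    // Network-level failure, typically a refused connection when Ollama is not running
    return { ok: false, reason: err.message }
  }
}
```

Calling checkOllama().then(console.log) from a scratch script should report either the installed model names or the reason the request failed.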

Scaffolding the SvelteKit Project

Use the official SvelteKit scaffolding command to create a new project:

npm create svelte@latest ollama-sveltekit
cd ollama-sveltekit
npm install

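When prompted, select the “Skeleton project” template and choose TypeScript if you prefer it; the examples below use plain JavaScript but are easy to adapt. (Newer releases of the Svelte tooling replace this command with npx sv create ollama-sveltekit, where the equivalent template is called “SvelteKit minimal”.) Once dependencies are installed, start the dev server with npm run dev and open http://localhost:5173 in your browser.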

Creating a Server Route to Proxy Ollama

The cleanest architecture for a SvelteKit + Ollama app is to proxy all Ollama requests through a SvelteKit server route. This keeps your Ollama URL server-side and lets you add auth or logging later without touching the frontend. Create a file at src/routes/api/chat/+server.js:

export async function POST({ request }) {
  const body = await request.json()
  const response = await fetch('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body)
  })
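  // Stream Ollama's NDJSON body straight through to the client.
  // The runtime picks the transfer encoding itself, so only the content type is set here.
  return new Response(response.body, {
    headers: { 'Content-Type': 'application/x-ndjson' }
  })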
}

This server route accepts POST requests from your Svelte components and forwards them to Ollama, streaming the response body directly back to the client. SvelteKit handles the HTTP plumbing, so you do not need to worry about buffering or connection management.
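Because every request passes through this route, it is also a natural place to restrict what gets forwarded to Ollama. The sketch below is one possible approach rather than something the route above requires: sanitizeChatRequest and ALLOWED_MODELS are hypothetical names, and the allow-list contents are only an example.

```javascript
// Hypothetical allow-list; adjust it to the models you have actually pulled.
const ALLOWED_MODELS = new Set(['llama3.2', 'mistral', 'gemma3'])

// Returns a clean payload for Ollama's /api/chat endpoint, or throws on invalid input.
function sanitizeChatRequest(body) {
  if (!body || typeof body !== 'object') throw new Error('Body must be a JSON object')
  if (!ALLOWED_MODELS.has(body.model)) throw new Error(`Model not allowed: ${body.model}`)
  if (!Array.isArray(body.messages) || body.messages.length === 0) {
    throw new Error('messages must be a non-empty array')
  }
  // Copy only the fields the chat UI is expected to send, dropping anything else
  const messages = body.messages.map(({ role, content }) => ({
    role: String(role),
    content: String(content)
  }))
  // Streaming stays on unless the client explicitly asked for stream: false
  return { model: body.model, messages, stream: body.stream !== false }
}
```

Inside the POST handler you could pass the parsed body through this function before JSON.stringify and answer with a 400 response if it throws. If you later load model names dynamically, the allow-list would need to match the full names Ollama reports, such as llama3.2:latest.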

Building the Chat Component

Create the main chat page at src/routes/+page.svelte. This component handles user input, sends messages to the server route, and streams the response into the UI:

<script>
  let model = 'llama3.2'
  let prompt = ''
  let messages = []
  let loading = false

  async function send() {
    // Ignore empty input, and Enter presses while a reply is still streaming
    if (!prompt.trim() || loading) return
    const userText = prompt.trim()
    messages = [...messages, { role: 'user', content: userText }]
    prompt = ''
    loading = true
    const assistantMsg = { role: 'assistant', content: '' }
    messages = [...messages, assistantMsg]

    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, messages: messages.filter(m => m.content), stream: true })
    })

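    const reader = res.body.getReader()
    const decoder = new TextDecoder()
    let buffer = ''
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      // A network chunk can end mid-line (or mid-character), so decode in
      // streaming mode and keep the trailing partial line for the next read
      buffer += decoder.decode(value, { stream: true })
      const lines = buffer.split('\n')
      buffer = lines.pop()
      for (const line of lines.filter(Boolean)) {
        const data = JSON.parse(line)
        if (data.message?.content) {
          assistantMsg.content += data.message.content
          // Reassign so Svelte notices the mutated message and re-renders
          messages = [...messages]
        }
      }
    }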
    loading = false
  }
</script>

<div class="chat">
  <h1>Ollama Chat</h1>
  <select bind:value={model}>
    <option value="llama3.2">llama3.2</option>
    <option value="mistral">mistral</option>
    <option value="gemma3">gemma3</option>
  </select>
  <div class="messages">
    {#each messages as msg}
      <div class={msg.role}><strong>{msg.role}:</strong> {msg.content}</div>
    {/each}
  </div>
  <div class="input-row">
    <input bind:value={prompt} on:keyup={e => e.key === 'Enter' && send()} placeholder="Type a message..." />
    <button on:click={send} disabled={loading}>{loading ? 'Thinking...' : 'Send'}</button>
  </div>
</div>

<style>
  .chat { max-width: 700px; margin: 2rem auto; font-family: sans-serif; }
  .messages { border: 1px solid #ddd; border-radius: 8px; padding: 1rem; min-height: 300px; max-height: 480px; overflow-y: auto; margin: 1rem 0; background: #fafafa; }
  .user { color: #1a56db; margin-bottom: 0.75rem; }
  .assistant { color: #111; margin-bottom: 0.75rem; }
  .input-row { display: flex; gap: 0.5rem; }
  input { flex: 1; padding: 0.5rem; border: 1px solid #ccc; border-radius: 4px; }
  button { padding: 0.5rem 1rem; background: #ff3e00; color: white; border: none; border-radius: 4px; cursor: pointer; }
  button:disabled { background: #aaa; }
</style>

Svelte’s reactivity model makes streaming particularly clean. When you update messages = [...messages] inside the streaming loop, Svelte automatically re-renders the relevant part of the DOM without any virtual DOM diffing overhead. The result is a smooth, low-latency typewriter effect as tokens arrive from the model.
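One detail worth keeping in mind when reading the stream is that network chunks do not necessarily line up with the newline-delimited JSON objects Ollama sends, so a single read may deliver half a line. A small buffering parser is one way to guard against that. The sketch below is illustrative only, and createNdjsonParser is a hypothetical helper name.

```javascript
// Hypothetical helper: turns arbitrary text chunks into complete NDJSON objects.
function createNdjsonParser() {
  let buffer = ''
  return function push(chunk) {
    buffer += chunk
    const lines = buffer.split('\n')
    // The last element is '' when the chunk ended on a newline, otherwise a partial line
    buffer = lines.pop()
    return lines.filter(line => line.trim()).map(line => JSON.parse(line))
  }
}

// Usage: feed it decoded text as it arrives
const push = createNdjsonParser()
const first = push('{"message":{"content":"Hel')              // incomplete line, nothing parsed yet
const second = push('lo"}}\n{"message":{"content":"!"}}\n')   // completes one object, adds another
console.log(first.length, second.map(o => o.message.content).join('')) // logs: 0 Hello!
```

In the component, each value from reader.read() would be passed through decoder.decode(value, { stream: true }) and then into push, so that multi-byte characters split across chunks are handled as well.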

Loading Available Models Dynamically

Rather than hardcoding a model list, you can load the available models from Ollama at page load time using a SvelteKit load function. Add a +page.server.js file alongside your page:

export async function load() {
  try {
    const res = await fetch('http://localhost:11434/api/tags')
    const data = await res.json()
    return { models: data.models.map(m => m.name) }
  } catch {
    return { models: ['llama3.2'] }
  }
}

Then in +page.svelte, accept the data prop and use it for the model selector:

<script>
  export let data
  let model = data.models[0] || 'llama3.2'
</script>

<select bind:value={model}>
  {#each data.models as m}
    <option value={m}>{m}</option>
  {/each}
</select>

This approach runs the model list fetch on the server during SSR, so the dropdown is populated on the initial page load with no client-side flash.
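The load function above falls back to a default only when the request throws. If Ollama responds but has no models installed, or returns an unexpected payload, the dropdown could end up empty. A slightly more defensive extraction might look like the sketch below, where extractModelNames is a hypothetical helper and the fallback list is an assumption you would adapt.

```javascript
// Hypothetical helper: pulls model names out of an /api/tags response body,
// falling back to a default list when the payload is missing, malformed, or empty.
function extractModelNames(data, fallback = ['llama3.2']) {
  const models = Array.isArray(data?.models) ? data.models : []
  const names = models
    .map(m => m?.name)
    .filter(name => typeof name === 'string' && name.length > 0)
  return names.length > 0 ? names : fallback
}
```

In +page.server.js the return statement would then become return { models: extractModelNames(data) }, keeping the try-catch around the fetch as before.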

Adding a System Prompt

Many useful applications of LLMs rely on a system prompt to set the persona, tone, or task for the model. In the Ollama chat API, you can include a system message as the first entry in the messages array. Here is how to add a configurable system prompt to the SvelteKit app:

<script>
  let systemPrompt = 'You are a helpful assistant.'
  let messages = []

  async function send() {
    const fullMessages = [
      { role: 'system', content: systemPrompt },
      ...messages.filter(m => m.content)
    ]
    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ model, messages: fullMessages, stream: true })
    })
    // ... stream handling as before
  }
</script>

<textarea bind:value={systemPrompt} rows="3" placeholder="System prompt..."></textarea>

By prepending the system message to every request, you ensure the model always has its instructions regardless of how long the conversation has become. This is a simple but effective pattern for building focused tools — a coding assistant, a document summariser, or a customer service bot — on top of a general-purpose model.
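The message assembly can also be pulled into a small pure function, which makes it easy to test and reuse. This is a sketch under the assumption that the system prompt lives outside the stored history; buildMessages is a hypothetical name.

```javascript
// Hypothetical helper: builds the messages array sent to /api/chat.
// The system prompt is kept out of the stored history and prepended on every request.
function buildMessages(systemPrompt, history) {
  // Drop empty placeholders (such as the assistant message still being streamed)
  // and any stale system entries that may have been saved earlier
  const conversation = history.filter(m => m.role !== 'system' && m.content)
  const trimmed = systemPrompt.trim()
  return trimmed
    ? [{ role: 'system', content: trimmed }, ...conversation]
    : conversation
}
```

With this in place, send() would pass buildMessages(systemPrompt, messages) as the messages field of the request body. An empty system prompt simply results in no system message being sent.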

Handling Errors and Edge Cases

A production-quality chat interface needs to handle several common failure scenarios gracefully. The most frequent issue is Ollama not being available — either because it has not been started or because it crashed. Wrap your server route in a try-catch and return a meaningful error response:

export async function POST({ request }) {
  try {
    const body = await request.json()
    const response = await fetch('http://localhost:11434/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal: AbortSignal.timeout(120000)
    })
    if (!response.ok) {
      return new Response(JSON.stringify({ error: 'Ollama error: ' + response.status }), {
        status: 500, headers: { 'Content-Type': 'application/json' }
      })
    }
    return new Response(response.body, {
      headers: { 'Content-Type': 'application/x-ndjson' }
    })
  } catch (e) {
    return new Response(JSON.stringify({ error: e.message }), {
      status: 503, headers: { 'Content-Type': 'application/json' }
    })
  }
}

On the client side, check the response status before starting to read the stream and display a user-friendly error message if something goes wrong:

const res = await fetch('/api/chat', { ... })
if (!res.ok) {
  const err = await res.json()
  messages = [...messages.slice(0, -1), { role: 'assistant', content: 'Error: ' + err.error }]
  loading = false
  return
}
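Note that res.json() itself throws if the error body is not JSON, which can happen when the failure comes from somewhere other than your own route. One way to make the client more tolerant is sketched below; readErrorMessage is a hypothetical helper, and it assumes the { error: ... } shape returned by the server route above.

```javascript
// Hypothetical helper: extracts a readable message from a failed fetch Response.
async function readErrorMessage(res) {
  const text = await res.text()
  try {
    const parsed = JSON.parse(text)
    if (parsed && typeof parsed.error === 'string') return parsed.error
  } catch {
    // Body was not JSON (for example an HTML error page); fall through to the raw text
  }
  return text.trim() || `Request failed with status ${res.status}`
}
```

The client check would then use 'Error: ' + (await readErrorMessage(res)) in place of reading err.error directly.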

You should also handle the case where the user submits a new message while a response is still streaming. The simplest approach is to ignore new submissions while loading is true. The template disables the send button, but the Enter key handler on the input still calls send(), so send() should also return early when loading is set. A more sophisticated approach would use an AbortController to cancel the in-flight request when the user clicks a stop button, then reset the loading state. This gives users control over long-running generations and prevents the UI from appearing frozen when a model is slow to respond.
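The AbortController approach could be wrapped in a small helper along these lines. This is a sketch rather than a complete implementation: createCancellableRequest is a hypothetical name, and the fetchImpl parameter is there only to make the helper testable without a network.

```javascript
// Hypothetical helper: wraps fetch so that an in-flight request can be cancelled.
function createCancellableRequest(fetchImpl = fetch) {
  let controller = null
  return {
    // Starts a request, cancelling any previous one that is still running
    start(url, options = {}) {
      if (controller) controller.abort()
      controller = new AbortController()
      return fetchImpl(url, { ...options, signal: controller.signal })
    },
    // Cancels the current request, if there is one
    stop() {
      if (controller) controller.abort()
      controller = null
    }
  }
}
```

In the component, send() would call request.start('/api/chat', { ... }) instead of fetch, and a stop button would call request.stop(). Aborting causes the pending fetch or reader.read() call to reject with an AbortError, so the streaming loop should sit inside a try-catch that resets loading to false.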

Persisting Conversation History

By default, the chat history lives only in Svelte’s component state and is lost on page refresh. For many local tools this is acceptable, but if you want persistence you have several options. The simplest is localStorage — store the messages array as JSON and reload it on mount:

<script>
  import { onMount } from 'svelte'
  let messages = []

  onMount(() => {
    const saved = localStorage.getItem('chat-history')
    if (saved) messages = JSON.parse(saved)
  })

  function updateHistory() {
    localStorage.setItem('chat-history', JSON.stringify(messages))
  }
</script>

Call updateHistory() after each message is added or updated. For a more robust solution, SvelteKit’s server routes can write to a SQLite database using the better-sqlite3 package, giving you a persistent, queryable conversation store that survives restarts and can support multiple named conversations.
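For the localStorage route, it is worth guarding against corrupted data and write failures, since JSON.parse throws on malformed input and setItem can throw when the storage quota is exceeded. The helpers below are a minimal sketch: loadHistory and saveHistory are hypothetical names, and the storage object is passed in so the same code can be exercised outside the browser.

```javascript
const HISTORY_KEY = 'chat-history'

// Hypothetical helper: returns the saved messages, or an empty array if nothing usable is stored.
function loadHistory(storage) {
  try {
    // getItem returns null when the key is absent
    const parsed = JSON.parse(storage.getItem(HISTORY_KEY) ?? '[]')
    return Array.isArray(parsed) ? parsed : []
  } catch {
    // Stored value was not valid JSON; start with a clean history
    return []
  }
}

// Hypothetical helper: saves the messages and reports whether the write succeeded.
function saveHistory(storage, messages) {
  try {
    storage.setItem(HISTORY_KEY, JSON.stringify(messages))
    return true
  } catch {
    // setItem can throw, for instance when the storage quota is exceeded
    return false
  }
}
```

In the component you would call loadHistory(localStorage) inside onMount and saveHistory(localStorage, messages) wherever updateHistory() is called.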

Deploying SvelteKit with Ollama

For local use, npm run dev is all you need. When you are ready to run the app more permanently, build it with the Node adapter, which produces a standalone Node server. Add the adapter to your project first:

npm install -D @sveltejs/adapter-node

Then update svelte.config.js:

import adapter from '@sveltejs/adapter-node'
export default { kit: { adapter: adapter() } }

Run npm run build and then node build to start the production server. The SvelteKit app will listen on port 3000 by default, and all Ollama requests are proxied server-side, so users only need access to your SvelteKit server, not to the Ollama instance directly. This also makes it straightforward to host the UI on one machine and run Ollama on another machine on the same local network: change the Ollama URL in your server route (and in the load function, if you added one), and make sure Ollama on the other machine accepts non-local connections. By default Ollama binds to 127.0.0.1; setting OLLAMA_HOST=0.0.0.0 before running ollama serve changes that.
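Rather than editing the URL in several files, you might read it from an environment variable in one place. The sketch below assumes a variable named OLLAMA_BASE_URL, which is a name made up for this example and not something Ollama or SvelteKit defines; resolveOllamaUrl and ollamaEndpoint are likewise hypothetical helpers.

```javascript
// OLLAMA_BASE_URL is a hypothetical variable name chosen for this sketch.
// Returns the scheme, host, and port of the Ollama server, without a trailing slash.
function resolveOllamaUrl(env = process.env) {
  const raw = env.OLLAMA_BASE_URL || 'http://localhost:11434'
  // new URL throws on malformed input, so a bad value fails at the first request
  // instead of producing confusing fetch errors later
  return new URL(raw).origin
}

// Builds a full endpoint URL such as http://localhost:11434/api/chat
function ollamaEndpoint(path, env = process.env) {
  return new URL(path, resolveOllamaUrl(env)).toString()
}
```

The server route and the load function could then call fetch(ollamaEndpoint('/api/chat'), ...) and fetch(ollamaEndpoint('/api/tags')), and the production server would be started with something like OLLAMA_BASE_URL=http://192.168.1.20:11434 node build, where the address is only an example. Because the helper keeps only the origin, any path prefix in the variable is ignored.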

Conclusion

SvelteKit’s clean architecture and Svelte’s efficient reactivity make it an excellent choice for building local AI interfaces with Ollama. The server route pattern keeps your Ollama instance private, the load function makes server-side data fetching simple, and Svelte’s fine-grained reactivity handles streaming updates with minimal code. From this foundation you can add features like conversation persistence with a SQLite database, file upload for document question-answering, or a multi-agent pipeline — all running entirely on your own hardware with no external API dependencies.

Each of these extensions integrates naturally with the server route architecture established earlier in this guide, giving you a maintainable foundation to build on as your application grows in scope and complexity.