How to Use Ollama with Svelte

Svelte is a modern JavaScript framework that compiles components to efficient vanilla JavaScript: no virtual DOM, very little runtime code, and typically smaller bundles than React or Vue. If you are building a frontend that connects to a local Ollama backend, Svelte gives you a fast, lightweight interface that streams AI responses smoothly and feels snappy even on modest hardware. This guide covers building an Ollama chat interface in Svelte, including streaming token display, conversation history, a model selector, and how to connect to an Ollama backend through a simple API proxy.

Svelte’s reactive model — where state changes automatically propagate to the DOM — is a natural fit for streaming LLM responses. Each token appended to a string variable instantly updates the rendered text, giving you the typewriter effect with almost no explicit DOM manipulation code.

Setup

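npm create vite@latest ollama-chat -- --template svelte
cd ollama-chat
npm install
npm run dev

This scaffolds a plain Svelte project on Vite, which is what the rest of this guide assumes (npm create svelte@latest has been superseded by npx sv create, which scaffolds SvelteKit). Use --template svelte-ts if you prefer TypeScript. The examples use the classic let, $: and on: component syntax, which Svelte 5 still compiles in its non-runes mode. The dev server starts at localhost:5173 with hot module replacement.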

Calling Ollama Directly from Svelte

For local development, you can call Ollama directly from the browser, since both run on the same machine. Allow the dev server’s origin by setting OLLAMA_ORIGINS=http://localhost:5173 before starting Ollama. In production, always proxy through a backend to avoid exposing Ollama to the network. The component below takes the direct approach:

<script>
  let messages = [];
  let input = "";
  let streaming = false;

  const OLLAMA_URL = "http://localhost:11434";
  const MODEL = "llama3.2";

  async function sendMessage() {
    if (!input.trim() || streaming) return;
    const userMsg = { role: "user", content: input };
    messages = [...messages, userMsg, { role: "assistant", content: "" }];
    const idx = messages.length - 1;
    input = "";
    streaming = true;

    const resp = await fetch(`${OLLAMA_URL}/api/chat`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        model: MODEL,
        messages: messages.slice(0, -1),
        stream: true
      })
    });

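    const reader = resp.body.getReader();
    const decoder = new TextDecoder();
    let buffer = ""; // holds a partial JSON line between reads

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters intact across chunk boundaries
      buffer += decoder.decode(value, { stream: true });
      const lines = buffer.split("\n");
      buffer = lines.pop(); // the last piece may be incomplete; keep it for the next read
      for (const line of lines.filter(Boolean)) {
        const chunk = JSON.parse(line);
        const token = chunk?.message?.content ?? "";
        messages[idx] = { ...messages[idx], content: messages[idx].content + token };
        messages = [...messages];
      }
    }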
    streaming = false;
  }
</script>

The key Svelte reactivity pattern is assignment inside the streaming loop. Svelte’s reactivity is triggered by assignments to a component variable, not by mutating methods such as push or splice. An assignment to an element or property, such as messages[idx] = ..., already marks messages as changed, so the explicit messages = [...messages] on the next line is a defensive extra rather than a requirement; what would not work is calling messages.push(...) on its own. Either way, Svelte updates the DOM on every token, producing the live streaming effect.
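Ollama streams its response as newline-delimited JSON, and a network chunk does not necessarily end on a line boundary. The following is a minimal sketch of a buffer that only parses complete lines. The helper name createNdjsonParser is hypothetical and is not part of Ollama or Svelte.

```javascript
// Minimal sketch of an incremental NDJSON parser (hypothetical helper, not an Ollama API).
function createNdjsonParser() {
  let buffer = ""; // text received so far that does not yet end in a newline

  return {
    // Feed one decoded text chunk; returns every complete JSON object it finished.
    push(text) {
      buffer += text;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // the last piece may be an incomplete line
      return lines.filter((line) => line.trim() !== "").map((line) => JSON.parse(line));
    },
    // Call once the stream ends to parse a final line that had no trailing newline.
    flush() {
      const rest = buffer.trim();
      buffer = "";
      return rest ? [JSON.parse(rest)] : [];
    }
  };
}

// Usage: a JSON line split across two network chunks is parsed only once complete.
const parser = createNdjsonParser();
const first = parser.push('{"message":{"content":"Hel"},"done":false}\n{"message":{"con');
const second = parser.push('tent":"lo"},"done":true}\n');
const text = [...first, ...second].map((c) => c.message.content).join("");
console.log(text); // prints "Hello"
```

In the component, the read loop would pass decoder.decode(value, { stream: true }) to parser.push and append each returned chunk’s message.content to the assistant message.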

The Chat UI Template

Add the template section to display messages and the input bar:

<div class="chat">
  <div class="messages">
    {#each messages as msg}
      <div class="message {msg.role}">
        <strong>{msg.role === "user" ? "You" : "AI"}</strong>
        <p>{msg.content}</p>
      </div>
    {/each}
  </div>
  <div class="input-bar">
    <input
      bind:value={input}
      on:keydown={(e) => e.key === "Enter" && sendMessage()}
      placeholder="Ask something..."
      disabled={streaming}
    />
    <button on:click={sendMessage} disabled={streaming}>
      {streaming ? "..." : "Send"}
    </button>
  </div>
</div>

Svelte’s {#each} block re-renders the message list whenever messages changes. The input and button are disabled while streaming to prevent overlapping requests. The bind:value directive keeps the input in sync with the input variable bidirectionally.

Adding a Model Selector and System Prompt

Fetch available models from Ollama on mount and let users select one:

<script>
  import { onMount } from "svelte";
  let models = [];
  let selectedModel = "llama3.2";
  let systemPrompt = "You are a helpful assistant.";

  onMount(async () => {
    try {
      const resp = await fetch(`${OLLAMA_URL}/api/tags`);
      const data = await resp.json();
      models = data.models.map(m => m.name);
      if (models.length) selectedModel = models[0];
    } catch {
      models = [selectedModel];
    }
  });
</script>

<!-- In the template: -->
<select bind:value={selectedModel}>
  {#each models as model}
    <option value={model}>{model}</option>
  {/each}
</select>
<textarea bind:value={systemPrompt} rows="2"></textarea>

In the sendMessage function, build the messages array with the system prompt prepended and use selectedModel instead of the hardcoded MODEL constant. The bind:value on the select and textarea keep the component state in sync automatically without any event handlers.
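One way to build that request is sketched here, assuming the chat history is an array of { role, content } objects as in the earlier component. The helper name buildChatRequest is hypothetical.

```javascript
// Hypothetical helper: builds the /api/chat request body from component state.
function buildChatRequest(model, systemPrompt, history) {
  const trimmed = systemPrompt.trim();
  // Skip the system message entirely when the prompt box is empty.
  const system = trimmed ? [{ role: "system", content: trimmed }] : [];
  return {
    model,
    // Send only role and content; drop any UI-only fields on the message objects.
    messages: [...system, ...history.map(({ role, content }) => ({ role, content }))],
    stream: true
  };
}

const body = buildChatRequest("llama3.2", "You are a helpful assistant.", [
  { role: "user", content: "Hi" }
]);
console.log(JSON.stringify(body, null, 2));
```

Inside sendMessage, the fetch body would then become JSON.stringify(buildChatRequest(selectedModel, systemPrompt, messages.slice(0, -1))). The system prompt stays out of the messages array that the template renders, so it is never shown as a chat bubble.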

Proxying Through a Backend

For any deployment beyond localhost, proxy Ollama calls through a backend rather than calling Ollama directly from the browser. SvelteKit (covered in a separate guide) includes a server-side API route system that is perfect for this. With plain Svelte and Vite, add a dev proxy in vite.config.js:

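import { defineConfig } from "vite";
import { svelte } from "@sveltejs/vite-plugin-svelte";

export default defineConfig({
  plugins: [svelte()], // keep the plugin from the scaffolded config
  server: {
    proxy: {
      // /api/ollama/api/chat -> http://localhost:11434/api/chat
      "/api/ollama": {
        target: "http://localhost:11434",
        rewrite: (path) => path.replace(/^\/api\/ollama/, "")
      }
    }
  }
});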

Then change your fetch URL to /api/ollama/api/chat. In production, configure nginx to proxy the same path to Ollama and add authentication at the nginx level. This keeps Ollama off the public internet and gives you a single point to add rate limiting, logging, and access control.

Markdown Rendering

LLM responses often include Markdown formatting — bold text, bullet points, code blocks. Install a Markdown renderer to display them properly:

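npm install marked marked-highlight highlight.js

<!-- In a per-message component: -->
<script>
  import { Marked } from "marked";
  import { markedHighlight } from "marked-highlight";
  import hljs from "highlight.js";

  export let msg; // assumed prop: one { role, content } object passed in by the parent

  // Recent marked releases removed the built-in highlight option; marked-highlight replaces it
  const marked = new Marked(markedHighlight({
    langPrefix: "hljs language-",
    // Fall back to plaintext when the fence names a language highlight.js does not know
    highlight: (code, lang) =>
      hljs.highlight(code, { language: hljs.getLanguage(lang) ? lang : "plaintext" }).value
  }));
  $: renderedContent = marked.parse(msg.content);
</script>

<div class="message-content">{@html renderedContent}</div>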

The $: prefix makes renderedContent a reactive declaration: it recalculates whenever msg.content changes, which happens on every streaming token. Svelte’s {@html} tag renders raw HTML, which is necessary for Markdown output, but it performs no escaping. Treat model output as untrusted: pass the generated HTML through a sanitiser such as DOMPurify before rendering it, so that a response containing script-bearing HTML cannot cause XSS.

Svelte Stores for Shared State

If your Svelte app has multiple components that need access to the conversation history or model settings, use Svelte stores instead of passing props down through the component tree. Create a stores.js file with writable stores for messages, the current model, and the system prompt. Import and use these stores in any component with the $store syntax — Svelte automatically subscribes and unsubscribes, and any change to a store value propagates to all components that reference it. This is particularly useful for a sidebar settings component and a main chat area that need to share the model selection without a parent component mediating between them.

Svelte’s compiled output for a chat application like this is typically under 20KB of JavaScript — significantly smaller than equivalent React or Vue apps. For an Ollama frontend that runs locally on a developer machine this does not matter much, but for an application you serve to many users or run on a Raspberry Pi with limited resources, the performance difference between Svelte’s compiled output and a heavier framework’s runtime bundle is meaningful.

Handling Errors and Loading States

Production chat interfaces need to handle network failures gracefully. Wrap the fetch call and the read loop in a try-catch and update the last message with an error indicator if something goes wrong. Svelte’s assignment-based reactivity keeps this simple: add an error property to the assistant message object, reassign messages, and conditionally render an error style in the template using {#if msg.error}. Reset the streaming flag in a finally block so the UI does not get stuck in a loading state if the request fails partway through. Also check resp.ok before reading the body, because fetch does not reject on HTTP error statuses, such as the error Ollama returns when the requested model is not installed.
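The try, catch and finally flow described above might look like the sketch below. The names markLastAsError and sendSafely are hypothetical. The state object stands in for the component’s variables and stream stands in for the fetch-and-read loop; in a real component you would assign to messages and streaming directly.

```javascript
// Hypothetical helper: returns a copy of the list with the last message flagged as failed.
function markLastAsError(messages, err) {
  const last = messages[messages.length - 1];
  const failed = { ...last, error: err instanceof Error ? err.message : String(err) };
  return [...messages.slice(0, -1), failed];
}

// Sketch of the control flow; `stream` stands in for the fetch-and-read loop
// and `state` for the component variables (both are assumptions).
async function sendSafely(state, stream) {
  state.streaming = true;
  try {
    await stream();
  } catch (err) {
    // Reassigning (rather than mutating) is what triggers the re-render in Svelte.
    state.messages = markLastAsError(state.messages, err);
  } finally {
    state.streaming = false; // always re-enable the input, even after a failure
  }
}

const demo = markLastAsError(
  [{ role: "user", content: "Hi" }, { role: "assistant", content: "" }],
  new Error("Failed to fetch")
);
console.log(demo[1].error); // prints "Failed to fetch"
```

Any partial text streamed before the failure stays in content, so the template can show both the partial answer and the error state.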

Add a loading indicator while waiting for the first token — there is typically a second or two delay before Ollama starts streaming, especially on the first request when the model is loading. Use a simple reactive variable waitingForFirst that is set to true when the request starts and false when the first token arrives. Conditionally render a pulsing dot or a spinner in the assistant message bubble while waitingForFirst is true, replacing it with the streaming text once tokens start arriving. This small UX detail makes the interface feel significantly more responsive — users see immediate feedback that their message was received rather than wondering if the request was sent at all.

Scroll Behaviour

Chat interfaces should scroll to the bottom automatically as new tokens arrive, but not if the user has scrolled up to read earlier messages. Implement smart auto-scroll: track whether the user is near the bottom of the messages container, and only auto-scroll if they are. Svelte’s afterUpdate lifecycle function runs after every DOM update, making it the right place to check the scroll position and scroll down if appropriate.

Use a template binding to get a reference to the messages container element: <div bind:this={messagesEl}>. In the afterUpdate callback, check if messagesEl.scrollHeight - messagesEl.scrollTop - messagesEl.clientHeight < 100 — if the user is within 100 pixels of the bottom, scroll to the bottom. If they have scrolled up further, leave their position alone. Add a “scroll to bottom” button that appears when the user is scrolled up, giving them a way to jump back to the latest message without scrolling manually.
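The near-bottom check can be pulled out into a small function, sketched below. The names isNearBottom and scrollIfFollowing are hypothetical, and plain objects stand in for the DOM element so the logic can be tried outside a browser.

```javascript
// Hypothetical helper; works on any object with the three DOM scroll metrics.
function isNearBottom(el, threshold = 100) {
  const distanceFromBottom = el.scrollHeight - el.scrollTop - el.clientHeight;
  return distanceFromBottom < threshold;
}

// Scrolls only when the reader was already following the latest message.
function scrollIfFollowing(el, wasNearBottom) {
  if (wasNearBottom) el.scrollTop = el.scrollHeight;
  return wasNearBottom;
}

// Plain objects stand in for the messages container here.
const following = { scrollHeight: 1000, scrollTop: 560, clientHeight: 400 };
const readingHistory = { scrollHeight: 1000, scrollTop: 200, clientHeight: 400 };
console.log(isNearBottom(following));      // true  (40px from the bottom)
console.log(isNearBottom(readingHistory)); // false (400px from the bottom)
```

In the component, afterUpdate could call scrollIfFollowing(messagesEl, isNearBottom(messagesEl)). One possible refinement is to take the measurement before the DOM changes, for example in beforeUpdate with a guard for messagesEl being undefined on the first run, so the height of the newly added text is not counted against the threshold. The same isNearBottom result can also drive the visibility of the “scroll to bottom” button.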

Local Storage Persistence

Preserve conversation history across browser refreshes using localStorage. On component mount, load any saved messages from localStorage. After each message exchange, save the updated messages array. Add a clear history button that resets both the in-memory messages array and the localStorage entry. This gives the chat interface the same persistence behaviour users expect from ChatGPT and similar tools — the conversation is still there when you reopen the tab, without any server-side storage or user accounts required.

Be mindful of localStorage’s limit of roughly 5MB per origin for very long conversations. Add a size check before saving and warn users or truncate old messages if the conversation approaches the limit. localStorage.setItem throws a QuotaExceededError when the quota is exceeded, so also wrap the save in a try-catch. For most typical use cases (a few dozen messages per session) this limit is never reached, but for power users who run very long conversations the truncation logic prevents a failure where new messages stop being saved without any visible error.
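A minimal sketch of the save, load and truncate logic follows. STORAGE_KEY, MAX_CHARS and the function names are illustrative assumptions. The functions take the storage object as a parameter, so in the browser you would pass localStorage.

```javascript
// Hypothetical names: STORAGE_KEY and MAX_CHARS are illustrative, not fixed values.
const STORAGE_KEY = "ollama-chat-messages";
const MAX_CHARS = 2_000_000; // conservative budget, well under the usual quota

// Drops the oldest messages until the serialised history fits the budget.
function trimToBudget(messages, maxChars = MAX_CHARS) {
  let kept = messages;
  while (kept.length > 1 && JSON.stringify(kept).length > maxChars) {
    kept = kept.slice(1);
  }
  return kept;
}

// Saves to any Storage-like object; returns false instead of throwing on failure.
function saveMessages(storage, messages, maxChars = MAX_CHARS) {
  try {
    storage.setItem(STORAGE_KEY, JSON.stringify(trimToBudget(messages, maxChars)));
    return true;
  } catch {
    return false; // e.g. quota exceeded or storage disabled
  }
}

// Returns [] when nothing is stored or the stored value is not valid JSON.
function loadMessages(storage) {
  try {
    return JSON.parse(storage.getItem(STORAGE_KEY) ?? "[]");
  } catch {
    return [];
  }
}
```

In the component, onMount would set messages = loadMessages(localStorage), and sendMessage would call saveMessages(localStorage, messages) once streaming finishes. A false return value is the signal to show a warning. The clear history button would reset messages and call localStorage.removeItem with the same key.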

Svelte vs React for Ollama Frontends

React is the dominant JavaScript framework for web development, but Svelte has meaningful advantages for local AI frontend applications. Svelte’s compiled output is considerably smaller: a Svelte chat app like this one compiles to roughly 15-20KB of JavaScript, while React and ReactDOM alone add roughly 40-50KB before any application code. Ollama itself exposes only an HTTP API and does not serve frontend files, so the app is typically served from a simple Node.js or Python backend or a static file server. Smaller bundles mean faster initial load times, which matters when the app is running on modest hardware like a Raspberry Pi or an older laptop.

Svelte’s reactivity model is also more intuitive for streaming use cases. In React, updating a streaming message requires careful use of useRef, useState, and potentially useReducer to avoid stale closures in async callbacks — a common source of bugs in streaming chat UIs. In Svelte, the reactive assignment pattern shown in this guide is straightforward and predictable: assign to a variable, the DOM updates. For developers who are not React specialists, Svelte’s simpler mental model reduces the friction between “I want to show streaming tokens” and “I have code that does it correctly.”

The main reason to choose React over Svelte for an Ollama frontend is existing team expertise and the broader React ecosystem — more component libraries, more tutorials, more third-party integrations. If your team already uses React, stay with React. If you are starting fresh and want a simpler, faster framework for a local AI interface, Svelte is an excellent choice that will serve most Ollama frontend use cases well.

Deploying a Svelte Ollama App

Build the Svelte app for production with npm run build, which outputs a dist/ directory of static HTML, CSS, and JavaScript files. Serve these files with any static file server: nginx, Caddy, or even Python’s http.server module for local use. Because the compiled output is pure static files with no server-side runtime, deployment is as simple as copying the dist/ directory to a web server and pointing the server’s document root at it. Configure the server to proxy /api/ollama/ requests to Ollama, stripping the prefix as the dev proxy does, and add authentication so only authorised users can access the chat interface.

For a fully local deployment where the Svelte app and Ollama run on the same machine, serve the built files from a simple Node.js server that also proxies Ollama requests. The proxy sidesteps CORS: the page and the API share one origin, and same-origin requests are not subject to CORS checks, so no special CORS configuration is needed on Ollama. A 20-line Express.js server that serves the static files and proxies /api/ollama to Ollama is all you need for a local deployment that starts in under a second and runs on any machine with Node.js installed.

Svelte’s combination of compile-time optimisation, intuitive reactivity, and minimal boilerplate makes it one of the best choices for building local AI frontends in 2026. The patterns in this guide — streaming tokens, model selection, conversation history, Markdown rendering, and error handling — cover everything you need for a professional chat interface. Build the basic version first, verify the streaming works correctly with your Ollama setup, and layer in the additional features as your application’s requirements develop.

The SvelteKit guide on this site covers the full-stack version of this setup, with server-side API routes that proxy Ollama requests securely and server-side rendering for faster initial page loads.